Tag

#MoE

9 articles

Alibaba Previews Qwen3.8-Max, a 2.4 Trillion-Parameter Multimodal Model, Days After Moonshot’s Kimi K3 Open-Weight Launch

This article explains the concepts behind Alibaba's Qwen3.8-Max, a 2.4 trillion-parameter multimodal model: its multimodal capabilities, its mixture-of-experts architecture, and the effects of parameter scaling.

Jul 19

Kimi K3 vs DeepSeek V4 Pro vs GLM-5.2: Open Trillion-Scale MoE Models Compared on Benchmarks, License, and Serving Cost

This article explains Mixture of Experts (MoE) AI models, how they work like teams of specialists, and why they're important for efficient AI performance.

Jul 18

Soofi Consortium Releases Soofi S 30B-A3B: An Open Hybrid Mamba-Transformer MoE Foundation Model For German And English

Soofi Consortium releases Soofi S 30B-A3B, an open hybrid Mamba-Transformer MoE model for German and English. The model leverages 3.2 billion active parameters out of 31.6 billion for efficient multilingual processing.

Jul 15

Meet Nemotron Labs 3 Puzzle 75B A9B: A Compressed Hybrid MoE LLM Delivering 2.03x Server Throughput

NVIDIA introduces Nemotron-Labs-3-Puzzle-75B-A9B, a compressed hybrid MoE LLM delivering 2.03x server throughput, leveraging hardware-aware compression and knowledge distillation.

Jul 9

JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

JetBrains has released Mellum2, a 12-billion-parameter MoE model trained on 10.6 trillion tokens and designed to accelerate specialized AI tasks in multi-model pipelines.

Jun 146

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters

This explainer explores the advanced Mixture of Experts (MoE) architecture used in Liquid AI's LFM2.5-8B-A1B model, examining how sparse parameter activation enables powerful on-device AI capabilities.

May 28

A Coding Implementation on Qwen 3.6-35B-A3B Covering Multimodal Inference, Thinking Control, Tool Calling, MoE Routing, RAG, and Session Persistence

This article explains the advanced AI concepts behind Qwen 3.6-35B-A3B, a multimodal model that combines MoE routing, RAG, and session persistence for intelligent, context-aware AI applications.

Apr 20

Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities

Alibaba's Qwen team open-sources Qwen3.6-35B-A3B, a sparse MoE vision-language model with 3B active parameters and agentic coding capabilities.

Apr 16

NVIDIA Releases Nemotron-Cascade 2: An Open 30B MoE with 3B Active Parameters, Delivering Better Reasoning and Strong Agentic Capabilities

This explainer article dives into NVIDIA's Nemotron-Cascade 2, an advanced Mixture-of-Experts (MoE) model that demonstrates how strategic parameter allocation can enhance reasoning capabilities while maintaining computational efficiency.

Mar 20