MoE (Mixture of Experts)
An architecture where a model contains multiple specialized sub-networks (“experts”) and a router that activates only a subset of them for each input token. A 140B-parameter MoE model might therefore compute with only 40B parameters per token, giving it the inference speed of a much smaller dense model while still requiring the full 140B parameters to be loaded in memory. For hardware buyers, MoE models are deceptive: they run fast for their apparent size but still need enough VRAM to hold every expert.
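The routing idea can be sketched in a few lines. This is a toy illustration, not any production implementation: the experts here are plain linear maps, the dimensions and top-2 routing are arbitrary choices, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 8, 4, 2

router_w = rng.standard_normal((D_MODEL, N_EXPERTS))            # router weights
expert_ws = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL))  # ALL experts stay resident in memory

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router_w            # router scores one token against every expert
    top = np.argsort(logits)[-TOP_K:]    # pick the top-k experts for this token
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over only the chosen experts
    # Only TOP_K of N_EXPERTS actually run per token (the speed win),
    # but expert_ws holds every expert's parameters (the VRAM cost).
    return sum(w * (token @ expert_ws[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(D_MODEL))
```

Note that the compute per token scales with `TOP_K`, while the memory footprint scales with `N_EXPERTS`: exactly the gap the entry warns hardware buyers about.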