MoE (Mixture of Experts)
An architecture where a model contains multiple specialized sub-networks (“experts”) and a router that activates only a subset of them for each input token. A 140B-parameter MoE model might therefore compute with only 40B parameters per token, giving it the inference speed of a much smaller dense model while still requiring the full 140B parameters to be loaded in memory. For hardware buyers, MoE models are deceptive: they run fast for their apparent size but still need enough VRAM to hold every expert.
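The routing idea can be sketched in a few lines. This is a toy illustration, not any production implementation: the experts here are plain linear maps, the dimensions and top-2 routing are arbitrary choices, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 8, 4, 2

router_w = rng.standard_normal((D_MODEL, N_EXPERTS))            # router weights
expert_ws = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL))  # ALL experts stay resident in memory

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router_w            # router scores one token against every expert
    top = np.argsort(logits)[-TOP_K:]    # pick the top-k experts for this token
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over only the chosen experts
    # Only TOP_K of N_EXPERTS actually run per token (the speed win),
    # but expert_ws holds every expert's parameters (the VRAM cost).
    return sum(w * (token @ expert_ws[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(D_MODEL))
```

Note that the compute per token scales with `TOP_K`, while the memory footprint scales with `N_EXPERTS`: exactly the gap the entry warns hardware buyers about.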