Distillation

Training a smaller “student” model to replicate the behavior of a larger “teacher” model, typically by matching the teacher’s output distributions rather than only the hard labels. Distillation is how many efficient open-source models are created: a 70B model’s knowledge gets compressed into a 7B model that runs on consumer hardware. The result is a model that punches above its parameter count, making distilled models some of the best choices for local AI on limited VRAM.
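A minimal sketch of the core training signal, assuming the classic soft-label setup (Hinton et al., 2015): the student minimizes the KL divergence between its temperature-softened outputs and the teacher’s. The function names and temperature value here are illustrative, not from any particular framework.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Soften the distribution: higher temperature spreads probability
    # mass across classes, exposing the teacher's "dark knowledge".
    z = logits / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 so gradients keep a consistent magnitude across T."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's soft predictions
    kl = np.sum(p * (np.log(p) - np.log(q)))
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher’s logits exactly and grows as the distributions diverge; in practice it is usually mixed with a standard cross-entropy term on the ground-truth labels.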
