Knowledge Distillation

A model compression technique where a large, high-performance “teacher” model transfers its learned knowledge to a smaller “student” model by training the student on soft labels — the teacher’s full output probability distribution rather than one-hot targets. The student learns not just the correct answers but the teacher’s relative confidence across all possible outputs. This is how many of the best small models (1B–7B parameters) achieve impressive performance while remaining runnable on budget hardware with 8–16 GB of VRAM.
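A minimal sketch of a standard distillation loss in PyTorch, assuming the classic soft-label formulation (Hinton et al., 2015). The function name, the `temperature` and `alpha` parameters, and the example tensors are illustrative, not taken from any particular library:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the soft-label (teacher) loss with the hard-label (ground truth) loss."""
    # Soften both distributions with a temperature so the student sees the
    # teacher's relative confidence across all classes, not just the argmax.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between the softened distributions; the T^2 factor keeps
    # its gradient magnitude comparable to the hard-label term.
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * (temperature ** 2)

    # Standard cross-entropy against the correct answers.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example usage: a batch of 4 examples over a 10-class output space.
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

The `alpha` weight trades off imitating the teacher against fitting the ground-truth labels; a higher temperature spreads the teacher’s probability mass and exposes more of its confidence structure to the student.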
