KV Cache
A memory buffer that stores the attention key and value tensors computed for previous tokens during LLM text generation, so the model doesn’t have to recompute them for every new token. The KV cache grows linearly with context length and batch size, and it consumes VRAM on top of the model weights: for a 70B model at 128K context, the KV cache alone can require 20+ GB. This is why long conversations use more VRAM than short ones on the same model.
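The size scaling can be sketched with a back-of-the-envelope calculation. The configuration below is an assumption for illustration, loosely modeled on a Llama-3-70B-style architecture (80 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16 precision); actual models vary.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes.

    Factor of 2 accounts for storing both keys and values;
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Hypothetical 70B-class config: 80 layers, 8 KV heads (GQA), head dim 128.
size = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                      seq_len=128 * 1024)
print(f"{size / 2**30:.1f} GiB")  # → 40.0 GiB
```

At these assumed settings the cache lands around 40 GiB for a single 128K-token sequence, consistent with the 20+ GB figure above; models without grouped-query attention (more KV heads) need several times more.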