KV Cache
A memory buffer that stores the attention key and value tensors computed for previous tokens during LLM text generation, so the model doesn’t have to recompute them for every new token. The KV cache grows linearly with context length and batch size, and it consumes VRAM on top of the model weights: for a 70B model at 128K context, the KV cache alone can require 20+ GB. This is why long conversations use more VRAM than short ones on the same model.
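The size scaling can be sketched with a back-of-the-envelope calculation. The configuration below is an assumption for illustration, loosely modeled on a Llama-3-70B-style architecture (80 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16 precision); actual models vary.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes.

    Factor of 2 accounts for storing both keys and values;
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Hypothetical 70B-class config: 80 layers, 8 KV heads (GQA), head dim 128.
size = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                      seq_len=128 * 1024)
print(f"{size / 2**30:.1f} GiB")  # → 40.0 GiB
```

At these assumed settings the cache lands around 40 GiB for a single 128K-token sequence, consistent with the 20+ GB figure above; models without grouped-query attention (more KV heads) need several times more.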