Tokens per second (tok/s)
The standard speed metric for LLM inference, measuring how many text tokens the model generates each second. A comfortable conversational speed is around 30–40 tok/s; below 10 tok/s feels sluggish. Your tok/s depends on memory bandwidth, model size, quantization level, and GPU architecture; token generation is typically memory-bandwidth-bound, since the model's weights must be read for every token produced. When comparing hardware for local AI, tok/s benchmarks on the models you care about matter more than raw TFLOPS.
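Measuring tok/s yourself is straightforward: time a generation call and divide the token count by the elapsed time. A minimal sketch, assuming a hypothetical `generate` callable standing in for whatever local-inference API you use (llama.cpp bindings, transformers, etc.):

```python
import time

def tokens_per_second(generate, prompt, n_tokens):
    """Time one generation call and return throughput in tok/s.

    `generate` is a hypothetical callable (placeholder for your
    inference API) that emits `n_tokens` tokens for `prompt`.
    """
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub generator that pretends to emit tokens at ~1000 tok/s,
# just to exercise the timing logic.
def fake_generate(prompt, n_tokens):
    time.sleep(n_tokens * 0.001)

rate = tokens_per_second(fake_generate, "hello", 100)
```

For a fair benchmark, run several trials and discard the first (warm-up) pass, and measure prompt processing (prefill) separately from generation (decode), since the two are bound by different hardware limits.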