Tokens per second (tok/s)
The standard speed metric for LLM inference, measuring how many text tokens the model generates each second. A comfortable conversational speed is around 30–40 tok/s; below 10 tok/s feels sluggish. Your tok/s depends on memory bandwidth, model size, quantization level, and GPU architecture; token generation is typically memory-bandwidth-bound, since the model's weights must be read for every token produced. When comparing hardware for local AI, tok/s benchmarks on the models you care about matter more than raw TFLOPS.
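Measuring tok/s yourself is straightforward: time a generation call and divide the token count by the elapsed time. A minimal sketch, assuming a hypothetical `generate` callable standing in for whatever local-inference API you use (llama.cpp bindings, transformers, etc.):

```python
import time

def tokens_per_second(generate, prompt, n_tokens):
    """Time one generation call and return throughput in tok/s.

    `generate` is a hypothetical callable (placeholder for your
    inference API) that emits `n_tokens` tokens for `prompt`.
    """
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub generator that pretends to emit tokens at ~1000 tok/s,
# just to exercise the timing logic.
def fake_generate(prompt, n_tokens):
    time.sleep(n_tokens * 0.001)

rate = tokens_per_second(fake_generate, "hello", 100)
```

For a fair benchmark, run several trials and discard the first (warm-up) pass, and measure prompt processing (prefill) separately from generation (decode), since the two are bound by different hardware limits.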