TGI (Text Generation Inference)
Hugging Face’s production-ready inference server for LLMs, with continuous batching, tensor parallelism, and quantization support. TGI is designed for deploying models at scale rather than for personal use, but it’s relevant to anyone running a local AI API server that must handle multiple concurrent requests. It performs best with an NVIDIA GPU and substantial VRAM.
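As a sketch of what deployment looks like, the official Docker image can serve a model and answer HTTP generation requests. The model ID, port, and quantization flag below are assumptions for illustration; substitute whatever fits your GPU:

```shell
# Launch TGI in Docker, caching model weights on the host.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $HOME/.cache/huggingface:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2 \
  --quantize bitsandbytes-nf4   # 4-bit quantization to fit tighter VRAM budgets

# Once the server is up, generate text over HTTP:
curl http://localhost:8080/generate \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is continuous batching?", "parameters": {"max_new_tokens": 64}}'
```

Continuous batching means concurrent requests like the `curl` call above are merged into the running batch on the fly rather than queued behind each other, which is what makes TGI suitable as a shared local API server.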