GGUF

The file format used by llama.cpp and compatible tools (Ollama, LM Studio) for storing LLMs, most often in quantized form. GGUF replaced the older GGML format and keeps metadata about the model’s architecture, quantization type, and tokenizer in the same file as the weights. When downloading a model to run locally, GGUF files are typically what you want. They come in variants like Q4_K_M or Q5_K_S: the number is the approximate bits per weight, so lower numbers mean more compression (smaller files and less RAM or VRAM, at some cost in quality), and the trailing S or M marks a small or medium variant at the same bit level.
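As a rough illustration of the two points above (self-describing metadata and the variant naming), here is a minimal Python sketch. It assumes a little-endian GGUF file of version 2 or later, where the file starts with the 4-byte magic "GGUF", a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count. The function names read_gguf_header and parse_quant_tag are hypothetical helpers written for this entry, not part of llama.cpp, and the filename pattern only covers K-quant tags like the ones mentioned here.

```python
import re
import struct

GGUF_MAGIC = b"GGUF"

# Matches K-quant tags such as Q4_K_M or Q5_K_S inside a filename.
# Assumption: other naming schemes (Q8_0, F16, IQ variants) are out of scope.
QUANT_TAG = re.compile(r"Q(\d)_K_([SML])", re.IGNORECASE)


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes the little-endian, version 2+ header layout described above.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # "<IQQ": little-endian uint32 version, uint64 tensors, uint64 KV pairs.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


def parse_quant_tag(filename):
    """Return (bits, size_letter) for a K-quant filename, or None."""
    match = QUANT_TAG.search(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).upper()
```

For example, parse_quant_tag("model.Q4_K_M.gguf") yields 4 bits with size letter "M", which makes it easy to sort downloaded files by compression level before checking whether they fit in memory.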

More Terms

FP4, GDDR6X / GDDR7 / HBM, GPTQ, Gradient Checkpointing, Inference
