GGUF

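The file format used by llama.cpp and compatible tools (Ollama, LM Studio) for storing LLMs, most often in quantized form. GGUF replaced the older GGML format and stores the weights alongside metadata describing the model’s architecture, quantization type, and tokenizer. When downloading a model to run locally, a GGUF file is typically what you want. Files come in quantization variants such as Q4_K_M or Q5_K_S: the number is the approximate bits per weight, so a lower number means more compression (a smaller file and less RAM or VRAM, at some cost in output quality), and the trailing S, M, or L marks a small, medium, or large mix within that level.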
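
To make the "includes metadata" point concrete, here is a minimal Python sketch that reads only the fixed-size header at the start of a GGUF file. It assumes a little-endian file at format version 2 or later, where the header is the 4-byte magic `GGUF`, a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count. The function name `read_gguf_header` is made up for this illustration and is not part of llama.cpp or any library.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) for a GGUF file.

    Hypothetical helper for illustration. Assumes little-endian byte
    order and GGUF version 2 or later (64-bit counts).
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")

        # Format version: unsigned 32-bit integer.
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            raise ValueError("this sketch only handles GGUF version 2 and later")

        # Number of tensors and number of metadata key/value pairs,
        # both unsigned 64-bit integers.
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))

    return version, tensor_count, kv_count
```

Calling `read_gguf_header("model.gguf")` on a downloaded file (the path is a placeholder) gives a quick sanity check that the download is a GGUF file at all. The metadata entries themselves, such as the architecture and tokenizer fields, follow this header and need a fuller parser.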
