GPTQ
A post-training quantization method that compresses LLMs to 4-bit or 3-bit precision with minimal accuracy loss. GPTQ quantizes weights layer by layer, using approximate second-order (Hessian) information from a small calibration set to compensate for the error each quantized weight introduces. GPTQ models are optimized for GPU inference and typically run faster than GGUF on NVIDIA cards with enough VRAM. If you have a dedicated NVIDIA GPU and want the fastest quantized inference, GPTQ is often the best format.
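The core idea can be sketched in a few lines: quantize the weight columns one at a time, and fold each column's rounding error into the not-yet-quantized columns, weighted by the inverse Hessian built from calibration inputs. This is a minimal, unoptimized sketch (real GPTQ uses per-group scales, a Cholesky reformulation, and lazy batched updates); the function name and shapes here are illustrative, not from any library.

```python
import numpy as np

def gptq_like_quantize(W, X, bits=4, damp=0.01):
    """Simplified GPTQ-style quantization of one weight matrix.

    W: (rows, cols) weights; X: (cols, n_samples) calibration inputs.
    Columns are quantized left to right; each column's quantization
    error is propagated into the remaining columns via the inverse
    Hessian H = X X^T (the second-order information GPTQ relies on).
    """
    W = W.astype(np.float64).copy()
    rows, cols = W.shape

    # Hessian of the layer-wise squared error, with damping for stability.
    H = X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(cols)
    Hinv = np.linalg.inv(H)

    # Single symmetric scale for the whole matrix (real GPTQ uses
    # per-group scales for better accuracy).
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(W)) / qmax

    Q = np.zeros_like(W)
    for j in range(cols):
        # Round column j to the nearest grid point.
        q = np.clip(np.round(W[:, j] / scale), -qmax - 1, qmax) * scale
        Q[:, j] = q
        # Distribute the rounding error over the remaining columns.
        err = (W[:, j] - q) / Hinv[j, j]
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q, scale
```

The error-compensation step is what separates GPTQ from plain round-to-nearest: later columns absorb earlier columns' rounding error, which keeps the layer's output (`W @ X`) close to the original even at 3-4 bits.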