AWQ (Activation-aware Weight Quantization)

A post-training quantization method that identifies the most important (salient) weight channels by analyzing activation magnitudes on a small calibration set, then protects them from quantization error via per-channel scaling while quantizing all weights to low precision. AWQ typically delivers better accuracy than naive round-to-nearest (RTN) quantization at INT4, making it a popular choice for deploying large models on consumer GPUs. If you see an AWQ model variant on Hugging Face, it’s been quantized this way to run well on hardware with limited VRAM.
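
As a rough illustration, here is a minimal NumPy sketch of the core idea, not the actual AWQ implementation: scales derived from calibration activations are folded into the weight columns before round-to-nearest quantization, which shrinks the error on the channels that matter most. The layer shapes, the group size, and the fixed exponent of 0.5 are illustrative assumptions; the real method searches the scaling exponent per layer on calibration data.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_rtn(w, n_bits=4, group_size=64):
    """Symmetric round-to-nearest quantization with per-group scales
    along the input dimension (the grouping AWQ-style methods use)."""
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(wg).max(axis=2, keepdims=True) / q_max
    q = np.clip(np.round(wg / scale), -q_max - 1, q_max)
    return (q * scale).reshape(out_f, in_f)  # dequantized weights

# Toy linear layer and a calibration batch of activations.
w = rng.normal(size=(256, 256)).astype(np.float32)   # [out, in]
x = rng.normal(size=(1024, 256)).astype(np.float32)  # [batch, in]
x[:, :8] *= 20.0  # a few input channels carry unusually large activations

# Activation-aware scaling: scale up the weight columns whose input
# activations are large; the matching inverse scale can be folded into
# the preceding op at inference time, so the layer's output is unchanged.
act_mag = np.abs(x).mean(axis=0)   # per-input-channel activation magnitude
s = act_mag ** 0.5                 # fixed exponent here; AWQ grid-searches it
s /= s.mean()                      # keep scales centered around 1

w_rtn = quantize_rtn(w)            # plain round-to-nearest baseline
w_awq = quantize_rtn(w * s) / s    # scale, quantize, fold the scale back

# Compare output error over the calibration batch.
err_rtn = np.linalg.norm(x @ w_rtn.T - x @ w.T)
err_awq = np.linalg.norm(x @ w_awq.T - x @ w.T)
print(f"RTN output error:       {err_rtn:.1f}")
print(f"AWQ-style output error: {err_awq:.1f}")  # noticeably smaller
```

Production AWQ checkpoints go further: the per-layer scales are tuned to minimize output error on calibration data, and the weights are stored packed at 4 bits rather than dequantized as in this sketch.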
