AWQ (Activation-aware Weight Quantization)

A post-training quantization method that identifies the most important (salient) weight channels by analyzing activation magnitudes on a small calibration set, then protects them from quantization error via per-channel scaling while quantizing all weights to low precision. AWQ typically delivers better accuracy than naive round-to-nearest (RTN) quantization at INT4, making it a popular choice for deploying large models on consumer GPUs. If you see an AWQ model variant on Hugging Face, it’s been quantized this way to run well on hardware with limited VRAM.
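
As a rough illustration, here is a minimal NumPy sketch of the core idea, not the actual AWQ implementation: scales derived from calibration activations are folded into the weight columns before round-to-nearest quantization, which shrinks the error on the channels that matter most. The layer shapes, the group size, and the fixed exponent of 0.5 are illustrative assumptions; the real method searches the scaling exponent per layer on calibration data.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_rtn(w, n_bits=4, group_size=64):
    """Symmetric round-to-nearest quantization with per-group scales
    along the input dimension (the grouping AWQ-style methods use)."""
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(wg).max(axis=2, keepdims=True) / q_max
    q = np.clip(np.round(wg / scale), -q_max - 1, q_max)
    return (q * scale).reshape(out_f, in_f)  # dequantized weights

# Toy linear layer and a calibration batch of activations.
w = rng.normal(size=(256, 256)).astype(np.float32)   # [out, in]
x = rng.normal(size=(1024, 256)).astype(np.float32)  # [batch, in]
x[:, :8] *= 20.0  # a few input channels carry unusually large activations

# Activation-aware scaling: scale up the weight columns whose input
# activations are large; the matching inverse scale can be folded into
# the preceding op at inference time, so the layer's output is unchanged.
act_mag = np.abs(x).mean(axis=0)   # per-input-channel activation magnitude
s = act_mag ** 0.5                 # fixed exponent here; AWQ grid-searches it
s /= s.mean()                      # keep scales centered around 1

w_rtn = quantize_rtn(w)            # plain round-to-nearest baseline
w_awq = quantize_rtn(w * s) / s    # scale, quantize, fold the scale back

# Compare output error over the calibration batch.
err_rtn = np.linalg.norm(x @ w_rtn.T - x @ w.T)
err_awq = np.linalg.norm(x @ w_awq.T - x @ w.T)
print(f"RTN output error:       {err_rtn:.1f}")
print(f"AWQ-style output error: {err_awq:.1f}")  # noticeably smaller
```

Production AWQ checkpoints go further: the per-layer scales are tuned to minimize output error on calibration data, and the weights are stored packed at 4 bits rather than dequantized as in this sketch.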
