QLoRA

Quantized LoRA — a technique that combines 4-bit quantization of the base model with LoRA fine-tuning on top. QLoRA makes it possible to fine-tune a 65B-parameter model on a single 48 GB GPU by keeping the frozen base weights in a 4-bit format (NF4, 4-bit NormalFloat) while training the LoRA adapters in 16-bit precision (BF16 or FP16). It is among the most VRAM-efficient fine-tuning methods available and has made fine-tuning of large models on consumer GPUs genuinely practical.
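The core mechanism can be sketched in a few lines: the base weight is stored quantized and frozen, dequantized on the fly during the forward pass, and a small trainable low-rank update is added on top. The sketch below uses simple absmax INT4 quantization with a single scale for readability; real QLoRA uses the NF4 data type with per-block scales and double quantization, and the shapes, names, and initialization here are illustrative assumptions, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4(w):
    # Absmax 4-bit quantization: map floats to integers in [-7, 7].
    # (Stand-in for NF4; real QLoRA uses per-block scales.)
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Frozen base weight stored 4-bit; trainable LoRA factors kept in full precision.
d_in, d_out, r = 64, 64, 8
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
qW, scale = quantize_int4(W)
A = rng.standard_normal((r, d_in)).astype(np.float32) * 0.01
B = np.zeros((d_out, r), dtype=np.float32)  # B starts at zero, so the update starts at zero
alpha = 16.0

def forward(x):
    # Dequantize the frozen base weight on the fly, then add the LoRA update.
    W_hat = dequantize(qW, scale)
    return x @ (W_hat + (alpha / r) * (B @ A)).T

x = rng.standard_normal((4, d_in)).astype(np.float32)
y = forward(x)
print(y.shape)  # (4, 64)
```

Only `A` and `B` receive gradients during training; the quantized `qW` never changes, which is where the memory savings come from.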
