LoRA

Low-Rank Adaptation (LoRA) is a technique for fine-tuning large models by training only a small set of additional parameters while the original weights stay frozen. Instead of updating a weight matrix W directly, LoRA learns a low-rank update ΔW = BA, where B and A share a small inner rank r, so the trainable parameters per layer shrink to a tiny fraction of the original count. Because only these small matrices need gradients and optimizer state, LoRA dramatically reduces the VRAM required for fine-tuning, making it possible to customize 7B–13B models on a single consumer GPU with 24 GB of VRAM. It’s the most practical fine-tuning method for anyone without data-center hardware.
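
To make the idea concrete, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer. The `LoRALinear` class name and the `rank`/`alpha` values are illustrative assumptions, not a reference implementation; production versions live in libraries such as Hugging Face’s peft.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        # Low-rank factors: delta_W = B @ A, with rank << min(in, out).
        # B starts at zero so the wrapped layer initially matches the base.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank  # standard LoRA scaling factor

    def forward(self, x):
        # Frozen path plus the trainable low-rank correction
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only A and B receive gradients here: for a 4096×4096 layer with rank 8, that is about 65 K trainable parameters instead of roughly 16.8 M, which is where the VRAM savings come from.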
