Definition
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method that freezes the pretrained model weights and injects trainable low-rank decomposition matrices into each transformer layer. Instead of updating a full weight matrix W of shape d x k, LoRA learns two small matrices B (d x r) and A (r x k) such that the update is delta_W = BA, where the rank r is much smaller than d and k. Only A and B are trained, so the number of trainable parameters per matrix drops from d*k to r*(d + k).
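The definition above can be sketched numerically. This is a minimal illustration, not the reference implementation: the dimensions, the scaling hyperparameter alpha, and the initialization scheme (A small random, B zero, as described in the LoRA paper) are chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4                  # full dims and LoRA rank (r << d, k); illustrative values
alpha = 8                            # LoRA scaling hyperparameter; value assumed for the sketch

W = rng.normal(size=(d, k))          # pretrained weight, frozen during fine-tuning
A = rng.normal(size=(r, k)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                 # trainable, zero init so delta_W = BA starts at zero

def lora_forward(x):
    # y = x W^T + (alpha / r) * x (BA)^T : frozen path plus scaled low-rank update
    delta_W = B @ A                  # (d, k), rank at most r
    return x @ W.T + (alpha / r) * (x @ delta_W.T)

x = rng.normal(size=(2, k))
y = lora_forward(x)

# Trainable parameters: r*(d + k) = 512 here, versus d*k = 4096 for the full matrix.
# Because B is zero-initialized, the adapted model starts exactly at the pretrained one:
assert np.allclose(y, x @ W.T)
```

In training, gradients flow only into A and B; W never changes, which is what makes the adapter cheap to store and share.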
Key Intuition
Weight updates during fine-tuning have low intrinsic dimensionality. Rather than modifying all parameters, LoRA constrains the update to a low-rank subspace, dramatically reducing the number of trainable parameters while preserving the capacity to adapt. The original weights remain frozen, so multiple LoRA adapters can be swapped in and out at inference time without modifying the base model.
History/Origin
Hu et al. (2021) introduced LoRA, motivated by the observation from Aghajanyan et al. (2020) that pretrained language models have a low intrinsic dimension. LoRA was applied to GPT-3 175B, reducing trainable parameters by 10,000x (from 175B to roughly 17M) with comparable fine-tuning quality. The method became foundational for the open-source LLM ecosystem, where full fine-tuning of large models is prohibitively expensive.
Relationship to Other Concepts
LoRA is the most widely adopted form of parameter-efficient fine-tuning, complementing full fine-tuning and other methods like adapters and prefix tuning. It builds on pretraining and transfer-learning. QLoRA (Dettmers et al., 2023) combined LoRA with 4-bit quantization of the frozen base weights, enabling fine-tuning of 65B models on a single GPU. LoRA adapters are commonly used in the supervised fine-tuning (SFT) stage before rlhf or direct-preference-optimization.
Notable Results
LoRA on GPT-3 175B matched full fine-tuning performance on RTE, MRPC, and other benchmarks while training 10,000x fewer parameters. QLoRA fine-tuned a 65B model on a single 48GB GPU, democratizing large model adaptation. LoRA adapters became the standard mechanism for community-driven fine-tuning on Hugging Face.
Open Questions
- Optimal rank selection across layers and tasks.
- Whether LoRA systematically underperforms full fine-tuning on complex or heterogeneous tasks.
- How to compose or merge multiple LoRA adapters trained for different capabilities.
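One simple baseline for the composition question is a weighted sum of the adapters' low-rank updates; whether such linear merging actually preserves each capability is precisely what remains open. A hedged sketch of that baseline, with random matrices standing in for trained adapters and the mixing weights chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, r, alpha = 8, 8, 2, 4          # illustrative sizes

W = rng.normal(size=(d, k))          # frozen base weight

# Low-rank updates B_i A_i from two adapters (random stand-ins here):
deltas = [rng.normal(size=(d, r)) @ rng.normal(size=(r, k)) for _ in range(2)]

# Naive linear merging: a convex combination of the updates.
# The 0.5/0.5 mixing weights are a hypothetical choice, not a recommendation.
weights = [0.5, 0.5]
W_merged = W + (alpha / r) * sum(w * dW for w, dW in zip(weights, deltas))
```

Note that the merged update sum(w_i * B_i A_i) can have rank up to the sum of the individual ranks, so the result is no longer expressible as a single rank-r adapter in general.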