LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that updates only small low-rank matrices injected into model layers.
Full Definition
LoRA freezes the original model weights and injects trainable low-rank decomposition matrices (B and A, with rank r << d) into each transformer layer, so the weight update is ΔW = BA. During fine-tuning only these adapter matrices are updated, which can cut the number of trainable parameters by up to 10,000x compared to full fine-tuning (the figure reported for GPT-3 in the original LoRA paper) while achieving comparable performance. At inference time the product BA can be merged into the base weights, so LoRA adds zero latency overhead. By dramatically reducing gradient and optimiser-state VRAM, LoRA enables fine-tuning of large models (7B–70B) on consumer GPUs. It is the dominant fine-tuning method in the open-source community and is supported natively by Hugging Face's PEFT library.
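The decomposition above can be sketched in a few lines. This is a minimal NumPy illustration, not the PEFT implementation; the class name `LoRALinear` and the initialisation constants are assumptions for the example:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                               # frozen base weight
        self.A = rng.normal(0, 0.01, (r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))            # trainable, zero init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # base path plus low-rank adapter path
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def merge(self):
        # fold the adapter into the base weights for zero-overhead inference
        return self.W + self.scale * self.B @ self.A

layer = LoRALinear(np.ones((6, 8)), r=2)
x = np.arange(8, dtype=float)
# B is zero-initialised, so before training the adapter changes nothing:
assert np.allclose(layer.forward(x), layer.W @ x)
# after training B (simulated here with random values), the merged
# weight matrix reproduces the two-path forward pass exactly:
layer.B = np.random.default_rng(1).normal(size=(6, 2))
assert np.allclose(layer.merge() @ x, layer.forward(x))
```

The zero initialisation of B is what makes the adapted model identical to the base model at step 0, and the final assertion is the merge property that gives LoRA its zero inference overhead.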
Examples
Fine-tuning Llama 3 70B with an r=16 LoRA, updating only around 0.02% of the total parameter count; combined with 4-bit quantisation (QLoRA), this can fit on a single A100 GPU.
Maintaining multiple LoRA adapters for different personas (formal, casual, technical) that can be swapped onto the same base model at runtime.
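The multi-persona pattern in the second example can be sketched as keeping one frozen base weight and a dictionary of low-rank pairs, merging whichever adapter is selected at runtime. A NumPy sketch with randomly generated stand-ins for trained adapters (the persona names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2
W_base = rng.normal(size=(d, d))  # shared frozen base weight

# one (B, A) low-rank pair per persona; random stand-ins for trained adapters
adapters = {
    name: (rng.normal(size=(d, r)), rng.normal(size=(r, d)))
    for name in ("formal", "casual", "technical")
}

def with_adapter(name, alpha=8):
    # merge only the selected adapter into the shared base weights
    B, A = adapters[name]
    return W_base + (alpha / r) * B @ A

x = np.ones(d)
outs = {name: with_adapter(name) @ x for name in adapters}
# the same base model yields a distinct output under each persona's adapter
assert len({tuple(np.round(o, 6)) for o in outs.values()}) == 3
```

Because each adapter is only 2·d·r parameters rather than d², storing and swapping many personas is cheap compared to keeping full copies of the model.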
Related Terms
Fine-Tuning
Continuing training of a pretrained model on a smaller, task-specific dataset to…
QLoRA
A memory-efficient fine-tuning method combining quantisation with LoRA adapters.…
Parameter-Efficient Fine-Tuning
A family of methods that fine-tune large models by updating only a small fractio…