LoRA Fine-Tuning Guide
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that adapts a pre-trained LLM to your task by training a small set of additional weights — without retraining the full model. BeaverYard handles all the infrastructure. You provide a dataset; we deliver PEFT-compatible adapter weights.
What is LoRA fine-tuning?
LoRA works by injecting low-rank matrix pairs into the attention layers of a transformer model. During training, only these small matrices are updated — the original model weights stay frozen. The result is an adapter file (~tens of MB) that, when loaded alongside the base model, reproduces the fine-tuned behavior.
- No full model retraining — adapter weights are the only output
- Works with LLaMA, Mistral, and Gemma models
- PEFT-compatible — loads with Hugging Face PEFT and standard inference frameworks
- No platform lock-in — you download the weights and run them anywhere
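Concretely, for a frozen weight matrix W, LoRA learns a pair of small matrices A and B and computes W·x + (α/r)·B·A·x, where the rank r is much smaller than the hidden size. The following PyTorch sketch shows that update in isolation; the dimensions and hyperparameters are illustrative, not BeaverYard defaults:

import torch

d, r, alpha = 4096, 16, 32        # hidden size, LoRA rank, scaling (illustrative values)

W = torch.randn(d, d)             # pre-trained weight: stays frozen during training
W.requires_grad_(False)

A = torch.randn(r, d) * 0.01      # trainable down-projection
B = torch.zeros(d, r)             # trainable up-projection, zero-initialized so
A.requires_grad_(True)            # training starts from the unmodified base model
B.requires_grad_(True)

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Base projection plus the low-rank correction: W x + (alpha / r) * B A x
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# The adapter stores only A and B: 2 * r * d parameters instead of d * d,
# which is why the resulting file is tens of MB rather than many GB.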
Prepare your dataset
BeaverYard expects a .jsonl file with one JSON object per line. Each record must follow a supported chat or instruction format.
{"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is LoRA?"},
{"role": "assistant", "content": "LoRA is a parameter-efficient fine-tuning method..."}
]}
Validation runs automatically before payment. You see the tier, price, and any format errors before committing. See Dataset Format for full validation rules.
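If you assemble the file programmatically, a minimal Python sketch (the record content is illustrative) that writes one JSON object per line:

import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "LoRA is a parameter-efficient fine-tuning method..."},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        # One complete JSON object per line -- no pretty-printing across lines
        f.write(json.dumps(record, ensure_ascii=False) + "\n")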
Submit a run via API
The BeaverYard API accepts a single multipart request — your dataset file and run options in one call. No separate upload step required.
# 1. Submit the training run (dataset + options in one request)
curl -X POST https://api.beaveryard.com/api/v1/runs \
  -H "Authorization: Bearer $BEAVERYARD_API_KEY" \
  -H "Idempotency-Key: $(uuidgen)" \
  -F "dataset=@train.jsonl" \
  -F "model_id=llama-3.1-8b" \
  -F "model_terms_accepted=true" \
  -F "data_policy_accepted=true"

# 2. Poll for status (use the run_id returned above)
curl https://api.beaveryard.com/api/v1/runs/$RUN_ID \
  -H "Authorization: Bearer $BEAVERYARD_API_KEY"

# 3. Get artifact download URLs when status is "completed"
curl -X POST https://api.beaveryard.com/api/v1/runs/$RUN_ID/artifacts/download \
  -H "Authorization: Bearer $BEAVERYARD_API_KEY"
The Idempotency-Key header prevents duplicate submissions if you retry. See the full API Reference for all fields, response schemas, and error codes.
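For scripted submissions, here is a sketch of the same three steps in Python using the requests library. The run_id and status response fields are assumed from the curl example above; consult the API Reference for the exact response schema.

import os
import time
import uuid

import requests

API = "https://api.beaveryard.com/api/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['BEAVERYARD_API_KEY']}"}

# 1. Submit the run -- dataset and options in a single multipart request
with open("train.jsonl", "rb") as f:
    resp = requests.post(
        f"{API}/runs",
        headers={**HEADERS, "Idempotency-Key": str(uuid.uuid4())},
        files={"dataset": f},
        data={
            "model_id": "llama-3.1-8b",
            "model_terms_accepted": "true",
            "data_policy_accepted": "true",
        },
    )
resp.raise_for_status()
run_id = resp.json()["run_id"]  # field name assumed from the curl example's $RUN_ID

# 2. Poll until the run reaches a terminal state ("completed"/"failed" assumed)
while True:
    run = requests.get(f"{API}/runs/{run_id}", headers=HEADERS).json()
    if run["status"] in ("completed", "failed"):
        break
    time.sleep(30)

# 3. Request artifact download URLs once the run has completed
if run["status"] == "completed":
    urls = requests.post(f"{API}/runs/{run_id}/artifacts/download", headers=HEADERS).json()
    print(urls)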
Use your adapter weights
After training completes, you download two files: adapter_model.safetensors and adapter_config.json. The config file contains the exact base model path — load both with Hugging Face PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# For LLaMA runs — base model path is in adapter_config.json
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base_model, "./adapter-weights/")
model.eval()
See How to Use Adapters for Mistral/Gemma examples and deployment options. See Artifacts & Downloads for download windows and link expiry.
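To sanity-check the loaded adapter, you can run a quick generation with the standard Transformers API. This continues from the block above; the prompt is illustrative:

import torch

inputs = tokenizer("What is LoRA?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))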
Supported models
- LLaMA — Meta LLaMA family
- Mistral — Mistral 7B and variants
- Gemma — Google Gemma family
All models are the same price. See Model Selection for details.