LoRA Fine-Tuning Guide
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that adapts a pre-trained LLM to your task by training a small set of additional weights — without retraining the full model. BeaverYard handles all the infrastructure. You provide a dataset; we deliver PEFT-compatible adapter weights.
What is LoRA fine-tuning?
LoRA works by injecting low-rank matrix pairs into the attention layers of a transformer model. During training, only these small matrices are updated — the original model weights stay frozen. The result is an adapter file (~tens of MB) that, when loaded alongside the base model, reproduces the fine-tuned behavior.
- No full model retraining — adapter weights are the only output
- Works with LLaMA, Mistral, and Gemma models
- PEFT-compatible — loads with Hugging Face PEFT and standard inference frameworks
- No platform lock-in — you download the weights and run them anywhere
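Concretely, for a frozen weight matrix W, LoRA learns a pair of small matrices A and B and computes W·x + (α/r)·B·A·x, where the rank r is much smaller than the hidden size. The following PyTorch sketch shows that update in isolation; the dimensions and hyperparameters are illustrative, not BeaverYard defaults:

import torch

d, r, alpha = 4096, 16, 32        # hidden size, LoRA rank, scaling (illustrative values)

W = torch.randn(d, d)             # pre-trained weight: stays frozen during training
W.requires_grad_(False)

A = torch.randn(r, d) * 0.01      # trainable down-projection
B = torch.zeros(d, r)             # trainable up-projection, zero-initialized so
A.requires_grad_(True)            # training starts from the unmodified base model
B.requires_grad_(True)

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Base projection plus the low-rank correction: W x + (alpha / r) * B A x
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# The adapter stores only A and B: 2 * r * d parameters instead of d * d,
# which is why the resulting file is tens of MB rather than many GB.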
Prepare your dataset
BeaverYard expects a .jsonl file with one JSON object per line. Each record must follow a supported chat or instruction format.
{"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is LoRA?"},
{"role": "assistant", "content": "LoRA is a parameter-efficient fine-tuning method..."}
]}
Validation runs automatically before payment. You see the tier, price, and any format errors before committing. See Dataset Format for full validation rules.
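If you assemble the file programmatically, a minimal Python sketch (the record content is illustrative) that writes one JSON object per line:

import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "LoRA is a parameter-efficient fine-tuning method..."},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        # One complete JSON object per line -- no pretty-printing across lines
        f.write(json.dumps(record, ensure_ascii=False) + "\n")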
Submit a run via API
The BeaverYard API accepts a single multipart request — your dataset file and run options in one call. No separate upload step required.
# 1. Submit the training run (dataset + options in one request)
curl -X POST https://api.beaveryard.com/api/v1/runs \
  -H "Authorization: Bearer $BEAVERYARD_API_KEY" \
  -H "Idempotency-Key: $(uuidgen)" \
  -F "dataset=@train.jsonl" \
  -F "model_id=llama-3.1-8b" \
  -F "model_terms_accepted=true" \
  -F "data_policy_accepted=true"

# 2. Poll for status (use the run_id returned above)
curl https://api.beaveryard.com/api/v1/runs/$RUN_ID \
  -H "Authorization: Bearer $BEAVERYARD_API_KEY"

# 3. Get artifact download URLs when status is "completed"
curl -X POST https://api.beaveryard.com/api/v1/runs/$RUN_ID/artifacts/download \
  -H "Authorization: Bearer $BEAVERYARD_API_KEY"
The Idempotency-Key header prevents duplicate submissions if you retry. See the full API Reference for all fields, response schemas, and error codes.
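For scripted submissions, here is a sketch of the same three steps in Python using the requests library. The run_id and status response fields are assumed from the curl example above; consult the API Reference for the exact response schema.

import os
import time
import uuid

import requests

API = "https://api.beaveryard.com/api/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['BEAVERYARD_API_KEY']}"}

# 1. Submit the run -- dataset and options in a single multipart request
with open("train.jsonl", "rb") as f:
    resp = requests.post(
        f"{API}/runs",
        headers={**HEADERS, "Idempotency-Key": str(uuid.uuid4())},
        files={"dataset": f},
        data={
            "model_id": "llama-3.1-8b",
            "model_terms_accepted": "true",
            "data_policy_accepted": "true",
        },
    )
resp.raise_for_status()
run_id = resp.json()["run_id"]  # field name assumed from the curl example's $RUN_ID

# 2. Poll until the run reaches a terminal state ("completed"/"failed" assumed)
while True:
    run = requests.get(f"{API}/runs/{run_id}", headers=HEADERS).json()
    if run["status"] in ("completed", "failed"):
        break
    time.sleep(30)

# 3. Request artifact download URLs once the run has completed
if run["status"] == "completed":
    urls = requests.post(f"{API}/runs/{run_id}/artifacts/download", headers=HEADERS).json()
    print(urls)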
Use your adapter weights
After training completes, you download two files: adapter_model.safetensors and adapter_config.json. The config file contains the exact base model path — load both with Hugging Face PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# For LLaMA runs — base model path is in adapter_config.json
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base_model, "./adapter-weights/")
model.eval()
See How to Use Adapters for Mistral/Gemma examples and deployment options. See Artifacts & Downloads for download windows and link expiry.
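To sanity-check the loaded adapter, you can run a quick generation with the standard Transformers API. This continues from the block above; the prompt is illustrative:

import torch

inputs = tokenizer("What is LoRA?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))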
Supported models
- LLaMA — Meta LLaMA family
- Mistral — Mistral 7B and variants
- Gemma — Google Gemma family
All models are the same price. See Model Selection for details.