Tier limits & eligibility

This page covers tier caps, retention windows, and what can cause eligibility to fail before payment. Use it to confirm your run is ready and avoid retries. For dollar pricing, use the Pricing page.

Tier caps and retention

Eligibility is checked before payment against six criteria: file format, record structure, dataset size, token estimate, record count, and maximum line length. A record count above 200,000 or a line length above 20,000 characters is a global hard stop before any tier is assigned.

Tier       Dataset                 Tokens   Records   Line chars   Retention
Launch S   Up to 50 MB             5.5M     200,000   20,000       7 days
Launch M   Over 50 MB to 150 MB    7.5M     200,000   20,000       7 days
Launch L   Over 150 MB to 300 MB   10.0M    200,000   20,000       7 days
Orbit S    Over 300 MB to 400 MB   16.5M    200,000   20,000       7 days
Orbit M    Over 400 MB to 500 MB   22.5M    200,000   20,000       7 days
Orbit L    Over 500 MB to 600 MB   28.0M    200,000   20,000       7 days

Displayed dataset ranges are planning bands. Final tier assignment happens automatically after upload and validation, and the higher required tier wins when dataset size and token estimate point to different tiers.
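The "higher required tier wins" rule can be sketched as follows. This is an illustrative model only, not BeaverYard's actual implementation; the band boundaries are taken from the table above, and the function name and signature are assumptions:

```python
from typing import Optional

MB = 1024 * 1024

# (tier name, max dataset bytes, max tokens), ascending.
# Boundaries mirror the published table; names are illustrative.
TIERS = [
    ("Launch S", 50 * MB, 5_500_000),
    ("Launch M", 150 * MB, 7_500_000),
    ("Launch L", 300 * MB, 10_000_000),
    ("Orbit S", 400 * MB, 16_500_000),
    ("Orbit M", 500 * MB, 22_500_000),
    ("Orbit L", 600 * MB, 28_000_000),
]

def assign_tier(dataset_bytes: int, token_estimate: int) -> Optional[str]:
    """Return the lowest tier whose size AND token caps both fit.

    Walking the tiers in ascending order and requiring both caps to
    hold is equivalent to taking the higher of the two tiers that
    dataset size and token estimate would individually require.
    """
    for name, max_bytes, max_tokens in TIERS:
        if dataset_bytes <= max_bytes and token_estimate <= max_tokens:
            return name
    return None  # exceeds the largest tier: not eligible
```

For example, a 120 MB dataset falls in the Launch M size band, but a 9.0M token estimate requires Launch L, so the run is assigned Launch L.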

Common eligibility failures

  • File is not a valid .jsonl dataset or is not UTF-8 encoded.
  • One or more lines are not valid JSON objects or do not use a supported record structure.
  • Dataset size exceeds BeaverYard's maximum published size cap.
  • Token estimate exceeds BeaverYard's maximum published token cap.
  • Record count exceeds the global 200,000 records/lines cap.
  • One or more lines exceed the global 20,000-character line-length cap.

Priority Processing

  • Priority Processing is available as an optional scheduling add-on at checkout.
  • Priority Processing is included automatically for BeaverYard Plus and Pro members at no extra cost.
  • It does not bypass dataset format or published tier size limits.

What we do not check

  • We do not score dataset quality or predict model performance.
  • We do not rewrite, deduplicate, or otherwise improve your dataset automatically.
  • Training quality remains your responsibility and depends on the data you upload.

Helpful links

Start Run