Tier limits & eligibility

This page covers tier caps, retention windows, and what can cause eligibility to fail before payment. Use it to confirm your run is ready and avoid retries. For dollar pricing, use the Pricing page.

Tier caps and retention

Eligibility is checked before payment against six criteria: file format, record structure, dataset size, token estimate, record count, and maximum line length. A record count above 200,000 or a line length above 20,000 characters is a global hard stop before any tier is assigned.

Tier       Dataset                 Tokens   Records   Line chars   Retention
Launch S   Up to 50 MB             5.5M     200,000   20,000       7 days
Launch M   Over 50 MB to 150 MB    7.5M     200,000   20,000       7 days
Launch L   Over 150 MB to 300 MB   10.0M    200,000   20,000       7 days
Orbit S    Over 300 MB to 400 MB   16.5M    200,000   20,000       7 days
Orbit M    Over 400 MB to 500 MB   22.5M    200,000   20,000       7 days
Orbit L    Over 500 MB to 600 MB   28.0M    200,000   20,000       7 days

Displayed dataset ranges are planning bands. Final tier assignment happens automatically after upload and validation, and the higher required tier wins when dataset size and token estimate point to different tiers.
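The "higher required tier wins" rule can be sketched as follows. This is an illustrative model only, not BeaverYard's actual implementation; the band boundaries are taken from the table above, and the function name and signature are assumptions:

```python
from typing import Optional

MB = 1024 * 1024

# (tier name, max dataset bytes, max tokens), ascending.
# Boundaries mirror the published table; names are illustrative.
TIERS = [
    ("Launch S", 50 * MB, 5_500_000),
    ("Launch M", 150 * MB, 7_500_000),
    ("Launch L", 300 * MB, 10_000_000),
    ("Orbit S", 400 * MB, 16_500_000),
    ("Orbit M", 500 * MB, 22_500_000),
    ("Orbit L", 600 * MB, 28_000_000),
]

def assign_tier(dataset_bytes: int, token_estimate: int) -> Optional[str]:
    """Return the lowest tier whose size AND token caps both fit.

    Walking the tiers in ascending order and requiring both caps to
    hold is equivalent to taking the higher of the two tiers that
    dataset size and token estimate would individually require.
    """
    for name, max_bytes, max_tokens in TIERS:
        if dataset_bytes <= max_bytes and token_estimate <= max_tokens:
            return name
    return None  # exceeds the largest tier: not eligible
```

For example, a 120 MB dataset falls in the Launch M size band, but a 9.0M token estimate requires Launch L, so the run is assigned Launch L.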

Common eligibility failures

  • File is not a valid .jsonl dataset or is not UTF-8 encoded.
  • One or more lines are not valid JSON objects or do not use a supported record structure.
  • Dataset size exceeds BeaverYard's maximum published size cap.
  • Token estimate exceeds BeaverYard's maximum published token cap.
  • Record count exceeds the global 200,000 records/lines cap.
  • One or more lines exceed the global 20,000-character line-length cap.

Priority Processing

  • Priority Processing is available as an optional scheduling add-on at checkout.
  • Priority Processing is included automatically for BeaverYard Plus and Pro members at no extra cost.
  • It does not bypass dataset format or published tier size limits.

What we do not check

  • We do not score dataset quality or predict model performance.
  • We do not rewrite, deduplicate, or otherwise improve your dataset automatically.
  • Training quality remains your responsibility and depends on the data you upload.

Helpful links

Start Run