Submit a Method

Executive Summary. A step-by-step quickstart for submitting your first benchmark run to the MT Eval Arena leaderboard: clone the harness, run it against a dataset, review your run card, and submit. It takes about 10 minutes if you already have an API key.

Prerequisites

Python 3.10+
An OpenRouter API key (or equivalent for your model provider)
A translation method — anything that produces translations from a source text

# Clone the eval harness
git clone https://github.com/gamedaysuits/gds-mt-eval-harness.git
cd gds-mt-eval-harness
# Install the Python dependencies
pip install sacrebleu aiohttp
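
Before the first run, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the harness: `preflight` is a hypothetical helper, and the environment variable name `OPENROUTER_API_KEY` is an assumption, since this guide does not say how the harness reads your key. Adjust it to whatever your provider setup uses.

```python
import importlib.util
import os
import sys


def preflight(env_var: str = "OPENROUTER_API_KEY") -> list[str]:
    """Return a list of human-readable problems; an empty list means ready.

    The default env_var name is an assumption, not something the harness
    is known to require.
    """
    problems: list[str] = []

    # The guide asks for Python 3.10 or newer.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")

    # Packages installed by the pip command in the setup steps.
    for module in ("sacrebleu", "aiohttp"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing package: {module}")

    # An unset or empty variable counts as a missing key.
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARN:", problem)
```

Running the file directly prints one warning per unmet prerequisite and prints nothing when everything is in place.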

Step 1: Run the Harness

The harness scores your method against a standardized dataset:

python eval/baseline_experiment.py \
  --dataset data/edtekla-dev-v1.json \
  --model google/gemini-2.5-pro \
  --condition your-method-name \
  --temperature 0.2

| Flag | What It Does |
| --- | --- |
| `--dataset` | Path to the evaluation dataset JSON |
| `--model` | OpenRouter model slug |
| `--condition` | Label for your method (appears on leaderboard) |
| `--temperature` | Sampling temperature (lower = more deterministic) |
| `--fst-analyzer` | Optional: path to FST binary for morphological validation |
| `--submit` | Auto-submit the run card to the leaderboard |
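
If you want to launch runs from a script, for example to repeat a run at several temperatures, the command can be assembled programmatically. This is a minimal sketch: `build_harness_command` is a hypothetical helper, it uses only the flags listed above, and it assumes `--submit` is a boolean switch that takes no value.

```python
import sys
from typing import Optional


def build_harness_command(
    dataset: str,
    model: str,
    condition: str,
    temperature: float = 0.2,
    fst_analyzer: Optional[str] = None,
    submit: bool = False,
) -> list[str]:
    """Assemble the argv list for eval/baseline_experiment.py (hypothetical helper)."""
    cmd = [
        sys.executable,
        "eval/baseline_experiment.py",
        "--dataset", dataset,
        "--model", model,
        "--condition", condition,
        "--temperature", str(temperature),
    ]
    # Optional flags are appended only when requested.
    if fst_analyzer is not None:
        cmd += ["--fst-analyzer", fst_analyzer]
    if submit:
        # Assumption: --submit is a switch with no argument.
        cmd.append("--submit")
    return cmd


if __name__ == "__main__":
    command = build_harness_command(
        dataset="data/edtekla-dev-v1.json",
        model="google/gemini-2.5-pro",
        condition="your-method-name",
    )
    print(" ".join(command))
```

To execute the run, pass the returned list to `subprocess.run(command, check=True)` from the repository root. Building the command as a list rather than a single string avoids shell quoting issues in paths and labels.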

The harness produces a run card — a self-contained JSON file with your scores, the dataset hash, the model slug, and a cryptographic fingerprint tying results to the exact experiment configuration.

Step 2: Review Your Run Card

Run cards are saved to results/. Inspect yours before submitting:

python -m json.tool results/your-run-card.json

Key fields to check:

scores.chrf_plus_plus — your primary quality metric
scores.exact_match_rate — proportion of perfect translations
scores.fst_acceptance_rate — morphological validity (if FST was used)
totals.total_cost_usd — what the run cost
fingerprint — the experiment's reproducibility hash

See the Run Card Specification for the full schema.
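
The key fields listed above can also be pulled out with a few lines of Python instead of reading the whole file. This sketch assumes the dotted names map to nested JSON objects (for example, `scores.chrf_plus_plus` is the `chrf_plus_plus` key inside `scores`); the Run Card Specification is the authority on the actual layout. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

# Field paths taken from the checklist in this guide.
KEY_FIELDS = [
    "scores.chrf_plus_plus",
    "scores.exact_match_rate",
    "scores.fst_acceptance_rate",
    "totals.total_cost_usd",
    "fingerprint",
]


def get_field(card: dict[str, Any], dotted_path: str) -> Any:
    """Return the value at a dotted path, or None if any segment is absent."""
    node: Any = card
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def summarize_run_card(card: dict[str, Any]) -> dict[str, Any]:
    """Map each key field path to its value (None when missing)."""
    return {path: get_field(card, path) for path in KEY_FIELDS}


def load_run_card(path: str) -> dict[str, Any]:
    """Read and parse a run card JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    import sys

    # Usage: python inspect_card.py results/your-run-card.json
    if len(sys.argv) > 1:
        for name, value in summarize_run_card(load_run_card(sys.argv[1])).items():
            print(f"{name}: {value}")
```

A field that prints as `None` is missing from the card. That is expected for `scores.fst_acceptance_rate` when no FST analyzer was used.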

Step 3: Submit

Automatic submission

If you passed --submit when running the harness, your run card was already uploaded.

Manual submission

Submit any run card via the API:

curl -X POST https://mtevalarena.org/api/leaderboard/submit \
  -H "Content-Type: application/json" \
  -d @results/your-run-card.json

Or upload through the Leaderboard UI.
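
The same request can be sent from Python using only the standard library. This is a sketch of the curl call above, not an official client: the function names are hypothetical, and because the response body format is not described in this guide, only the HTTP status code is returned.

```python
import json
import urllib.request
from pathlib import Path

# Endpoint taken from the curl example in this guide.
SUBMIT_URL = "https://mtevalarena.org/api/leaderboard/submit"


def build_submit_request(run_card_path: str, url: str = SUBMIT_URL) -> urllib.request.Request:
    """Build a POST request whose body is the run card file."""
    body = Path(run_card_path).read_bytes()
    # Fail early with a clear error if the file is not valid JSON.
    json.loads(body)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_run_card(run_card_path: str) -> int:
    """Send the run card and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_submit_request(run_card_path)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Calling `submit_run_card("results/your-run-card.json")` performs the upload. Keeping request construction separate from sending makes it possible to inspect what will be posted before anything goes over the network.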

What Happens Next

Your submission is validated (dataset hash, run card integrity)
Results appear on the leaderboard as Self-benchmarked (trust tier 1)
To get GDS Verified status, submit your method as an installable plugin so maintainers can reproduce your results
For Indigenous language methods: if your method reaches the top, the ownership transfer process begins
