Partial Translation (Human + Machine)

The idea: manually translate a representative sample, verify that your machine method matches the human style on that sample, then auto-translate the remaining bulk. This combines human quality with machine scale: the human sets the standard, and the machine follows it.

:::info This is a cookbook, not a finished implementation
This guide sketches the hybrid human-machine workflow. It's especially relevant for translation agencies, community language workers, and educational contexts.
:::

When to Use This

You have access to fluent speakers but their time is limited
You need to translate a large volume but only a small portion needs to be perfect
You want to establish a quality baseline with human translation, then scale with MT
You're working in an educational or community context where human review of a subset is feasible

How It Works

[Full corpus: 1,000 entries]
        │
        ├── [100 entries] ──→ Human translator ──→ Gold translations
        │                                              │
        │                                              ▼
        │                                    Train / prompt machine
        │                                    method to match style
        │                                              │
        └── [900 entries] ──→ Machine method ──→ Auto translations
                                                       │
                                                       ▼
                                              [Optional: human review
                                               of flagged entries]

Select a representative sample: cover different sentence types, lengths, and topics
Human-translate the sample: establish the gold standard for style, register, and terminology
Configure your machine method: use the human translations as coaching data, few-shot examples, or fine-tuning data, keeping a portion of them aside as a held-out set
Score the machine on the held-out part of the human sample: does the machine match the human's style? (Scoring on entries the machine already saw as examples inflates the result.)
Auto-translate the rest, but only if machine quality on the sample is acceptable
Optional human review: flag low-confidence outputs for speaker review
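The data flow of the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not this project's implementation: `Entry`, `split_gold`, and `build_few_shot_prompt` are hypothetical names, and the actual model call is left out. It shows one way to hold back part of the human sample so the scoring step only sees entries the machine was never shown as examples.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical record type for one corpus entry.
    id: str
    source: str
    target: Optional[str] = None  # gold (human) or machine translation


def split_gold(gold, holdout_fraction=0.2, seed=0):
    """Partition human-translated entries into (examples, holdout).

    Examples may be shown to the machine (coaching data, few-shot examples,
    fine-tuning data); holdout entries are used only for scoring.
    """
    rng = random.Random(seed)
    shuffled = list(gold)
    rng.shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def build_few_shot_prompt(examples, source_text, max_examples=20):
    """Format gold translations as in-context examples for an LLM prompt."""
    lines = ["Translate in the same style, register, and terminology as these examples.", ""]
    for ex in examples[:max_examples]:
        lines.append(f"Source: {ex.source}")
        lines.append(f"Translation: {ex.target}")
        lines.append("")
    # The entry to translate goes last, with an empty slot for the model to fill.
    lines.append(f"Source: {source_text}")
    lines.append("Translation:")
    return "\n".join(lines)
```

Holding out entries this way keeps the style-match score honest: a model that has copied its own examples will look perfect on them.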

Quality Assurance: The Style Match Test

```bash
# Translate the human-translated sample with your machine method
python eval/baseline_experiment.py \
  --dataset data/human-sample.json \
  --condition coached-v3

# Compare: does the machine match the human translator's choices?
# Look at: chrF++ (similarity), FST acceptance (validity),
# and qualitative patterns (register, formality, terminology)
```
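For a quick look at the similarity side of this test, here is a minimal sketch of a simplified, sentence-level chrF in plain Python. It is an approximation, not chrF++: it uses character n-grams only, while chrF++ also adds word n-grams. For numbers you report, use an established implementation such as sacrebleu's `CHRF(word_order=2)`. The function names and the 50-point threshold are hypothetical placeholders, not recommendations.

```python
from collections import Counter


def char_ngrams(text, n):
    """Character n-grams with whitespace removed, as chrF does."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF on a 0-100 scale (character n-grams only).

    Averages n-gram precision and recall over orders 1..max_n, skipping
    orders where either string is too short, then combines them with an
    F-beta score (beta=2 weights recall more heavily than precision).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


def style_match_report(pairs, threshold=50.0):
    """pairs: (machine_output, human_gold) tuples from the held-out sample.

    Returns the mean score and whether it clears the (placeholder) threshold.
    """
    scores = [chrf(machine, human) for machine, human in pairs]
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold
```

A single averaged score hides register and terminology drift, so read the lowest-scoring pairs by hand as well.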

Selecting the Sample

Cover the distribution. Your 100 entries should include:

Short phrases (1–3 words) and full sentences
Common vocabulary and domain-specific terms
Simple structures and complex ones
Multiple grammatical features (questions, imperatives, conditionals)

Don't cherry-pick easy entries. The sample must include entries your method is likely to struggle with, because that is where human quality matters most.
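One way to cover the distribution without hand-picking is stratified random sampling: group entries by some property, give each group a share of the sample proportional to its size, and choose randomly within each group. The sketch below is hypothetical; `length_bucket` stratifies only by word count, so swap in a `key` function that reflects your own strata (domain, grammatical feature, and so on).

```python
import random
from collections import defaultdict


def length_bucket(text):
    """Hypothetical stratum: bucket an entry by its word count."""
    n = len(text.split())
    if n <= 3:
        return "phrase"
    if n <= 12:
        return "sentence"
    return "long"


def stratified_sample(entries, k, key=length_bucket, seed=0):
    """Pick roughly k entries for human translation.

    Each stratum gets a quota proportional to its size (at least one entry
    per non-empty stratum), and entries are chosen at random within each
    stratum, so the total can differ slightly from k because of rounding.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for entry in entries:
        strata[key(entry)].append(entry)
    total = len(entries)
    sample = []
    for name in sorted(strata):
        items = strata[name]
        quota = max(1, round(k * len(items) / total))
        sample.extend(rng.sample(items, min(quota, len(items))))
    return sample
```

The "at least one per stratum" rule makes sure rare but hard categories are represented even when proportional allocation would round them away.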

The Community Review Workflow

For Indigenous language communities, this approach respects speaker time:

Speaker translates 50–100 entries (2–4 hours of focused work)
Machine translates the remaining 900–950 entries, using the speaker's work as coaching data
Speaker reviews flagged entries: only the ones the machine was least confident about (another 1–2 hours)
Result: 1,000 translations approaching human quality, with roughly 3–6 hours of speaker time instead of the 20–80 hours it would take to translate all 1,000 entries at the same pace
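The flagging step can be sketched as "sort by confidence, take as many as the reviewer's time allows". This assumes each machine output carries a confidence score; how you get one is up to your method (for example, a model's mean token log-probability, or the share of words an FST analyzer accepts). `flag_for_review` and its default of 2 minutes per entry are hypothetical.

```python
def flag_for_review(outputs, minutes_available, minutes_per_entry=2.0):
    """Return ids of the least-confident outputs that fit the review budget.

    outputs: list of (entry_id, confidence) pairs; higher means more confident.
    minutes_per_entry: assumed review time per entry, not a measured figure.
    """
    budget = int(minutes_available // minutes_per_entry)
    ranked = sorted(outputs, key=lambda pair: pair[1])  # least confident first
    return [entry_id for entry_id, _ in ranked[:budget]]
```

With the 1–2 hour review window above and the assumed 2 minutes per entry, calls with `minutes_available` of 60 to 120 would surface 30–60 entries for the speaker.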

Pros and Cons

| Pros | Cons |
|------|------|
| ✅ Combines human quality with machine scale | ❌ Requires initial human investment |
| ✅ Respects limited speaker availability | ❌ Machine may not capture all stylistic nuances |
| ✅ Natural quality assurance workflow | ❌ Sample selection affects overall quality |
| ✅ Great for community/educational contexts | ❌ Human review bottleneck for flagged entries |

Combines Well With

Coached LLM Prompting: human translations inform the coaching data
Few-Shot Prompting: human translations serve as in-context examples
Corpus Creation: the human-translated sample doubles as new parallel corpus data
