Cheapest AI model for translating documents

Updated 2026-05-31 · By Clinton Patrick · Methodology

Translation has an unusual cost profile: output volume roughly equals input volume, so output pricing weighs as heavily as input pricing. We ranked every model at 1,000 input tokens plus 1,000 output tokens per call, the cleanest 1:1 benchmark.

Meta's Llama 3.1 8B is the cheapest model for this workload at $360 per 1M calls. OpenAI's GPT-5 Nano is second at $450.

Ranked cheapest first

#	Model	Provider	Input $/1M tokens	Output $/1M tokens	Cost per 1M calls
#1	Llama 3.1 8B	Meta	$0.18	$0.18	$360
#2	GPT-5 Nano	OpenAI	$0.05	$0.40	$450
#3	GPT-4.1 Nano	OpenAI	$0.10	$0.40	$500
#4	Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	$500
#5	GPT-4o mini	OpenAI	$0.15	$0.60	$750
#6	DeepSeek V3	DeepSeek	$0.27	$1.10	$1,370
#7	Llama 3.1 70B	Meta	$0.59	$0.79	$1,380
#8	GPT-5.4 Nano	OpenAI	$0.20	$1.25	$1,450
#9	Qwen 2.5 Coder 32B	Alibaba	$0.80	$0.80	$1,600
#10	Gemini 3.1 Flash-Lite	Google	$0.25	$1.50	$1,750
#11	Llama 3.3 70B	Meta	$0.88	$0.88	$1,760
#12	Qwen 2.5 72B	Alibaba	$0.90	$0.90	$1,800
#13	GPT-4.1 Mini	OpenAI	$0.40	$1.60	$2,000
#14	GPT-5 Mini	OpenAI	$0.25	$2.00	$2,250
#15	DeepSeek V3.1	DeepSeek	$0.60	$1.70	$2,300
#16	Gemini 2.5 Flash	Google	$0.30	$2.50	$2,800
#17	Gemini 3 Flash	Google	$0.50	$3.00	$3,500
#18	Qwen3 Coder 480B	Alibaba	$2.00	$2.00	$4,000
#19	GPT-5.4 Mini	OpenAI	$0.75	$4.50	$5,250
#20	o3-mini	OpenAI	$1.10	$4.40	$5,500
#21	o4-mini	OpenAI	$1.10	$4.40	$5,500
#22	GLM-5.1	Zhipu	$1.40	$4.40	$5,800
#23	Claude Haiku 4.5	Anthropic	$1.00	$5.00	$6,000
#24	Llama 3.1 405B	Meta	$3.50	$3.50	$7,000
#25	Mistral Large	Mistral	$2.00	$6.00	$8,000
#26	GPT-4.1	OpenAI	$2.00	$8.00	$10,000
#27	o3	OpenAI	$2.00	$8.00	$10,000
#28	DeepSeek R1	DeepSeek	$3.00	$7.00	$10,000
#29	GPT-5.1	OpenAI	$1.25	$10.00	$11,250
#30	GPT-5	OpenAI	$1.25	$10.00	$11,250
#31	Gemini 2.5 Pro	Google	$1.25	$10.00	$11,250
#32	GPT-4o	OpenAI	$2.50	$10.00	$12,500
#33	Gemini 3.1 Pro	Google	$2.00	$12.00	$14,000
#34	GPT-5.3	OpenAI	$1.75	$14.00	$15,750
#35	GPT-5.2	OpenAI	$1.75	$14.00	$15,750
#36	GPT-5.4	OpenAI	$2.50	$15.00	$17,500
#37	Claude Sonnet 4.6	Anthropic	$3.00	$15.00	$18,000
#38	Claude Opus 4.8	Anthropic	$5.00	$25.00	$30,000
#39	GPT-5.5	OpenAI	$5.00	$30.00	$35,000
#40	GPT-4 Turbo	OpenAI	$10.00	$30.00	$40,000
#41	Claude Opus 4.8 (Fast Mode)	Anthropic	$10.00	$50.00	$60,000
#42	o3-pro	OpenAI	$20.00	$80.00	$100,000
#43	GPT-5 Pro	OpenAI	$15.00	$120.00	$135,000
#44	GPT-5.2 Pro	OpenAI	$21.00	$168.00	$189,000
#45	GPT-5.5 Pro	OpenAI	$30.00	$180.00	$210,000
#46	GPT-5.4 Pro	OpenAI	$30.00	$180.00	$210,000

Workload assumption: 1,000 input tokens + 1,000 output tokens per call, scaled to 1M calls. Pricing as of 2026-05-31.
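The last column of the table follows from the two rate columns alone: one call costs tokens × (USD per 1M tokens) ÷ 1M, and multiplying by 1M calls cancels both factors of one million. A minimal Python sketch of that calculation and the cheapest-first sort, using a handful of rows copied from the table; the `Rate` class and the function names are illustrative, not part of any provider SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    model: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def cost_per_million_calls(rate: Rate, input_tokens: int = 1_000, output_tokens: int = 1_000) -> float:
    # One call costs tokens * (USD per 1M tokens) / 1M. Multiplying by 1M calls
    # cancels both factors of one million, leaving rate * tokens per side.
    return round(rate.input_per_m * input_tokens + rate.output_per_m * output_tokens, 2)


def rank(rates: list[Rate], input_tokens: int = 1_000, output_tokens: int = 1_000) -> list[tuple[str, float]]:
    priced = [(r.model, cost_per_million_calls(r, input_tokens, output_tokens)) for r in rates]
    # sorted() is stable, so tied models keep the order they were listed in.
    return sorted(priced, key=lambda pair: pair[1])


# A subset of rows from the table above, deliberately listed out of order.
RATES = [
    Rate("GPT-5.4 Pro", 30.00, 180.00),
    Rate("GPT-4o mini", 0.15, 0.60),
    Rate("GPT-5 Nano", 0.05, 0.40),
    Rate("Llama 3.1 8B", 0.18, 0.18),
    Rate("GPT-4.1 Nano", 0.10, 0.40),
    Rate("Gemini 2.5 Flash-Lite", 0.10, 0.40),
]

for position, (model, cost) in enumerate(rank(RATES), start=1):
    print(f"#{position}\t{model}\t${cost:,.0f}")
```

Passing different `input_tokens` and `output_tokens` to `rank` re-sorts the same rates for a different workload shape.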

How we computed this

The 1:1 ratio is an idealization; real ratios depend on the language pair. English to German runs about 1:1.1 in tokens, while English to Japanese or Chinese can run 1:1.5 or worse, because CJK text tokenizes less efficiently on most Western-trained tokenizers. The same tokenizer effect inflates the input side when translating FROM CJK languages, so the same document costs meaningfully different amounts depending on direction. Outside the flat-priced Llama and Qwen models, output is priced at roughly 2-8x input in the table above, so output tokens make up most of the bill even at 1:1. That makes the output column the number to watch, not the input column.
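To see how the language pair moves the bill, here is a small Python sketch that scales output tokens by the pair's ratio and reports what fraction of the cost is output. The ratios are the ones quoted in the paragraph above and the rates come from the table; the function names are hypothetical, and real ratios should be measured on your own documents. To model translating from a CJK source, raise `input_tokens` instead.

```python
def translation_cost(input_per_m: float, output_per_m: float, input_tokens: float,
                     output_ratio: float, calls: int = 1_000_000) -> float:
    """USD for `calls` translations whose output is `output_ratio` x the input tokens."""
    output_tokens = input_tokens * output_ratio
    per_call = (input_per_m * input_tokens + output_per_m * output_tokens) / 1_000_000
    return per_call * calls


def output_share(input_per_m: float, output_per_m: float, output_ratio: float) -> float:
    """Fraction of the bill spent on output tokens; document length cancels out."""
    output_part = output_per_m * output_ratio
    return output_part / (input_per_m + output_part)


# Ratios quoted in the text; measure your own corpus for real numbers.
RATIOS = [("1:1 benchmark", 1.0), ("EN->DE, about 1:1.1", 1.1), ("EN->JA/ZH, 1:1.5", 1.5)]

for label, ratio in RATIOS:
    nano = translation_cost(0.05, 0.40, 1_000, ratio)   # GPT-5 Nano rates from the table
    llama = translation_cost(0.18, 0.18, 1_000, ratio)  # Llama 3.1 8B rates from the table
    share = output_share(0.05, 0.40, ratio)
    print(f"{label}: GPT-5 Nano ${nano:,.0f} ({share:.0%} output), "
          f"Llama 3.1 8B ${llama:,.0f} per 1M calls")
```

For a flat-priced model like Llama 3.1 8B the output share sits at 50% at 1:1; for GPT-5 Nano, whose output costs 8x its input, it is already near 90%.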

The math, worked through

One call at this workload costs $0.00036 on Llama 3.1 8B: 1,000 input tokens at $0.18 per million is $0.00018, plus 1,000 output tokens at $0.18 per million is another $0.00018. At 10,000 calls a day that is $108 over a 30-day month. The third-place model, GPT-4.1 Nano, runs 1.4x that. The most expensive models in the table, GPT-5.5 Pro and GPT-5.4 Pro (tied at $210,000 per 1M calls), cost 583x the winner at the same workload. The spread between the top and bottom of this ranking is not a rounding error; it is the difference between a tool budget and a headcount budget.
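The same arithmetic in Python, using `decimal.Decimal` so sub-cent per-call costs are not rounded away. Rates are copied from the table; the 30-day month is an assumption implied by the $108 figure, and the helper name is illustrative.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def per_call_usd(input_per_m: str, output_per_m: str,
                 input_tokens: int = 1_000, output_tokens: int = 1_000) -> Decimal:
    """Exact USD for one call. Rates are USD per 1M tokens, passed as strings
    so Decimal does not inherit binary floating-point error."""
    return (Decimal(input_per_m) * input_tokens + Decimal(output_per_m) * output_tokens) / MILLION


winner = per_call_usd("0.18", "0.18")       # Llama 3.1 8B
third = per_call_usd("0.10", "0.40")        # GPT-4.1 Nano
priciest = per_call_usd("30.00", "180.00")  # GPT-5.4 Pro (tied with GPT-5.5 Pro)

CALLS_PER_DAY = 10_000
DAYS_PER_MONTH = 30  # assumption: a 30-day month
monthly = winner * CALLS_PER_DAY * DAYS_PER_MONTH

print(f"per call: ${winner}")
print(f"monthly at {CALLS_PER_DAY:,} calls/day: ${monthly:.2f}")
print(f"third place vs winner: {third / winner:.1f}x")
print(f"most expensive vs winner: {priciest / winner:.0f}x")
```

Running it prints $0.00036 per call, $108.00 a month, 1.4x and 583x.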

About the winner

Cheap models translate common language pairs (EN↔ES, EN↔FR, EN↔DE) shockingly well, so for high-resource pairs the table winner is usually a fine choice. The gap opens on low-resource languages, idiom, and domain terminology (legal, medical), where frontier models hold a real edge.

When not to pick the cheapest

Do not use the cheapest model for content where mistranslation carries liability (contracts, safety instructions, medical leaflets) or for marketing copy where tone is the product. For high-volume localization pipelines, also compare against dedicated MT APIs (DeepL, Google Translate). They bill per character rather than per token, so convert both prices to the same unit before deciding which is cheaper at your volume.
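Dedicated MT services quote prices per character, while the table quotes per token, so a like-for-like comparison needs a conversion. A minimal Python sketch, assuming a characters-per-token figure you supply: about 4 is a common rule of thumb for English, and CJK text fits far fewer characters into each token. The function name is hypothetical, and no MT price is hard-coded; take that from the provider's current rate card.

```python
def llm_cost_per_million_chars(input_per_m: float, output_per_m: float,
                               chars_per_token: float = 4.0,
                               output_ratio: float = 1.0) -> float:
    """USD to translate 1M source characters with an LLM billed per token.

    chars_per_token is an ASSUMPTION (roughly 4 for English on many
    tokenizers, much lower for CJK). Measure it on your own documents.
    """
    input_tokens = 1_000_000 / chars_per_token
    output_tokens = input_tokens * output_ratio
    return (input_per_m * input_tokens + output_per_m * output_tokens) / 1_000_000


# Rates from the table; compare each result with the MT provider's
# per-character price, which is deliberately not hard-coded here.
for model, inp, out in [("Llama 3.1 8B", 0.18, 0.18),
                        ("GPT-5 Nano", 0.05, 0.40),
                        ("Claude Sonnet 4.6", 3.00, 15.00)]:
    print(f"{model}: ${llm_cost_per_million_chars(inp, out):.4f} per 1M source characters")
```

Lowering `chars_per_token` (for CJK sources) or raising `output_ratio` (for CJK targets) shows how quickly the LLM side of the comparison shifts.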

How to use this ranking

The winner is mathematically cheapest at the listed workload shape, but that is not the same as "best for the use case." Cheaper models often have less reasoning depth, smaller context windows, or weaker instruction-following. Use this ranking as the cost baseline, then test the top 2-3 candidates on your real prompts via the live counter.

Pricing snapshots come from each provider's published rate cards and are tracked in the full pricing changelog. Tokenizer accuracy per model is documented in the methodology.
