Cheapest long-context AI model

Updated 2026-05-31 · By Clinton Patrick

Long context unlocks codebase-scale analysis, multi-document Q&A, and large agent state. We filtered to models with context windows of at least 100,000 tokens, then ranked them by the cost of a realistic workload: 50,000 input tokens + 1,000 output tokens per call.

GPT-5 Nano (OpenAI) is the cheapest model for this workload at $2,900 per 1M calls. GPT-4.1 Nano (OpenAI) is second at $5,400, tied on price with Gemini 2.5 Flash-Lite (Google).

Ranked cheapest first

#	Model and provider	Input $/1M tokens	Output $/1M tokens	Cost per 1M calls
#1	GPT-5 Nano OpenAI	$0.05	$0.40	$2,900
#2	GPT-4.1 Nano OpenAI	$0.10	$0.40	$5,400
#3	Gemini 2.5 Flash-Lite Google	$0.10	$0.40	$5,400
#4	GPT-4o mini OpenAI	$0.15	$0.60	$8,100
#5	Llama 3.1 8B Meta	$0.18	$0.18	$9,180
#6	GPT-5.4 Nano OpenAI	$0.20	$1.25	$11,250
#7	Gemini 3.1 Flash-Lite Google	$0.25	$1.50	$14,000
#8	GPT-5 Mini OpenAI	$0.25	$2.00	$14,500
#9	DeepSeek V3 DeepSeek	$0.27	$1.10	$14,600
#10	Gemini 2.5 Flash Google	$0.30	$2.50	$17,500
#11	GPT-4.1 Mini OpenAI	$0.40	$1.60	$21,600
#12	Gemini 3 Flash Google	$0.50	$3.00	$28,000
#13	Llama 3.1 70B Meta	$0.59	$0.79	$30,290
#14	DeepSeek V3.1 DeepSeek	$0.60	$1.70	$31,700
#15	Qwen 2.5 Coder 32B Alibaba	$0.80	$0.80	$40,800
#16	GPT-5.4 Mini OpenAI	$0.75	$4.50	$42,000
#17	Llama 3.3 70B Meta	$0.88	$0.88	$44,880
#18	Qwen 2.5 72B Alibaba	$0.90	$0.90	$45,900
#19	Claude Haiku 4.5 Anthropic	$1.00	$5.00	$55,000
#20	o3-mini OpenAI	$1.10	$4.40	$59,400
#21	o4-mini OpenAI	$1.10	$4.40	$59,400
#22	GPT-5.1 OpenAI	$1.25	$10.00	$72,500
#23	GPT-5 OpenAI	$1.25	$10.00	$72,500
#24	Gemini 2.5 Pro Google	$1.25	$10.00	$72,500
#25	GLM-5.1 Zhipu	$1.40	$4.40	$74,400
#26	GPT-5.3 OpenAI	$1.75	$14.00	$101,500
#27	GPT-5.2 OpenAI	$1.75	$14.00	$101,500
#28	Qwen3 Coder 480B Alibaba	$2.00	$2.00	$102,000
#29	Mistral Large Mistral	$2.00	$6.00	$106,000
#30	GPT-4.1 OpenAI	$2.00	$8.00	$108,000
#31	o3 OpenAI	$2.00	$8.00	$108,000
#32	Gemini 3.1 Pro Google	$2.00	$12.00	$112,000
#33	GPT-4o OpenAI	$2.50	$10.00	$135,000
#34	GPT-5.4 OpenAI	$2.50	$15.00	$140,000
#35	DeepSeek R1 DeepSeek	$3.00	$7.00	$157,000
#36	Claude Sonnet 4.6 Anthropic	$3.00	$15.00	$165,000
#37	Llama 3.1 405B Meta	$3.50	$3.50	$178,500
#38	Claude Opus 4.8 Anthropic	$5.00	$25.00	$275,000
#39	GPT-5.5 OpenAI	$5.00	$30.00	$280,000
#40	GPT-4 Turbo OpenAI	$10.00	$30.00	$530,000
#41	Claude Opus 4.8 (Fast Mode) Anthropic	$10.00	$50.00	$550,000
#42	GPT-5 Pro OpenAI	$15.00	$120.00	$870,000
#43	o3-pro OpenAI	$20.00	$80.00	$1,080,000
#44	GPT-5.2 Pro OpenAI	$21.00	$168.00	$1,218,000
#45	GPT-5.5 Pro OpenAI	$30.00	$180.00	$1,680,000
#46	GPT-5.4 Pro OpenAI	$30.00	$180.00	$1,680,000

Workload assumption: 50,000 input tokens + 1,000 output tokens per call, scaled to 1M calls. Pricing as of 2026-05-31.

How we computed this

At 50,000 input tokens per call, this is the most expensive workload shape on the site by an order of magnitude: input cost is 50x the chatbot benchmark before output even enters the math. That scale changes which optimizations matter. Prompt caching is no longer a nice-to-have; it is the difference between viable and not. A cached 50k-token corpus re-queried 100 times costs a fraction of 100 cold calls. Some providers also price long-context calls at a premium tier above a threshold (often 128k or 200k tokens), which this table does not capture. Check the provider rate card before betting on the 1M-window models at full depth.
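The caching claim can be checked with a few lines of arithmetic. The sketch below is illustrative only: `write_multiplier` and `read_multiplier` are hypothetical placeholders, not any provider's actual rates, and the code assumes the cache stays warm for all 100 calls. Substitute the multipliers and expiry rules from your provider's rate card.

```python
def cold_cost(calls, corpus_tokens, input_price_per_m):
    """Input cost in dollars when every call pays full price for the corpus."""
    return calls * corpus_tokens * input_price_per_m / 1_000_000


def cached_cost(calls, corpus_tokens, input_price_per_m,
                write_multiplier, read_multiplier):
    """Input cost in dollars when the first call writes the cache and the rest read it.

    write_multiplier and read_multiplier are assumptions: they scale the
    base input price for cache writes and cache reads respectively.
    """
    per_corpus = corpus_tokens * input_price_per_m / 1_000_000
    write = per_corpus * write_multiplier
    reads = (calls - 1) * per_corpus * read_multiplier
    return write + reads


# GPT-5 Nano input price from the table; multipliers are placeholders.
cold = cold_cost(100, 50_000, 0.05)
warm = cached_cost(100, 50_000, 0.05, write_multiplier=1.25, read_multiplier=0.10)
print(f"cold: ${cold:.4f}  cached: ${warm:.4f}  ratio: {warm / cold:.2f}")
```

Only the shared corpus is modeled here. Per-query tokens appended after the cached prefix are billed at the normal input rate and would be added on top.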

The math, worked through

One call at this workload costs $0.0029 on GPT-5 Nano: 50,000 input tokens at $0.05 per million is $0.0025, plus 1,000 output tokens at $0.40 per million is $0.0004. At 10,000 calls a day that is $29 a day, or $870 over a 30-day month. Gemini 2.5 Flash-Lite, ranked third and tied on price with GPT-4.1 Nano, runs 1.9x that. The most expensive models in the table, GPT-5.5 Pro and GPT-5.4 Pro, cost 579x the winner at the same workload. The spread between top and bottom of this ranking is not a rounding error; it is the difference between a tool budget and a headcount budget.
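The same arithmetic as a minimal sketch, using the per-million prices from the table. The function name `cost_per_call` and the constants are illustrative, not part of any provider SDK. `Decimal` is used so that sub-cent per-call costs are not lost to rounding.

```python
from decimal import Decimal

INPUT_TOKENS = 50_000   # workload assumption from the table
OUTPUT_TOKENS = 1_000
MILLION = Decimal(1_000_000)


def cost_per_call(input_price_per_m, output_price_per_m,
                  input_tokens=INPUT_TOKENS, output_tokens=OUTPUT_TOKENS):
    """Dollar cost of one call, given $/1M-token prices passed as strings."""
    input_cost = Decimal(input_price_per_m) * input_tokens / MILLION
    output_cost = Decimal(output_price_per_m) * output_tokens / MILLION
    return input_cost + output_cost


nano = cost_per_call("0.05", "0.40")     # GPT-5 Nano
pro = cost_per_call("30.00", "180.00")   # GPT-5.4 Pro

print("per call:", nano)
print("per 1M calls:", nano * 1_000_000)
print("30 days at 10,000 calls/day:", nano * 10_000 * 30)
print("most expensive vs winner:", round(pro / nano), "x")
```

Passing a different `input_tokens` or `output_tokens` re-ranks the table for your own workload shape, which matters most for models whose output price is many times their input price.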

About the winner

Advertised window size and effective recall are different properties. Most models show measurable degradation retrieving facts from the middle of very long contexts (the "lost in the middle" effect), so the cheapest 100k+ model may technically accept your input while quietly missing what is in it. For needle-in-haystack work, test recall at your real depth before choosing on price.
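One way to test recall at depth is to plant a known fact at several positions in filler text and check whether the model returns it. The sketch below is a rough harness, not a benchmark: `ask_model` is a hypothetical callable you supply (it takes a prompt string and returns the model's answer as a string), and substring matching is a crude scoring rule you may want to replace.

```python
from typing import Callable, Dict, Sequence


def build_haystack(filler: Sequence[str], needle: str, depth: float) -> str:
    """Insert `needle` at a fractional depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(len(filler) * depth)
    parts = list(filler[:position]) + [needle] + list(filler[position:])
    return "\n".join(parts)


def recall_by_depth(ask_model: Callable[[str], str],
                    filler: Sequence[str], needle: str,
                    question: str, expected: str,
                    depths: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
                    ) -> Dict[float, bool]:
    """Return {depth: True/False} for whether the answer contains `expected`."""
    results = {}
    for depth in depths:
        prompt = build_haystack(filler, needle, depth) + "\n\n" + question
        answer = ask_model(prompt)  # hypothetical: wraps your provider's API
        results[depth] = expected.lower() in answer.lower()
    return results
```

Use filler drawn from your real documents and sized to your real input length, since recall measured at 10k tokens says little about behavior at 50k or beyond.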

When not to pick the cheapest

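If you are reaching for long context to avoid building retrieval, run the math first: 1,000 RAG calls at 4k input consume 4M input tokens, while 100 long-context calls at 50k input consume 5M, so the RAG side has the lower input cost on every model in this table despite handling ten times as many queries. Output tokens are billed separately and scale with the number of calls, so count them for your own workload. Long context wins when the task genuinely needs global reasoning over the whole input (cross-document contradiction finding, whole-codebase refactor planning), not when it is standing in for a search index.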
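A quick sketch of that comparison, using GPT-5 Nano's prices from the table. The function `workload_cost` is illustrative. The input-only figures follow directly from the token counts above; the output tokens per call in the second comparison are an assumption you should replace with your own numbers, since 1,000 calls also produce ten times as many responses as 100 calls.

```python
def workload_cost(calls, input_tokens, output_tokens,
                  input_price_per_m, output_price_per_m):
    """Total dollar cost of `calls` identical calls at $/1M-token prices."""
    input_cost = calls * input_tokens * input_price_per_m / 1_000_000
    output_cost = calls * output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


IN_PRICE, OUT_PRICE = 0.05, 0.40  # GPT-5 Nano, $/1M tokens

# Input side only: 4M tokens for RAG versus 5M for long context.
rag_input = workload_cost(1_000, 4_000, 0, IN_PRICE, OUT_PRICE)
long_input = workload_cost(100, 50_000, 0, IN_PRICE, OUT_PRICE)
print(f"input only   RAG: ${rag_input:.2f}  long context: ${long_input:.2f}")

# With an assumed 1,000 output tokens on every call, output can dominate.
rag_total = workload_cost(1_000, 4_000, 1_000, IN_PRICE, OUT_PRICE)
long_total = workload_cost(100, 50_000, 1_000, IN_PRICE, OUT_PRICE)
print(f"with output  RAG: ${rag_total:.2f}  long context: ${long_total:.2f}")
```

The more useful figure is usually cost per question answered: divide each total by the number of calls to see what a single query costs under each design.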

How to use this ranking

The winner is mathematically cheapest at the listed workload shape, which is not the same as "best for the use case." Cheaper models often have lower reasoning depth, smaller context windows, or worse instruction-following. Use this as the cost baseline, then test the top two or three candidates on your real prompts via the live counter.

Pricing snapshots come from each provider's published rate cards and are tracked in the full pricing changelog. Tokenizer accuracy per model is documented in the methodology.
