Cheapest reasoning AI model

Updated 2026-05-31 · By Clinton Patrick · Methodology

Reasoning models (OpenAI o-series, DeepSeek R1, Claude thinking variants) spend output tokens on internal reasoning before they write a visible reply. We benchmarked a workload of 500 input tokens and 3,000 output tokens per call, a shape chosen to reflect that output volume. Note that not every model below is a true reasoning model: the ranking simply shows which models are cheapest at that output-heavy shape.

Llama 3.1 8B (Meta) is the cheapest model for this workload at $630 per 1M calls. GPT-5 Nano (OpenAI) is second at $1,225.

Ranked cheapest first

#	Model	Input $/M	Output $/M	Per 1M calls
#1	Llama 3.1 8B Meta	$0.18	$0.18	$630
#2	GPT-5 Nano OpenAI	$0.05	$0.40	$1,225
#3	GPT-4.1 Nano OpenAI	$0.10	$0.40	$1,250
#4	Gemini 2.5 Flash-Lite Google	$0.10	$0.40	$1,250
#5	GPT-4o mini OpenAI	$0.15	$0.60	$1,875
#6	Llama 3.1 70B Meta	$0.59	$0.79	$2,665
#7	Qwen 2.5 Coder 32B Alibaba	$0.80	$0.80	$2,800
#8	Llama 3.3 70B Meta	$0.88	$0.88	$3,080
#9	Qwen 2.5 72B Alibaba	$0.90	$0.90	$3,150
#10	DeepSeek V3 DeepSeek	$0.27	$1.10	$3,435
#11	GPT-5.4 Nano OpenAI	$0.20	$1.25	$3,850
#12	Gemini 3.1 Flash-Lite Google	$0.25	$1.50	$4,625
#13	GPT-4.1 Mini OpenAI	$0.40	$1.60	$5,000
#14	DeepSeek V3.1 DeepSeek	$0.60	$1.70	$5,400
#15	GPT-5 Mini OpenAI	$0.25	$2.00	$6,125
#16	Qwen3 Coder 480B Alibaba	$2.00	$2.00	$7,000
#17	Gemini 2.5 Flash Google	$0.30	$2.50	$7,650
#18	Gemini 3 Flash Google	$0.50	$3.00	$9,250
#19	Llama 3.1 405B Meta	$3.50	$3.50	$12,250
#20	o3-mini OpenAI	$1.10	$4.40	$13,750
#21	o4-mini OpenAI	$1.10	$4.40	$13,750
#22	GPT-5.4 Mini OpenAI	$0.75	$4.50	$13,875
#23	GLM-5.1 Zhipu	$1.40	$4.40	$13,900
#24	Claude Haiku 4.5 Anthropic	$1.00	$5.00	$15,500
#25	Mistral Large Mistral	$2.00	$6.00	$19,000
#26	DeepSeek R1 DeepSeek	$3.00	$7.00	$22,500
#27	GPT-4.1 OpenAI	$2.00	$8.00	$25,000
#28	o3 OpenAI	$2.00	$8.00	$25,000
#29	GPT-5.1 OpenAI	$1.25	$10.00	$30,625
#30	GPT-5 OpenAI	$1.25	$10.00	$30,625
#31	Gemini 2.5 Pro Google	$1.25	$10.00	$30,625
#32	GPT-4o OpenAI	$2.50	$10.00	$31,250
#33	Gemini 3.1 Pro Google	$2.00	$12.00	$37,000
#34	GPT-5.3 OpenAI	$1.75	$14.00	$42,875
#35	GPT-5.2 OpenAI	$1.75	$14.00	$42,875
#36	GPT-5.4 OpenAI	$2.50	$15.00	$46,250
#37	Claude Sonnet 4.6 Anthropic	$3.00	$15.00	$46,500
#38	Claude Opus 4.8 Anthropic	$5.00	$25.00	$77,500
#39	GPT-5.5 OpenAI	$5.00	$30.00	$92,500
#40	GPT-4 Turbo OpenAI	$10.00	$30.00	$95,000
#41	Claude Opus 4.8 (Fast Mode) Anthropic	$10.00	$50.00	$155,000
#42	o3-pro OpenAI	$20.00	$80.00	$250,000
#43	GPT-5 Pro OpenAI	$15.00	$120.00	$367,500
#44	GPT-5.2 Pro OpenAI	$21.00	$168.00	$514,500
#45	GPT-5.5 Pro OpenAI	$30.00	$180.00	$555,000
#46	GPT-5.4 Pro OpenAI	$30.00	$180.00	$555,000

Workload assumption: 500 input tokens + 3,000 output tokens per call, scaled to 1M calls. Pricing as of 2026-05-31.
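The "Per 1M calls" column follows directly from the two price columns and the workload shape. A minimal sketch of that calculation, assuming prices are quoted in dollars per 1M tokens; the function name and the four-model subset are illustrative choices, not part of the published methodology:

```python
def cost_per_million_calls(input_price, output_price,
                           input_tokens=500, output_tokens=3000):
    """Dollar cost of 1M calls, given prices in dollars per 1M tokens.

    1M calls x N tokens per call is N million tokens, so the two
    factors of one million cancel and the result is tokens x price.
    """
    return input_tokens * input_price + output_tokens * output_price


# (input $/M, output $/M) copied from four rows of the table above.
prices = {
    "GPT-5.4 Pro": (30.00, 180.00),
    "DeepSeek R1": (3.00, 7.00),
    "GPT-5 Nano": (0.05, 0.40),
    "Llama 3.1 8B": (0.18, 0.18),
}

# Rank cheapest first, the same ordering the table uses.
ranked = sorted(prices, key=lambda name: cost_per_million_calls(*prices[name]))

for name in ranked:
    print(f"{name}: ${cost_per_million_calls(*prices[name]):,.0f}")
```

Changing `input_tokens` and `output_tokens` to match your own traffic shape reorders the ranking, which is why a model with a low input price but a high output price drops down at this output-heavy shape.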

How we computed this

The 3,000-output figure is the sum of hidden thinking tokens plus the visible answer. This is the part teams underestimate: a reasoning model can burn 2,000-10,000 thinking tokens on a hard problem before writing a single visible word, and every one of those bills at the full output rate. The same prompt can also cost wildly different amounts run-to-run, because thinking length varies with problem difficulty. Budget for the p95 thinking length, not the average. Where providers expose effort or thinking-budget controls, capping them is the single biggest cost lever: a low-effort setting often cuts cost 60-80% with modest accuracy loss on routine problems.
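To see why the p95 matters more than the average, here is a rough sketch that compares the two on a small sample. The thinking-token counts and the 400-token visible answer are made-up illustrative values, not measurements; the prices are the DeepSeek R1 rates from the table. The percentile uses the simple nearest-rank method.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one call; prices are dollars per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical thinking-token counts from ten logged calls (illustrative only).
thinking_samples = [1800, 2100, 2400, 2600, 3000, 3300, 4200, 5500, 7800, 9600]
visible_answer_tokens = 400  # assumed length of the visible reply

mean_thinking = sum(thinking_samples) / len(thinking_samples)
p95_thinking = percentile(thinking_samples, 95)

# Thinking tokens bill at the output rate, so they are added to output.
mean_cost = call_cost(500, mean_thinking + visible_answer_tokens, 3.00, 7.00)
p95_cost = call_cost(500, p95_thinking + visible_answer_tokens, 3.00, 7.00)

print(f"mean thinking: {mean_thinking:.0f} tokens, cost per call ${mean_cost:.5f}")
print(f"p95 thinking:  {p95_thinking} tokens, cost per call ${p95_cost:.5f}")
```

In this made-up sample the p95 call costs roughly twice the mean call, so a budget built on the average would be overrun by the long tail of hard prompts.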

The math, worked through

One call at this workload costs $0.00063 on Llama 3.1 8B: 500 input tokens at $0.18 per million is $0.00009, plus 3,000 output tokens at $0.18 per million is $0.00054. At 10,000 calls a day that is $6.30 a day, or $189 over a 30-day month. The third-place model, GPT-4.1 Nano, runs about 2x that. The most expensive model in the table, GPT-5.4 Pro (tied with GPT-5.5 Pro), costs 881x the winner at the same workload. The spread between top and bottom of this ranking is not a rounding error; it is the difference between a tool budget and a headcount budget.
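The arithmetic above can be reproduced in a few lines, which also makes it easy to plug in a different daily volume. This is a sketch under two assumptions: a 30-day month, and the table prices for Llama 3.1 8B, GPT-4.1 Nano, and GPT-5.4 Pro. The helper names are illustrative.

```python
INPUT_TOKENS = 500
OUTPUT_TOKENS = 3000
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def per_call(input_price, output_price):
    """Dollar cost of a single call; prices are dollars per 1M tokens."""
    return (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000


def monthly(cost_per_call, calls_per_day):
    """Monthly spend at a steady daily call volume."""
    return cost_per_call * calls_per_day * DAYS_PER_MONTH


llama_8b = per_call(0.18, 0.18)            # rank 1 in the table
gpt_41_nano = per_call(0.10, 0.40)         # rank 3
gpt_54_pro = per_call(30.00, 180.00)       # most expensive row

print(f"Llama 3.1 8B per call:       ${llama_8b:.5f}")
print(f"Llama 3.1 8B at 10k/day:     ${monthly(llama_8b, 10_000):.2f} per month")
print(f"GPT-4.1 Nano vs winner:      {gpt_41_nano / llama_8b:.2f}x")
print(f"GPT-5.4 Pro vs winner:       {gpt_54_pro / llama_8b:.0f}x")
```

Printing the per-call figure with five decimal places avoids the trap of a sub-cent cost rounding to zero.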

About the winner

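The overall winner, Llama 3.1 8B, is not a reasoning model. Non-reasoning models appear in this table because the math is shape-based, but they will not produce reasoning-grade answers at any price. Among the true reasoning models, DeepSeek R1 has historically punched far above its price tier, making it the default budget pick, with the o-series and Claude thinking variants as the step-up options. At the rates listed here, however, o3-mini and o4-mini ($13,750 per 1M calls) come in below R1 ($22,500), so check current pricing before defaulting to it.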

When not to pick the cheapest

Do not route every request through a reasoning model. The standard pattern is a cheap classifier in front: simple queries go to a fast standard model, hard ones escalate to the reasoning tier. Teams that skip this routinely pay 10x more than necessary because 80% of production prompts never needed extended thinking in the first place.
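The effect of a classifier in front can be estimated with a blended-cost calculation. The sketch below uses the table's per-1M-call figures for GPT-4.1 Nano as the standard tier and DeepSeek R1 as the reasoning tier; the $50 classifier cost and the 20% escalation rate are assumed placeholder values, and the function name is illustrative. It also reuses the 3,000-output shape for the cheap tier, which likely overstates what simple queries cost.

```python
def blended_cost_per_million(cheap_cost, reasoning_cost,
                             classifier_cost, escalation_rate):
    """Cost of 1M calls when a classifier sends a fraction of traffic
    (escalation_rate) to the reasoning tier and the rest to the cheap tier.
    Every call pays for the classifier."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return (classifier_cost
            + (1 - escalation_rate) * cheap_cost
            + escalation_rate * reasoning_cost)


CHEAP_TIER = 1_250        # GPT-4.1 Nano, per 1M calls (from the table)
REASONING_TIER = 22_500   # DeepSeek R1, per 1M calls (from the table)
CLASSIFIER = 50           # assumed classifier cost per 1M calls (hypothetical)

routed = blended_cost_per_million(CHEAP_TIER, REASONING_TIER, CLASSIFIER, 0.20)
unrouted = REASONING_TIER

print(f"all traffic to reasoning tier: ${unrouted:,.0f} per 1M calls")
print(f"with router, 20% escalated:    ${routed:,.0f} per 1M calls")
print(f"savings factor:                {unrouted / routed:.1f}x")
```

The savings factor depends heavily on the escalation rate and on how expensive the reasoning tier is relative to the standard one, so it is worth rerunning with your own measured escalation rate.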

How to use this ranking

The winner is mathematically cheapest at the listed workload shape — that's not the same as "best for the use case." Cheaper models often have lower reasoning depth, smaller context windows, or worse instruction-following. Use this as the cost baseline, then test the top 2-3 candidates on your real prompts via the live counter.

Pricing snapshots come from each provider's published rate cards and are tracked in the full pricing changelog. Tokenizer accuracy per model is documented in the methodology.
