How Many Tokens?

Cheapest AI model for a coding assistant

Coding assistants (Cursor-style autocomplete, function generation) typically read a small context and generate a larger completion. We benchmarked at 500 input tokens + 1,500 output tokens per call, which represents typical autocomplete and small-function generation patterns.

Llama 3.1 8B (Meta) is the cheapest model for this workload at $360 per 1M calls. GPT-5 Nano (OpenAI) is second at $625.

Ranked cheapest first

| # | Model | Provider | Input ($/1M tokens) | Output ($/1M tokens) | Cost per 1M calls |
|---|---|---|---|---|---|
| 1 | Llama 3.1 8B | Meta | $0.18 | $0.18 | $360 |
| 2 | GPT-5 Nano | OpenAI | $0.05 | $0.40 | $625 |
| 3 | GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | $650 |
| 4 | Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | $650 |
| 5 | GPT-4o mini | OpenAI | $0.15 | $0.60 | $975 |
| 6 | Llama 3.1 70B | Meta | $0.59 | $0.79 | $1,480 |
| 7 | Qwen 2.5 Coder 32B | Alibaba | $0.80 | $0.80 | $1,600 |
| 8 | Llama 3.3 70B | Meta | $0.88 | $0.88 | $1,760 |
| 9 | DeepSeek V3 | DeepSeek | $0.27 | $1.10 | $1,785 |
| 10 | Qwen 2.5 72B | Alibaba | $0.90 | $0.90 | $1,800 |
| 11 | GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | $1,975 |
| 12 | Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | $2,375 |
| 13 | GPT-4.1 Mini | OpenAI | $0.40 | $1.60 | $2,600 |
| 14 | DeepSeek V3.1 | DeepSeek | $0.60 | $1.70 | $2,850 |
| 15 | GPT-5 Mini | OpenAI | $0.25 | $2.00 | $3,125 |
| 16 | Gemini 2.5 Flash | Google | $0.30 | $2.50 | $3,900 |
| 17 | Qwen3 Coder 480B | Alibaba | $2.00 | $2.00 | $4,000 |
| 18 | Gemini 3 Flash | Google | $0.50 | $3.00 | $4,750 |
| 19 | Llama 3.1 405B | Meta | $3.50 | $3.50 | $7,000 |
| 20 | GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | $7,125 |
| 21 | o3-mini | OpenAI | $1.10 | $4.40 | $7,150 |
| 22 | o4-mini | OpenAI | $1.10 | $4.40 | $7,150 |
| 23 | GLM-5.1 | Zhipu | $1.40 | $4.40 | $7,300 |
| 24 | Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $8,000 |
| 25 | Mistral Large | Mistral | $2.00 | $6.00 | $10,000 |
| 26 | DeepSeek R1 | DeepSeek | $3.00 | $7.00 | $12,000 |
| 27 | GPT-4.1 | OpenAI | $2.00 | $8.00 | $13,000 |
| 28 | o3 | OpenAI | $2.00 | $8.00 | $13,000 |
| 29 | GPT-5.1 | OpenAI | $1.25 | $10.00 | $15,625 |
| 30 | GPT-5 | OpenAI | $1.25 | $10.00 | $15,625 |
| 31 | Gemini 2.5 Pro | Google | $1.25 | $10.00 | $15,625 |
| 32 | GPT-4o | OpenAI | $2.50 | $10.00 | $16,250 |
| 33 | Gemini 3.1 Pro | Google | $2.00 | $12.00 | $19,000 |
| 34 | GPT-5.3 | OpenAI | $1.75 | $14.00 | $21,875 |
| 35 | GPT-5.2 | OpenAI | $1.75 | $14.00 | $21,875 |
| 36 | GPT-5.4 | OpenAI | $2.50 | $15.00 | $23,750 |
| 37 | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $24,000 |
| 38 | Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | $40,000 |
| 39 | GPT-5.5 | OpenAI | $5.00 | $30.00 | $47,500 |
| 40 | GPT-4 Turbo | OpenAI | $10.00 | $30.00 | $50,000 |
| 41 | Claude Opus 4.8 (Fast Mode) | Anthropic | $10.00 | $50.00 | $80,000 |
| 42 | o3-pro | OpenAI | $20.00 | $80.00 | $130,000 |
| 43 | GPT-5 Pro | OpenAI | $15.00 | $120.00 | $187,500 |
| 44 | GPT-5.2 Pro | OpenAI | $21.00 | $168.00 | $262,500 |
| 45 | GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | $285,000 |
| 46 | GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | $285,000 |

Workload assumption: 500 input tokens + 1,500 output tokens per call, scaled to 1M calls. Pricing as of 2026-05-31.
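The "Per 1M calls" column can be reproduced from the two per-token prices. The sketch below is one way to write that arithmetic; the function name `cost_per_million_calls` is illustrative and not taken from the site's code, and the three prices are copied from the top of the table.

```python
def cost_per_million_calls(input_price, output_price,
                           input_tokens=500, output_tokens=1500):
    """Return USD per 1M calls, given prices in USD per 1M tokens.

    One call costs (tokens * price) / 1e6; multiplying by 1M calls
    cancels the 1e6, so the result is simply tokens * price.
    """
    return input_tokens * input_price + output_tokens * output_price


# (input $/1M tokens, output $/1M tokens), copied from the ranking table
prices = {
    "Llama 3.1 8B": (0.18, 0.18),
    "GPT-5 Nano": (0.05, 0.40),
    "GPT-4.1 Nano": (0.10, 0.40),
}

for name, (inp, out) in prices.items():
    print(f"{name}: ${cost_per_million_calls(inp, out):,.0f} per 1M calls")
```

Changing `input_tokens` and `output_tokens` lets you re-price the same models for your own workload shape.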

How we computed this

The 1:3 input-to-output ratio is inverted from almost every other workload on this site, which reshuffles the ranking: models with cheap output climb, models that discount input but charge premium output rates fall. The 500-input figure assumes the assistant sends the current function plus a few signatures, not whole files. Note that modern agentic coding tools (Claude Code, Cursor agent mode) break this shape entirely: they read far more than they write, often 20:1 input-to-output, so if you are budgeting for an agent rather than an autocompleter, the code review and RAG rankings are closer to your real shape.
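To see how much the workload shape matters, the sketch below re-ranks four models from the table under two shapes: the 500 + 1,500 autocomplete shape used here, and a hypothetical 20:1 agent shape. The 20,000 input + 1,000 output figures are an assumption chosen only to illustrate a 20:1 ratio; they are not a benchmark from this site.

```python
# (input $/1M tokens, output $/1M tokens), copied from the ranking table
PRICES = {
    "Llama 3.1 8B": (0.18, 0.18),
    "GPT-5 Nano": (0.05, 0.40),
    "DeepSeek V3": (0.27, 1.10),
    "Qwen 2.5 72B": (0.90, 0.90),
}


def rank(input_tokens, output_tokens):
    """Return model names sorted cheapest first for the given shape."""
    costs = {
        name: input_tokens * inp + output_tokens * out
        for name, (inp, out) in PRICES.items()
    }
    return sorted(costs, key=costs.get)


autocomplete = rank(500, 1_500)   # output-heavy shape from this page
agent = rank(20_000, 1_000)       # assumed 20:1 input-heavy shape

print("autocomplete:", autocomplete)
print("agent:       ", agent)
```

Under the input-heavy shape, GPT-5 Nano's low input price moves it ahead of Llama 3.1 8B, which is the reshuffling described above.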

The math, worked through

One call at this workload costs $0.00036 on Llama 3.1 8B: 500 input tokens at $0.18 per million is $0.00009, plus 1,500 output tokens at $0.18 per million is $0.00027. At 10,000 calls a day that is $108 a month. The third-place model, GPT-4.1 Nano, runs 1.8x that. The most expensive models in the table, GPT-5.4 Pro and GPT-5.5 Pro, cost 792x the winner at the same workload: the spread between top and bottom of this ranking is not a rounding error, it is the difference between a tool budget and a headcount budget.
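The same figures can be checked in a few lines. This is a minimal sketch that assumes a 30-day month, which is what reproduces the $108 figure; the helper name `per_call` is illustrative.

```python
INPUT_TOKENS, OUTPUT_TOKENS = 500, 1_500


def per_call(input_price, output_price):
    """USD for one call, given prices in USD per 1M tokens."""
    return (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000


llama = per_call(0.18, 0.18)                      # Llama 3.1 8B
monthly = llama * 10_000 * 30                     # 10,000 calls/day, 30-day month assumed
ratio_third = per_call(0.10, 0.40) / llama        # GPT-4.1 Nano vs. the winner
ratio_top = per_call(30.00, 180.00) / llama       # GPT-5.4 Pro vs. the winner

print(f"per call:      ${llama:.5f}")
print(f"per month:     ${monthly:.0f}")
print(f"GPT-4.1 Nano:  {ratio_third:.1f}x")
print(f"GPT-5.4 Pro:   {ratio_top:.0f}x")
```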

About the winner

Output-heavy pricing favors budget models here, but generated code quality varies more across the price ladder than any other output type. A model that produces subtly wrong code costs you debugging time worth far more than the per-call savings. The sweet spot for most teams is the mid-tier: fast, cheap output with acceptable correctness on mainstream languages.

When not to pick the cheapest

Skip the bottom of the table for anything beyond boilerplate: cheap models autocomplete plausible-looking calls to functions that do not exist. Latency also matters more than price for autocomplete (a 3-second completion is a useless completion), and per-token price says nothing about time-to-first-token. Benchmark perceived speed, not just cost.
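One way to benchmark perceived speed is to time a streamed response and record time-to-first-token separately from total time. The sketch below is provider-agnostic: `measure_stream` accepts any iterable of chunks, and `fake_stream` is a stand-in generator invented for this example, not a real API. In practice you would pass in the streaming iterator your client library returns.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Consume a stream; return (time_to_first_chunk, total_time, chunk_count).

    time_to_first_chunk is None if the stream yields nothing.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count


def fake_stream(first_delay: float = 0.05, gap: float = 0.01, n: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: waits, then yields n chunks."""
    time.sleep(first_delay)
    for i in range(n):
        yield f"tok{i}"
        time.sleep(gap)


ttft, total, count = measure_stream(fake_stream())
print(f"time to first token: {ttft * 1000:.0f} ms")
print(f"total: {total * 1000:.0f} ms for {count} chunks")
```

Run it several times per model and compare medians rather than single measurements, since network and queueing variance can dominate a single call.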

How to use this ranking

The winner is mathematically cheapest at the listed workload shape — that's not the same as "best for the use case." Cheaper models often have lower reasoning depth, smaller context windows, or worse instruction-following. Use this as the cost baseline, then test the top 2-3 candidates on your real prompts via the live counter.

Pricing snapshots come from each provider's published rate cards and are tracked in the full pricing changelog. Tokenizer accuracy per model is documented in the methodology.
