Cheapest AI model for a coding assistant
Coding assistants (Cursor-style autocomplete, function generation) typically read a small context window and generate a larger completion. We priced every model at 500 input tokens and 1,500 output tokens per call, a shape typical of autocomplete plus small-function generation.
Ranked cheapest first
| # | Model | Provider | Input ($/M tokens) | Output ($/M tokens) | Per 1M calls |
|---|---|---|---|---|---|
| 1 | Llama 3.1 8B | Meta | $0.18 | $0.18 | $360 |
| 2 | GPT-5 Nano | OpenAI | $0.05 | $0.40 | $625 |
| 3 | GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | $650 |
| 4 | Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | $650 |
| 5 | GPT-4o mini | OpenAI | $0.15 | $0.60 | $975 |
| 6 | Llama 3.1 70B | Meta | $0.59 | $0.79 | $1,480 |
| 7 | Qwen 2.5 Coder 32B | Alibaba | $0.80 | $0.80 | $1,600 |
| 8 | Llama 3.3 70B | Meta | $0.88 | $0.88 | $1,760 |
| 9 | DeepSeek V3 | DeepSeek | $0.27 | $1.10 | $1,785 |
| 10 | Qwen 2.5 72B | Alibaba | $0.90 | $0.90 | $1,800 |
| 11 | GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | $1,975 |
| 12 | Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | $2,375 |
| 13 | GPT-4.1 Mini | OpenAI | $0.40 | $1.60 | $2,600 |
| 14 | DeepSeek V3.1 | DeepSeek | $0.60 | $1.70 | $2,850 |
| 15 | GPT-5 Mini | OpenAI | $0.25 | $2.00 | $3,125 |
| 16 | Gemini 2.5 Flash | Google | $0.30 | $2.50 | $3,900 |
| 17 | Qwen3 Coder 480B | Alibaba | $2.00 | $2.00 | $4,000 |
| 18 | Gemini 3 Flash | Google | $0.50 | $3.00 | $4,750 |
| 19 | Llama 3.1 405B | Meta | $3.50 | $3.50 | $7,000 |
| 20 | GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | $7,125 |
| 21 | o3-mini | OpenAI | $1.10 | $4.40 | $7,150 |
| 22 | o4-mini | OpenAI | $1.10 | $4.40 | $7,150 |
| 23 | GLM-5.1 | Zhipu | $1.40 | $4.40 | $7,300 |
| 24 | Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $8,000 |
| 25 | Mistral Large | Mistral | $2.00 | $6.00 | $10,000 |
| 26 | DeepSeek R1 | DeepSeek | $3.00 | $7.00 | $12,000 |
| 27 | GPT-4.1 | OpenAI | $2.00 | $8.00 | $13,000 |
| 28 | o3 | OpenAI | $2.00 | $8.00 | $13,000 |
| 29 | GPT-5.1 | OpenAI | $1.25 | $10.00 | $15,625 |
| 30 | GPT-5 | OpenAI | $1.25 | $10.00 | $15,625 |
| 31 | Gemini 2.5 Pro | Google | $1.25 | $10.00 | $15,625 |
| 32 | GPT-4o | OpenAI | $2.50 | $10.00 | $16,250 |
| 33 | Gemini 3.1 Pro | Google | $2.00 | $12.00 | $19,000 |
| 34 | GPT-5.3 | OpenAI | $1.75 | $14.00 | $21,875 |
| 35 | GPT-5.2 | OpenAI | $1.75 | $14.00 | $21,875 |
| 36 | GPT-5.4 | OpenAI | $2.50 | $15.00 | $23,750 |
| 37 | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $24,000 |
| 38 | Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | $40,000 |
| 39 | GPT-5.5 | OpenAI | $5.00 | $30.00 | $47,500 |
| 40 | GPT-4 Turbo | OpenAI | $10.00 | $30.00 | $50,000 |
| 41 | Claude Opus 4.8 (Fast Mode) | Anthropic | $10.00 | $50.00 | $80,000 |
| 42 | o3-pro | OpenAI | $20.00 | $80.00 | $130,000 |
| 43 | GPT-5 Pro | OpenAI | $15.00 | $120.00 | $187,500 |
| 44 | GPT-5.2 Pro | OpenAI | $21.00 | $168.00 | $262,500 |
| 45 | GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | $285,000 |
| 46 | GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | $285,000 |
Workload assumption: 500 input tokens + 1,500 output tokens per call, scaled to 1M calls. Pricing as of 2026-05-31.
How we computed this
The 1:3 input-to-output ratio is inverted from almost every other workload on this site, which reshuffles the ranking: models with cheap output climb, models that discount input but charge premium output rates fall. The 500-input figure assumes the assistant sends the current function plus a few signatures, not whole files. Note that modern agentic coding tools (Claude Code, Cursor agent mode) break this shape entirely: they read far more than they write, often 20:1 input-to-output, so if you are budgeting for an agent rather than an autocompleter, the code review and RAG rankings are closer to your real shape.
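To see how the workload shape reshuffles the ranking, the sketch below prices four models from the table at the 500 + 1,500 autocomplete shape and at a 20:1 agent shape. The 10,000-input / 500-output agent figures are an illustrative assumption, not a measured workload, and the function names are hypothetical. The rates are copied from the table above.

```python
# Rates in $ per million tokens (input, output), copied from the table.
RATES = {
    "Llama 3.1 8B": (0.18, 0.18),
    "GPT-5 Nano": (0.05, 0.40),
    "Llama 3.1 70B": (0.59, 0.79),
    "DeepSeek V3": (0.27, 1.10),
}


def cost_per_million_calls(input_rate, output_rate, input_tokens, output_tokens):
    """Dollar cost of 1M calls.

    Rates are per million tokens, so the two factors of one million cancel
    and the result is simply tokens-per-call times rate.
    """
    return input_tokens * input_rate + output_tokens * output_rate


def rank(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted cheapest first for one workload shape."""
    costs = {
        model: round(cost_per_million_calls(i, o, input_tokens, output_tokens), 2)
        for model, (i, o) in RATES.items()
    }
    return sorted(costs.items(), key=lambda pair: pair[1])


autocomplete = rank(500, 1_500)  # the 1:3 shape used in this ranking
agent = rank(10_000, 500)        # assumed 20:1 shape, for illustration only

for label, ranking in (("autocomplete", autocomplete), ("agent", agent)):
    print(label)
    for model, cost in ranking:
        print(f"  {model:<14} ${cost:,.0f}")
```

Under these assumed token counts, GPT-5 Nano overtakes Llama 3.1 8B and DeepSeek V3 overtakes Llama 3.1 70B once input dominates, which is the reshuffling described above running in reverse.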
The math, worked through
One call at this workload costs $0.00036 on Llama 3.1 8B: 500 input tokens at $0.18 per million is $0.00009, plus 1,500 output tokens at $0.18 per million is $0.00027. At 10,000 calls a day that is $3.60 a day, or $108 over a 30-day month. The third-place model, GPT-4.1 Nano, runs 1.8x that. The most expensive model in the table, GPT-5.4 Pro (tied with GPT-5.5 Pro), costs 792x the winner at the same workload. The spread between top and bottom of this ranking is not a rounding error; it is the difference between a tool budget and a headcount budget.
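The arithmetic in this section can be reproduced with a short script. This is a minimal sketch: the 30-day month is an assumption inferred from the $108 figure, and the function names are illustrative. `Decimal` is used so that sub-cent per-call amounts are not lost to rounding.

```python
from decimal import Decimal

INPUT_TOKENS = 500
OUTPUT_TOKENS = 1_500
MILLION = Decimal(1_000_000)


def cost_per_call(input_rate, output_rate):
    """Dollar cost of one call; rates are $ per million tokens, given as strings."""
    input_cost = INPUT_TOKENS * Decimal(input_rate) / MILLION
    output_cost = OUTPUT_TOKENS * Decimal(output_rate) / MILLION
    return input_cost + output_cost


def monthly_cost(per_call, calls_per_day=10_000, days=30):
    """Monthly spend; the 30-day month is an assumption."""
    return per_call * calls_per_day * days


llama_8b = cost_per_call("0.18", "0.18")
gpt_41_nano = cost_per_call("0.10", "0.40")
gpt_54_pro = cost_per_call("30.00", "180.00")

print(f"Llama 3.1 8B per call:  ${llama_8b:.5f}")
print(f"Llama 3.1 8B per month: ${monthly_cost(llama_8b):.2f}")
print(f"GPT-4.1 Nano vs winner: {gpt_41_nano / llama_8b:.1f}x")
print(f"GPT-5.4 Pro vs winner:  {gpt_54_pro / llama_8b:.0f}x")
```

Changing `calls_per_day` or the token constants is enough to rerun the comparison for a different volume or workload shape.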
About the winner
Output-heavy pricing favors budget models here, but generated code quality varies more across the price ladder than any other output type. A model that produces subtly wrong code costs you debugging time worth far more than the per-call savings. The sweet spot for most teams is the mid-tier: fast, cheap output with acceptable correctness on mainstream languages.
When not to pick the cheapest
Skip the bottom of the table for anything beyond boilerplate: cheap models autocomplete plausible-looking calls to functions that do not exist. Latency also matters more than price for autocomplete (a 3-second completion is a useless completion), and per-token price says nothing about time-to-first-token. Benchmark perceived speed, not just cost.
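One way to measure perceived speed is to time the first streamed chunk separately from the full completion. The sketch below is provider-agnostic: `measure_stream` accepts any iterable of text chunks, and `fake_stream` is a hypothetical stand-in that simulates delays with `time.sleep`. To benchmark a real model you would pass that provider's streaming response instead, which is an assumption about your client rather than something this page specifies.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Consume a chunk stream and time it.

    Returns (time_to_first_chunk, total_time, full_text), times in seconds.
    time_to_first_chunk is None if the stream yields nothing.
    """
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return first, total, "".join(parts)


def fake_stream(first_delay: float, chunk_delay: float, chunks: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API response."""
    time.sleep(first_delay)  # simulated wait before the first token arrives
    for i, chunk in enumerate(chunks):
        if i:
            time.sleep(chunk_delay)  # simulated gap between later chunks
        yield chunk


ttft, total, text = measure_stream(
    fake_stream(0.05, 0.01, ["def add(a, b):", "\n", "    return a + b"])
)
print(f"first chunk after {ttft:.3f}s, done after {total:.3f}s")
```

Running the same harness over identical prompts for each candidate gives a time-to-first-token figure to weigh alongside the per-call cost.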
How to use this ranking
The winner is mathematically cheapest at the listed workload shape, which is not the same as "best for the use case." Cheaper models often have shallower reasoning, smaller context windows, or worse instruction-following. Use this as the cost baseline, then test the top 2-3 candidates on your real prompts via the live counter.
Pricing snapshots come from each provider's published rate cards and are tracked in the full pricing changelog. Tokenizer accuracy per model is documented in the methodology.