Cheapest AI model for batch summarization
Summarizing documents at scale is the workload most punished by input pricing. The shape is long input (often 4-16k tokens) and short output (200-800 tokens). We picked 8,000 input + 500 output tokens as a representative midpoint and ranked every model on that shape.
Ranked cheapest first
| # | Model (provider) | Input ($/1M tokens) | Output ($/1M tokens) | Cost per 1M calls |
|---|---|---:|---:|---:|
| 1 | GPT-5 Nano (OpenAI) | $0.05 | $0.40 | $600 |
| 2 | GPT-4.1 Nano (OpenAI) | $0.10 | $0.40 | $1,000 |
| 3 | Gemini 2.5 Flash-Lite (Google) | $0.10 | $0.40 | $1,000 |
| 4 | GPT-4o mini (OpenAI) | $0.15 | $0.60 | $1,500 |
| 5 | Llama 3.1 8B (Meta) | $0.18 | $0.18 | $1,530 |
| 6 | GPT-5.4 Nano (OpenAI) | $0.20 | $1.25 | $2,225 |
| 7 | DeepSeek V3 (DeepSeek) | $0.27 | $1.10 | $2,710 |
| 8 | Gemini 3.1 Flash-Lite (Google) | $0.25 | $1.50 | $2,750 |
| 9 | GPT-5 Mini (OpenAI) | $0.25 | $2.00 | $3,000 |
| 10 | Gemini 2.5 Flash (Google) | $0.30 | $2.50 | $3,650 |
| 11 | GPT-4.1 Mini (OpenAI) | $0.40 | $1.60 | $4,000 |
| 12 | Llama 3.1 70B (Meta) | $0.59 | $0.79 | $5,115 |
| 13 | Gemini 3 Flash (Google) | $0.50 | $3.00 | $5,500 |
| 14 | DeepSeek V3.1 (DeepSeek) | $0.60 | $1.70 | $5,650 |
| 15 | Qwen 2.5 Coder 32B (Alibaba) | $0.80 | $0.80 | $6,800 |
| 16 | Llama 3.3 70B (Meta) | $0.88 | $0.88 | $7,480 |
| 17 | Qwen 2.5 72B (Alibaba) | $0.90 | $0.90 | $7,650 |
| 18 | GPT-5.4 Mini (OpenAI) | $0.75 | $4.50 | $8,250 |
| 19 | Claude Haiku 4.5 (Anthropic) | $1.00 | $5.00 | $10,500 |
| 20 | o3-mini (OpenAI) | $1.10 | $4.40 | $11,000 |
| 21 | o4-mini (OpenAI) | $1.10 | $4.40 | $11,000 |
| 22 | GLM-5.1 (Zhipu) | $1.40 | $4.40 | $13,400 |
| 23 | GPT-5.1 (OpenAI) | $1.25 | $10.00 | $15,000 |
| 24 | GPT-5 (OpenAI) | $1.25 | $10.00 | $15,000 |
| 25 | Gemini 2.5 Pro (Google) | $1.25 | $10.00 | $15,000 |
| 26 | Qwen3 Coder 480B (Alibaba) | $2.00 | $2.00 | $17,000 |
| 27 | Mistral Large (Mistral) | $2.00 | $6.00 | $19,000 |
| 28 | GPT-4.1 (OpenAI) | $2.00 | $8.00 | $20,000 |
| 29 | o3 (OpenAI) | $2.00 | $8.00 | $20,000 |
| 30 | GPT-5.3 (OpenAI) | $1.75 | $14.00 | $21,000 |
| 31 | GPT-5.2 (OpenAI) | $1.75 | $14.00 | $21,000 |
| 32 | Gemini 3.1 Pro (Google) | $2.00 | $12.00 | $22,000 |
| 33 | GPT-4o (OpenAI) | $2.50 | $10.00 | $25,000 |
| 34 | GPT-5.4 (OpenAI) | $2.50 | $15.00 | $27,500 |
| 35 | DeepSeek R1 (DeepSeek) | $3.00 | $7.00 | $27,500 |
| 36 | Llama 3.1 405B (Meta) | $3.50 | $3.50 | $29,750 |
| 37 | Claude Sonnet 4.6 (Anthropic) | $3.00 | $15.00 | $31,500 |
| 38 | Claude Opus 4.8 (Anthropic) | $5.00 | $25.00 | $52,500 |
| 39 | GPT-5.5 (OpenAI) | $5.00 | $30.00 | $55,000 |
| 40 | GPT-4 Turbo (OpenAI) | $10.00 | $30.00 | $95,000 |
| 41 | Claude Opus 4.8, Fast Mode (Anthropic) | $10.00 | $50.00 | $105,000 |
| 42 | GPT-5 Pro (OpenAI) | $15.00 | $120.00 | $180,000 |
| 43 | o3-pro (OpenAI) | $20.00 | $80.00 | $200,000 |
| 44 | GPT-5.2 Pro (OpenAI) | $21.00 | $168.00 | $252,000 |
| 45 | GPT-5.5 Pro (OpenAI) | $30.00 | $180.00 | $330,000 |
| 46 | GPT-5.4 Pro (OpenAI) | $30.00 | $180.00 | $330,000 |
Workload assumption: 8,000 input tokens + 500 output tokens per call, scaled to 1M calls. Pricing as of 2026-05-31.
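Every row in the table can be reproduced from one formula. Here $p_{\text{in}}$ and $p_{\text{out}}$ are symbols introduced only for this sketch, standing for a model's input and output prices in USD per 1M tokens; two rows are worked through as a check against the table.

```latex
\begin{aligned}
C_{\text{call}} &= \frac{8000\,p_{\text{in}} + 500\,p_{\text{out}}}{10^{6}} \\
C_{10^{6}\,\text{calls}} &= 10^{6} \cdot C_{\text{call}} = 8000\,p_{\text{in}} + 500\,p_{\text{out}} \\
\text{GPT-5 Nano:}\quad C_{10^{6}\,\text{calls}} &= 8000(0.05) + 500(0.40) = 400 + 200 = \$600 \\
\text{Llama 3.1 8B:}\quad C_{10^{6}\,\text{calls}} &= 8000(0.18) + 500(0.18) = 1440 + 90 = \$1{,}530
\end{aligned}
```

Because the input term is weighted 8,000 against 500 for output, a model with cheap input and pricier output (GPT-5 Nano) still beats a flat-priced one (Llama 3.1 8B) at this shape.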
How we computed this
At 8,000 input + 500 output tokens, the input side is 16x the output side by volume, so input price dominates the bill: it is two-thirds of the winner's cost ($400 of $600 per 1M calls) and over 90% for flat-priced models such as Llama 3.1 8B ($1,440 of $1,530). An 8k-token document is roughly 10-12 pages of dense prose; if your documents run longer, chunk-then-merge summarization (summarize sections, then summarize the summaries) usually beats one giant call on both cost and quality. The other big lever is batch APIs: OpenAI, Anthropic, and Google all knock 50% off for asynchronous jobs with a 24-hour turnaround, and summarization pipelines are precisely the workload that tolerates that. The ranking above shows real-time pricing; halve the winner's number if you can batch (GPT-5 Nano drops from $600 to $300 per 1M calls).
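A minimal sketch of chunk-then-merge, assuming a `summarize` callable that wraps whichever model you picked. `summarize`, `split_into_chunks`, and `chunk_then_merge` are hypothetical names, not any provider's API. For simplicity it splits on words rather than tokens; at the common rule of thumb of roughly 0.75 words per token, 6,000 words lands near the 8k-token shape used in this ranking.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def chunk_then_merge(text: str, summarize: Callable[[str], str], max_words: int = 6_000) -> str:
    """Summarize sections, then summarize the summaries until one call suffices."""
    chunks = split_into_chunks(text, max_words)
    if len(chunks) <= 1:
        return summarize(text)  # short enough for a single call
    partials = [summarize(chunk) for chunk in chunks]  # independent calls: batchable
    merged = "\n\n".join(partials)
    if len(merged.split()) >= len(text.split()):
        # Summaries did not shrink the text; stop recursing and make one final call.
        return summarize(merged)
    return chunk_then_merge(merged, summarize, max_words)


if __name__ == "__main__":
    # Hypothetical stub standing in for a real model call: keeps the first 50 words.
    def first_50_words(section: str) -> str:
        return " ".join(section.split()[:50])

    document = " ".join(f"w{i}" for i in range(20_000))
    summary = chunk_then_merge(document, first_50_words)
    print(len(summary.split()))  # 50
```

The per-section calls do not depend on each other, so they are natural candidates for the batch APIs mentioned above; only the final merge has to wait for them. The shrink check is a guard so a model that returns long output cannot make the recursion loop forever.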
The math, worked through
On GPT-5 Nano, one call at this workload costs $0.0006: 8,000 input tokens at $0.05 per million is $0.0004, plus 500 output tokens at $0.40 per million is $0.0002. At 10,000 calls a day, that is $180 a month. The third-place model, Gemini 2.5 Flash-Lite, runs 1.7x that. The most expensive models in the table, GPT-5.5 Pro and GPT-5.4 Pro (tied at $330,000 per 1M calls), cost 550x the winner at the same workload. The spread between the top and bottom of this ranking is not a rounding error; it is the difference between a tool budget and a headcount budget.
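The same arithmetic as a small script, so you can swap in your own call volume. It assumes a 30-day month and models the batch option as a flat 50% discount, as described earlier; the helper names are hypothetical.

```python
# Worked math for the 8,000-input / 500-output shape.
# Assumptions: a 30-day month and a flat 50% batch discount.

INPUT_TOKENS = 8_000
OUTPUT_TOKENS = 500


def cost_per_call(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """USD for one call, given prices in USD per 1M tokens."""
    return (INPUT_TOKENS * input_usd_per_m + OUTPUT_TOKENS * output_usd_per_m) / 1_000_000


def monthly_cost(per_call: float, calls_per_day: int = 10_000,
                 days: int = 30, batch_discount: float = 0.0) -> float:
    """USD per month; batch_discount=0.5 models a 50%-off batch API."""
    return per_call * calls_per_day * days * (1 - batch_discount)


nano = cost_per_call(0.05, 0.40)        # GPT-5 Nano
flash_lite = cost_per_call(0.10, 0.40)  # Gemini 2.5 Flash-Lite
pro = cost_per_call(30.00, 180.00)      # GPT-5.4 Pro / GPT-5.5 Pro

if __name__ == "__main__":
    print(f"per call:          ${nano:.4f}")                                     # $0.0006
    print(f"per month:         ${monthly_cost(nano):,.0f}")                      # $180
    print(f"per month (batch): ${monthly_cost(nano, batch_discount=0.5):,.0f}")  # $90
    print(f"Flash-Lite / Nano: {flash_lite / nano:.1f}x")                        # 1.7x
    print(f"Pro / Nano:        {pro / nano:.0f}x")                               # 550x
```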
About the winner
Budget models summarize fluently but compress lossily: they drop qualifiers, round numbers, and merge distinct claims. For news clipping or internal digests that is fine. For legal, medical, or financial document summarization, the cheapest model in this table is usually the wrong choice regardless of price.
When not to pick the cheapest
Watch two failure modes before scaling the winner: number scrambling (verify that totals and dates survive the summary on a 50-document sample) and instruction drift on very long inputs (some models start ignoring format instructions past 6k input tokens). If either shows up, the next model up the price ladder is not far away: the step from GPT-5 Nano to GPT-4.1 Nano or Gemini 2.5 Flash-Lite is $600 to $1,000 per 1M calls (about 1.7x), and each further step within the top ten is 50% or less.
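A heuristic sketch of the number-scrambling check, assuming exact string matching after stripping thousands separators; the function names are hypothetical. It flags any number or ISO date in a summary that never appears in its source, which catches rounding ("$1.2M" for "$1,234,567") and merged figures, but not numbers spelled out in words or unit conversions.

```python
import re
from typing import Iterable, List, Set, Tuple

# ISO dates first so "2026-05-31" stays one token; then plain numbers such as
# "1,234,567", "12.5" or "3". A heuristic, not a full number parser.
NUMBER_RE = re.compile(r"\d{4}-\d{2}-\d{2}|\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> Set[str]:
    """Numeric tokens in text, with thousands separators stripped."""
    return {m.group().replace(",", "") for m in NUMBER_RE.finditer(text)}


def scrambled_numbers(source: str, summary: str) -> List[str]:
    """Numbers or dates in the summary that never appear in the source."""
    return sorted(extract_numbers(summary) - extract_numbers(source))


def flag_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (source, summary) pairs containing at least one scrambled number."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    flagged = sum(1 for source, summary in pairs if scrambled_numbers(source, summary))
    return flagged / len(pairs)


if __name__ == "__main__":
    source = "Revenue was $1,234,567 in Q3, up 12.5% as of 2026-05-31."
    faithful = "Q3 revenue: $1,234,567 (+12.5%)."
    lossy = "Revenue hit about $1.2M in Q3, up 13%."
    print(scrambled_numbers(source, lossy))                  # ['1.2', '13']
    print(flag_rate([(source, faithful), (source, lossy)]))  # 0.5
```

Run `flag_rate` over the 50-document sample for each candidate model; every flagged pair is one to read by hand before deciding whether to step up the price ladder.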
How to use this ranking
The winner is mathematically cheapest at the listed workload shape, but that is not the same as "best for the use case." Cheaper models often have lower reasoning depth, smaller context windows, or worse instruction-following. Use this as the cost baseline, then test the top 2-3 candidates on your real prompts via the live counter.
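Before testing candidates, it can help to re-rank on your own measured shape, because the order only holds for summarization-shaped traffic. A sketch, assuming you have measured average input and output tokens from your real prompts; `shortlist` is a hypothetical helper and the prices are copied from the top five rows of the table.

```python
from typing import Dict, List, Tuple

# (input USD per 1M tokens, output USD per 1M tokens), copied from the
# top five rows of the ranking table.
PRICES: Dict[str, Tuple[float, float]] = {
    "GPT-5 Nano": (0.05, 0.40),
    "GPT-4.1 Nano": (0.10, 0.40),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-4o mini": (0.15, 0.60),
    "Llama 3.1 8B": (0.18, 0.18),
}


def shortlist(prices: Dict[str, Tuple[float, float]], avg_input_tokens: int,
              avg_output_tokens: int, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k cheapest models as (name, USD per 1M calls) for a measured shape.

    tokens x (USD per 1M tokens) equals USD per 1M calls: the two factors of a
    million cancel. Ties keep dictionary order because sorted() is stable.
    """
    costs = [
        (name, round(avg_input_tokens * inp + avg_output_tokens * out, 2))
        for name, (inp, out) in prices.items()
    ]
    return sorted(costs, key=lambda row: row[1])[:k]


if __name__ == "__main__":
    print(shortlist(PRICES, 8_000, 500))    # summarization shape used on this page
    print(shortlist(PRICES, 2_000, 2_000))  # a chat-like shape
```

At a chat-like 2,000 input + 2,000 output shape, flat-priced Llama 3.1 8B ($720 per 1M calls) overtakes GPT-5 Nano ($900), which is why it is worth recomputing with your own token counts before committing.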
Pricing snapshots come from each provider's published rate cards and are tracked in the full pricing changelog. Tokenizer accuracy per model is documented in the methodology.