Cheapest AI model for a chatbot

Updated 2026-05-31 · By Clinton Patrick · Methodology

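Chatbot turns are short on both ends, usually 100-300 input tokens and 50-150 output tokens. At this shape, input tokens outnumber output tokens roughly two to one, but most providers price output several times higher than input, so both rates matter and output is often the larger share of the bill. We ranked every non-deprecated model on that exact workload.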

GPT-5 Nano (OpenAI) is the cheapest model for this workload at $50 per 1M calls. Llama 3.1 8B (Meta) is second at $54.

Ranked cheapest first

#	Model (provider)	Input $/1M tokens	Output $/1M tokens	Cost per 1M calls
#1	GPT-5 Nano OpenAI	$0.05	$0.40	$50
#2	Llama 3.1 8B Meta	$0.18	$0.18	$54
#3	GPT-4.1 Nano OpenAI	$0.10	$0.40	$60
#4	Gemini 2.5 Flash-Lite Google	$0.10	$0.40	$60
#5	GPT-4o mini OpenAI	$0.15	$0.60	$90
#6	DeepSeek V3 DeepSeek	$0.27	$1.10	$164
#7	GPT-5.4 Nano OpenAI	$0.20	$1.25	$165
#8	Llama 3.1 70B Meta	$0.59	$0.79	$197
#9	Gemini 3.1 Flash-Lite Google	$0.25	$1.50	$200
#10	GPT-4.1 Mini OpenAI	$0.40	$1.60	$240
#11	Qwen 2.5 Coder 32B Alibaba	$0.80	$0.80	$240
#12	GPT-5 Mini OpenAI	$0.25	$2.00	$250
#13	Llama 3.3 70B Meta	$0.88	$0.88	$264
#14	Qwen 2.5 72B Alibaba	$0.90	$0.90	$270
#15	DeepSeek V3.1 DeepSeek	$0.60	$1.70	$290
#16	Gemini 2.5 Flash Google	$0.30	$2.50	$310
#17	Gemini 3 Flash Google	$0.50	$3.00	$400
#18	GPT-5.4 Mini OpenAI	$0.75	$4.50	$600
#19	Qwen3 Coder 480B Alibaba	$2.00	$2.00	$600
#20	o3-mini OpenAI	$1.10	$4.40	$660
#21	o4-mini OpenAI	$1.10	$4.40	$660
#22	Claude Haiku 4.5 Anthropic	$1.00	$5.00	$700
#23	GLM-5.1 Zhipu	$1.40	$4.40	$720
#24	Mistral Large Mistral	$2.00	$6.00	$1,000
#25	Llama 3.1 405B Meta	$3.50	$3.50	$1,050
#26	GPT-4.1 OpenAI	$2.00	$8.00	$1,200
#27	o3 OpenAI	$2.00	$8.00	$1,200
#28	GPT-5.1 OpenAI	$1.25	$10.00	$1,250
#29	GPT-5 OpenAI	$1.25	$10.00	$1,250
#30	Gemini 2.5 Pro Google	$1.25	$10.00	$1,250
#31	DeepSeek R1 DeepSeek	$3.00	$7.00	$1,300
#32	GPT-4o OpenAI	$2.50	$10.00	$1,500
#33	Gemini 3.1 Pro Google	$2.00	$12.00	$1,600
#34	GPT-5.3 OpenAI	$1.75	$14.00	$1,750
#35	GPT-5.2 OpenAI	$1.75	$14.00	$1,750
#36	GPT-5.4 OpenAI	$2.50	$15.00	$2,000
#37	Claude Sonnet 4.6 Anthropic	$3.00	$15.00	$2,100
#38	Claude Opus 4.8 Anthropic	$5.00	$25.00	$3,500
#39	GPT-5.5 OpenAI	$5.00	$30.00	$4,000
#40	GPT-4 Turbo OpenAI	$10.00	$30.00	$5,000
#41	Claude Opus 4.8 (Fast Mode) Anthropic	$10.00	$50.00	$7,000
#42	o3-pro OpenAI	$20.00	$80.00	$12,000
#43	GPT-5 Pro OpenAI	$15.00	$120.00	$15,000
#44	GPT-5.2 Pro OpenAI	$21.00	$168.00	$21,000
#45	GPT-5.5 Pro OpenAI	$30.00	$180.00	$24,000
#46	GPT-5.4 Pro OpenAI	$30.00	$180.00	$24,000

Workload assumption: 200 input tokens + 100 output tokens per call, scaled to 1M calls. Pricing as of 2026-05-31.

How we computed this

We model a chatbot turn as 200 input tokens (the user message plus a compact system prompt and a couple of turns of trimmed history) and 100 output tokens (a two-to-four sentence reply). Real chatbots vary: aggressive history trimming pushes input under 150 tokens, while products that resend full conversation history every turn can hit 1,000+ input tokens by turn ten, which changes the ranking in favor of models with cheap input. If your bot resends history, check the RAG ranking instead, since it benchmarks input-heavy shapes. Per-turn math: (input tokens × input price + output tokens × output price) ÷ 1M, with prices in dollars per 1M tokens, then scaled to a million turns so the differences stop looking like rounding errors.
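The per-turn formula above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the function names `cost_per_million_calls` and `rank` are made up for the example, and only four rows of the table are included.

```python
# Prices in USD per 1M tokens, copied from four rows of the ranking table.
PRICES = {
    "GPT-5 Nano": (0.05, 0.40),
    "Llama 3.1 8B": (0.18, 0.18),
    "GPT-4.1 Nano": (0.10, 0.40),
    "GPT-4o mini": (0.15, 0.60),
}


def cost_per_million_calls(input_price, output_price,
                           input_tokens=200, output_tokens=100):
    """Cost in USD of 1M calls; prices are USD per 1M tokens."""
    per_call = (input_tokens * input_price
                + output_tokens * output_price) / 1_000_000
    return per_call * 1_000_000


def rank(input_tokens=200, output_tokens=100):
    """Return (model, cost per 1M calls) pairs, cheapest first."""
    costs = {
        name: round(cost_per_million_calls(i, o, input_tokens, output_tokens), 2)
        for name, (i, o) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(rank())                   # the 200-in / 100-out shape used in the table
    print(rank(input_tokens=1000))  # a bot that resends full history
```

With these four rows, Llama 3.1 8B is second at the 200/100 shape but falls to third at 1,000 input tokens ($198 per 1M calls against $140 for GPT-4.1 Nano), which is the re-ranking effect described above.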

The math, worked through

One call at this workload costs $0.00005 on GPT-5 Nano: 200 input tokens at $0.05 per million is $0.00001, plus 100 output tokens at $0.40 per million is $0.00004. At 10,000 calls a day that is $15 over a 30-day month. GPT-4.1 Nano, tied for third with Gemini 2.5 Flash-Lite at $60 per 1M calls, runs 1.2x that. The most expensive models in the table, GPT-5.5 Pro and GPT-5.4 Pro, cost 480x the winner at the same workload. The spread between top and bottom of this ranking is not a rounding error; it is the difference between a tool budget and a headcount budget.
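The arithmetic in this section can be checked with a short script. The per-call amounts are small fractions of a cent, so this sketch uses `Decimal` to avoid rounding them away. The `per_call` helper is a name chosen for the example, and the monthly figure assumes a 30-day month.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def per_call(input_price, output_price, input_tokens=200, output_tokens=100):
    """Return (input cost, output cost, total) in USD for one call.

    Prices are strings in USD per 1M tokens, e.g. "0.05".
    """
    input_cost = Decimal(input_tokens) * Decimal(input_price) / MILLION
    output_cost = Decimal(output_tokens) * Decimal(output_price) / MILLION
    return input_cost, output_cost, input_cost + output_cost


# GPT-5 Nano: $0.05 input, $0.40 output per 1M tokens.
nano_in, nano_out, nano_total = per_call("0.05", "0.40")

# 10,000 calls a day over an assumed 30-day month.
monthly = nano_total * 10_000 * 30

# Totals for GPT-4.1 Nano and GPT-5.4 Pro, to compare against the winner.
gpt41_nano_total = per_call("0.10", "0.40")[2]
gpt54_pro_total = per_call("30.00", "180.00")[2]

print("per call:", nano_in, "+", nano_out, "=", nano_total)
print("monthly:", monthly)
print("GPT-4.1 Nano multiple:", gpt41_nano_total / nano_total)
print("GPT-5.4 Pro multiple:", gpt54_pro_total / nano_total)
```

The input and output parts come to $0.00001 and $0.00004 per call, and the two multiples come to 1.2 and 480.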

About the winner

Cheap chat models win this table on price per token, but the lowest-cost tier sometimes drops instruction-following quality on multi-turn tasks. If your bot needs to hold a persona, follow refund-policy rules, or call tools mid-conversation, benchmark the top three on your real transcripts before committing.

When not to pick the cheapest

Do not pick the table winner if your chatbot does anything safety-sensitive (medical, financial, legal triage). For the mini-class models in the table, the premium over the winner is about $200 per million turns (GPT-4.1 Mini at $240 is $190 more; GPT-5 Mini at $250 is $200 more), which is cheap insurance against a model that confidently hallucinates policy. Also check rate limits: some budget tiers throttle hard at production volumes.

How to use this ranking

The winner is mathematically cheapest at the listed workload shape — that's not the same as "best for the use case." Cheaper models often have lower reasoning depth, smaller context windows, or worse instruction-following. Use this as the cost baseline, then test the top 2-3 candidates on your real prompts via the live counter.
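One way to move from the fixed 200/100 shape to your own traffic is to cost a sample of logged turns against the top candidates. The sketch below is illustrative only: `SAMPLE_TURNS` holds made-up token counts, the function names are hypothetical, and it assumes a single token count per turn even though each model's tokenizer will count slightly differently.

```python
# Prices in USD per 1M tokens for the top three models in the table.
CANDIDATES = {
    "GPT-5 Nano": (0.05, 0.40),
    "Llama 3.1 8B": (0.18, 0.18),
    "GPT-4.1 Nano": (0.10, 0.40),
}

# Hypothetical (input_tokens, output_tokens) pairs, one per logged turn.
# Replace with counts from your own transcripts and tokenizer.
SAMPLE_TURNS = [(180, 90), (420, 120), (950, 60), (1300, 140)]


def transcript_cost(turns, input_price, output_price):
    """Total USD cost of the given turns; prices are USD per 1M tokens."""
    total_in = sum(i for i, _ in turns)
    total_out = sum(o for _, o in turns)
    return (total_in * input_price + total_out * output_price) / 1_000_000


def compare(turns, candidates=CANDIDATES):
    """Return (model, projected USD per 1M turns) pairs, cheapest first."""
    scale = 1_000_000 / len(turns)  # extrapolate the sample to 1M turns
    projected = {
        name: round(transcript_cost(turns, i, o) * scale, 2)
        for name, (i, o) in candidates.items()
    }
    return sorted(projected.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for model, usd in compare(SAMPLE_TURNS):
        print(f"{model}: ${usd:,.2f} per 1M turns")
```

Cost is only half of the comparison. The same sample transcripts should also be run through each candidate to judge reply quality before committing.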

Pricing snapshots come from each provider's published rate cards and are tracked in the full pricing changelog. Tokenizer accuracy per model is documented in the methodology.
