Gemini 2.5 Flash: token counter & pricing
Google · exact (uses official tokenizer) · pricing as of 2026-05-31.
- Provider
- API model ID
gemini-2.5-flash- Context window
- 1,000,000 tokens
- Input price
- $0.30 per 1M tokens
- Output price
- $2.50 per 1M tokens
- Tokenizer accuracy
- exact (uses official tokenizer)
- Pricing as of
- 2026-05-31
Open the counter to count tokens for Gemini 2.5 Flash in real time.
What is Gemini 2.5 Flash?
Gemini 2.5 Flash is Google's small-and-cheap general-purpose workhorse at $0.30 per million input tokens / $2.50 output, with exact token counts in this counter. 1-million-token context window. Multimodal in (text, image, video, audio).
How tokens are counted here
Counts for Gemini 2.5 Flash come straight from Google's models.countTokens API, relayed through our serverless proxy, and are exact. At Flash-style volumes that exactness is operationally useful: if you enforce per-request token budgets or trim RAG context to a target size, you are trimming against the same numbers Google bills, not an approximation that drifts a few percent either way.
Why Gemini Flash is winning high-volume workloads
The combination is unique:
- Cheap. $0.30/$2.50 per million. GPT-4o mini lists lower at $0.15/$0.60, but Flash's $0.03/M cached-input rate flips the math for prompts with large shared prefixes.
- 1M-token context. Most competitors top out at 128k-200k.
- Strong multimodal. Image understanding is competitive with frontier models at fraction of the cost.
- Fast. Comparable latency to GPT-4o mini and Haiku.
If your workload is *anything* you'd consider running on GPT-4o mini or Claude Haiku, Gemini Flash should be in the eval. The price gap is large enough that switching pays for the migration effort within weeks at any non-trivial volume.
When NOT to use Flash
- Frontier reasoning tasks. Flash is small. Multi-step agentic workflows often want Pro or Claude Sonnet.
- Workloads requiring strict tool-use determinism. Function-calling reliability is improving but trails OpenAI's structured outputs.
- Anywhere you've validated Claude or GPT and don't want to re-eval.
Pricing notes
The $0.30 input rate is the standard tier from ai.google.dev/pricing, with cached input at $0.03 per million. If you regularly send very long contexts, verify current long-context rates on Google's pricing page. This calculator assumes the standard tier.
What Gemini 2.5 Flash costs in production
Consider a support bot doing retrieval-augmented answers: 200,000 queries a month, each stuffing about 6,000 tokens of retrieved chunks plus the question into the prompt and returning a 300-token answer. Input: 1.2B tokens × $0.30 = $360. Output: 60M tokens × $2.50 = $150. Call it $510 a month.
GPT-4o mini would run about $216 on list price ($180 input + $36 output) for the same traffic, so eval it if quality holds. Claude Haiku 4.5 at $1/$5 would run about $1,500 ($1,200 + $300). Neither figure assumes caching: if your system prompt and instructions are a fixed prefix, Flash's $0.03/M cached-input rate cuts the input line hard, and at 200k queries a month that discount is worth real money.
Migrating from Gemini 1.5 Flash
Gemini 1.5 Flash workloads should land here: swap the model string to apiId gemini-2.5-flash and the SDK surface is unchanged, including multimodal inputs and the countTokens endpoint. Re-run your cost projections rather than carrying 1.5-era numbers forward, since rates differ and the cached-input discount changes which architectures are cheapest. Token accuracy is the easy part of this migration: this page queries the live countTokens endpoint for the model you select, so the counts already reflect 2.5 Flash rather than an estimate tuned to the old tokenizer.
Gemini 2.5 Flash vs the obvious alternative
Claude Haiku 4.5 is the closest cross-vendor rival: $1/$5 against Flash's $0.30/$2.50, so Haiku costs over 3× more on input and 2× more on output. Haiku counters with Anthropic's instruction-following and a mature tool-use stack; Flash counters with a 1M context window (vs 200k) and broader multimodal input. Head-to-head numbers: Claude Haiku vs Gemini 2.5 Flash. Deciding between Flash and Pro instead? See Gemini 2.5 Flash vs Pro.
Common questions
How does Flash compare to Gemini 2.5 Pro?
Pro is $1.25/$10 input/output (vs Flash's $0.30/$2.50). Pro costs roughly 4× more per input token. Use Pro when reasoning quality measurably matters; Flash for everything else.
Is the count_tokens endpoint free?
Yes. Google's countTokens is separate from generation billing. Our proxy adds a 30-day cache so we don't burn quota on repeated identical inputs.
How does Flash's tokenizer compare to GPT or Claude?
Gemini tends to produce slightly fewer tokens than GPT-4o or Claude for the same English text, usually a few percent. The difference can be larger for code or non-English content. The calculator above shows the actual count for your input.
Compare Gemini 2.5 Flash to other models
- Gemini 3.1 Pro (Google, $2.00/$12.00)
- Gemini 3 Flash (Google, $0.50/$3.00)
- Gemini 3.1 Flash-Lite (Google, $0.25/$1.50)
- Gemini 2.5 Pro (Google, $1.25/$10.00)
- Gemini 2.5 Flash-Lite (Google, $0.10/$0.40)
- DeepSeek V3 (DeepSeek, $0.27/$1.10)
- GPT-5 Mini (OpenAI, $0.25/$2.00)
- GPT-5.4 Nano (OpenAI, $0.20/$1.25)