o3: token counter & pricing

OpenAI · exact (uses official tokenizer) · pricing as of 2026-05-31.

Updated 2026-05-31 · By Clinton Patrick · Methodology

Provider: OpenAI
API model ID: o3
Context window: 200,000 tokens
Input price: $2.00 per 1M tokens
Output price: $8.00 per 1M tokens
Tokenizer accuracy: exact (uses official tokenizer)
Pricing as of: 2026-05-31

Open the counter to count tokens for o3 in real time.

What is o3?

o3 is OpenAI's reasoning-tier model, designed to think longer before responding, which produces measurably better answers on hard logic, math, and code problems. Not a chat model in the GPT-5 sense. Use o3 when the answer matters more than the response time.

How tokens are counted here

Unlike the Claude pages on this site, which call Anthropic's count_tokens API through a proxy, o3 is counted entirely in your browser. OpenAI publishes the o200k_base tokenizer (shared with the GPT-4o and GPT-5 families), so we run it locally via js-tiktoken: your text never leaves the page, and the count is exact.

But: the o-series produces a lot of "reasoning tokens", internal chain-of-thought tokens that count toward your output bill but don't appear in the final response. A 200-token reply might consume 1,500-3,000 output tokens of internal reasoning. The calculator above shows the visible output count; actual billed output is typically 5-15× higher.

Pricing notes

$2.00 input / $8.00 output per 1M tokens. Cached input $0.50/M.

If you're estimating total cost on o3, plan for the reasoning-token multiplier. A workload that looks like 1,000 input + 500 visible output tokens might bill closer to 1,000 input + 4,000 output tokens.

What o3 costs in production

Consider a nightly batch job that runs 2,000 hard items (math proofs, algorithm bugs, eval questions) through o3, each with 1,000 input tokens and about 500 visible output tokens. With the reasoning multiplier, assume billed output lands near 4,000 tokens per item:

Input: 2,000 × 1,000 = 2M tokens/night, or 60M tokens over a 30-day month
Billed output: 2,000 × 4,000 = 8M tokens/night, or 240M tokens/month
Cost: 60M × $2 = $120 input, plus 240M × $8 = $1,920 output. About $2,040/month.

Note where the money goes: 94% of the bill is output, almost all of it invisible reasoning tokens. The same job on o4-mini ($1.10/$4.40) bills $66 + $1,056 = $1,122/month, roughly half. On o3-pro ($20/$80) it bills $1,200 + $19,200 = $20,400/month, a 10x jump that only makes sense if single-shot accuracy on the hardest items is worth it.

Migrating from o1

The model string change is trivial (o1 to o3 in your API calls); the economics are not. o3 launched at $2/$8, dramatically cheaper than the o1 tier it replaced, which flipped the default advice from "use reasoning models sparingly" to "use them wherever accuracy is the constraint." Two things carry over from o1 unchanged: the o200k tokenizer, so input counts do not move, and the reasoning-token billing model, so your output costs remain several times the visible reply. The reasoning_effort parameter gives you more control than o1 offered; dial it down before assuming a workload is too expensive.

o3 vs the obvious alternative

The cross-vendor matchup is o3 vs Claude Opus: Opus 4.8 at $5/$25 looks 2.5x to 3x pricier on paper, but Opus does not bill thousands of hidden reasoning tokens per reply, so real-world costs land much closer than the rate card suggests. DeepSeek R1 ($3/$7 via Together) is the budget reasoning alternative, with the same caveat that chain-of-thought output inflates the bill. Measure billed tokens on your actual workload before declaring a winner.

When to use o3

Multi-step math and logic that GPT-5 / Claude Sonnet get wrong.
Code generation on algorithmic problems.
Single-shot answers to hard questions where you're willing to wait 10-30 seconds.
Eval-set work where you're testing the upper bound of model capability.

When not to use it:

Real-time chat, latency is too high.
Routine classification or extraction, overkill, and reasoning tokens balloon the bill.
Workloads where you've already validated GPT-5 or Claude.

Common questions

Should I use o3 or o3-pro?

o3 ($2/$8) covers most reasoning workloads. o3-pro ($20/$80) is positioned for the hardest problems where 10× spend is justified by single-shot accuracy. Don't reach for o3-pro before measuring o3 falling short.

How does o3 compare to o4-mini?

o4-mini ($1.10/$4.40) is roughly half the price of o3 with most of the reasoning capability. Try o4-mini first for cost-sensitive reasoning workloads, it's often the right choice.

Does o3 work with prompt caching?

Yes, cached input is $0.50/M (25% of standard rate). Useful for agent loops with stable system prompts.

Compare o3 to other models

GPT-5.5 (OpenAI, $5.00/$30.00)
GPT-5.5 Pro (OpenAI, $30.00/$180.00)
GPT-5.4 (OpenAI, $2.50/$15.00)
GPT-5.4 Mini (OpenAI, $0.75/$4.50)
GPT-5.4 Nano (OpenAI, $0.20/$1.25)
GPT-5.4 Pro (OpenAI, $30.00/$180.00)
GPT-5.3 (OpenAI, $1.75/$14.00)
GPT-5.2 (OpenAI, $1.75/$14.00)
GPT-5.2 Pro (OpenAI, $21.00/$168.00)
GPT-5.1 (OpenAI, $1.25/$10.00)
GPT-5 (OpenAI, $1.25/$10.00)
GPT-5 Mini (OpenAI, $0.25/$2.00)
GPT-5 Nano (OpenAI, $0.05/$0.40)
GPT-5 Pro (OpenAI, $15.00/$120.00)
GPT-4.1 (OpenAI, $2.00/$8.00)
GPT-4.1 Mini (OpenAI, $0.40/$1.60)
GPT-4.1 Nano (OpenAI, $0.10/$0.40)
o3-mini (OpenAI, $1.10/$4.40)
o3-pro (OpenAI, $20.00/$80.00)
o4-mini (OpenAI, $1.10/$4.40)
GPT-4o (OpenAI, $2.50/$10.00)
GPT-4o mini (OpenAI, $0.15/$0.60)
GPT-4 Turbo (OpenAI, $10.00/$30.00)
Gemini 3.1 Pro (Google, $2.00/$12.00)
Mistral Large (Mistral, $2.00/$6.00)
Qwen3 Coder 480B (Alibaba, $2.00/$2.00)

Detailed comparisons

o3 vs Claude Opus 4.8