#tHow Many Tokens?

← All models

Claude Sonnet 4.6: token counter & pricing

Anthropic · exact (uses official tokenizer) · pricing as of 2026-05-31.

Provider
Anthropic
API model ID
claude-sonnet-4-6
Context window
200,000 tokens
Input price
$3.00 per 1M tokens
Output price
$15.00 per 1M tokens
Tokenizer accuracy
exact (uses official tokenizer)
Pricing as of
2026-05-31

Open the counter to count tokens for Claude Sonnet 4.6 in real time.

What is Claude Sonnet 4.6?

Claude Sonnet 4.6 is Anthropic's mid-tier model, the workhorse most production Claude workloads should default to. Strong reasoning, strong code, strong writing, at 5× lower input cost than Opus and 5× lower output cost than Opus.

How tokens are counted here

For Sonnet we call Anthropic's official /v1/messages/count_tokens endpoint through a serverless proxy and show you the raw result. That makes the count exact: it is the same number Anthropic's billing system uses, not a client-side approximation.

Privacy-wise, the proxy is a pass-through. Your prompt goes to Anthropic's tokenization endpoint and nowhere else; per Anthropic's count_tokens policy it is not logged, not stored, and not used for training.

One useful side effect of the shared Claude tokenizer: a Sonnet 4.6 count is also a valid Haiku 4.5 count for the same text, so you can price out a Sonnet-to-Haiku downgrade from a single measurement. The exception is Opus 4.8, whose newer tokenizer can run up to 35% higher on identical input.

What Claude Sonnet costs in production

A concrete case: a customer-support assistant that drafts replies for 2,000 tickets a day, with each ticket sending about 1,500 tokens of conversation history and knowledge-base context and producing a 300-token draft.

The same volume on GPT-4o ($2.50/$10) bills $225 + $180 = $405/month, roughly 25% less. On Claude Opus 4.8 ($5/$25) it bills $450 + $450 = $900/month for a quality difference most support workloads will not notice. Sonnet's spot in the middle is the point: cheap enough to run at ticket volume, capable enough that you are not constantly escalating.

Prompt caching changes this math further if your knowledge-base context repeats across tickets; cached input runs about 10% of the standard rate.

Migrating from Sonnet 4.5

Swap the model string from claude-sonnet-4-5 to claude-sonnet-4-6 and the bill stays put: pricing held at $3/$15 across the generation, and Sonnet 4.6 keeps the shared Claude tokenizer, so token counts for existing prompts do not move. The 200,000-token context window is also unchanged. In practice the migration risk is behavioral, not financial: 4.6 follows instructions more tightly, which can surface places where your prompts relied on the model ignoring a sloppy constraint. Re-run your eval set before flipping production traffic, but skip the cost re-estimate, there is nothing to re-estimate.

Claude Sonnet vs the obvious alternative

The closest competitor is GPT-4o, compared in depth at GPT-4o vs Claude Sonnet: $2.50/$10 versus Sonnet's $3/$15, so GPT-4o wins raw price while Sonnet tends to win on instruction-following and long-context behavior. If you are tempted to go cheaper still, Claude Sonnet vs GPT-4o mini covers the 20x price gap ($0.15/$0.60) and what you give up to get it.

When to use Sonnet over Opus or Haiku

If your workload is high-volume classification, extraction, or short Q&A, Claude Haiku is 4× cheaper with quality differences invisible to most users.

Common questions

How does Sonnet compare to GPT-4o on price?

Sonnet: $3/$15 per million. GPT-4o: $2.50/$10. GPT-4o is ~17% cheaper on input, 33% cheaper on output. For input-heavy workloads with short replies, GPT-4o usually wins on raw cost. Sonnet often wins on instruction-following nuance, measure with your prompts.

Does the 200,000-token context window cost more?

No. Input is billed per token regardless of context-window position. A 100k-token prompt costs the same per token as a 1k-token prompt, the total just scales with the number of tokens you send.

Is prompt caching available on Sonnet?

Yes. Anthropic's prompt caching reduces cost on repeated long-context prompts (e.g., the same RAG document across many queries). Cached input tokens cost ~10% of the standard rate. Not reflected in the calculator above; factor it in manually if your workload qualifies.

Compare Claude Sonnet 4.6 to other models

Detailed comparisons