Azure OpenAI Token & PTU Cost Calculator

Estimate Azure OpenAI monthly cost from input/output token counts and request volume, compare every model side by side, and see the pay-as-you-go vs provisioned throughput (PTU) break-even. Runs entirely in your browser; nothing is sent to the server.

Pricing note: Token and PTU rates are illustrative values for East US public Azure, last verified 1 Jun 2026. Verify against the Azure Pricing Calculator before committing budgets. The PTU break-even compares pay-as-you-go token spend against the cost of the minimum provisioned deployment; it does not model throughput sizing (how many PTUs a given tokens-per-minute load requires).

Workload inputs

Per-request token counts and monthly volume. Results update live.

Input tokens per request: prompt + system message + retrieved context.
Output tokens per request: completion length. Output is usually the dominant cost.

Estimate

Monthly cost and the pay-as-you-go vs PTU recommendation.

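Cost per request: input + output token cost for a single call.
Pay-as-you-go / mo: token spend at the entered monthly volume.
PTU floor / mo: cost of the minimum provisioned deployment.
Break-even: requests per month at which pay-as-you-go spend equals the PTU floor.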
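
The first two figures can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: it assumes rates are quoted per 1M tokens, and the `Rates` class, the function names, and the rate values are hypothetical placeholders rather than real Azure prices.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD rates per 1M tokens (not real Azure prices)."""
    input_per_million: float
    output_per_million: float


def cost_per_request(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Token cost of one call: input and output are priced separately."""
    return (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    ) / TOKENS_PER_MILLION


def payg_monthly(input_tokens: int, output_tokens: int,
                 requests_per_month: int, rates: Rates) -> float:
    """Pay-as-you-go spend scales linearly with request volume."""
    return cost_per_request(input_tokens, output_tokens, rates) * requests_per_month


# Example workload: 1,500 input and 500 output tokens, 100,000 requests a month.
example_rates = Rates(input_per_million=2.0, output_per_million=8.0)
per_request = cost_per_request(1_500, 500, example_rates)
monthly = payg_monthly(1_500, 500, 100_000, example_rates)
print(f"per request: ${per_request:.4f}  monthly: ${monthly:,.2f}")
```

With these placeholder rates the 500 output tokens cost more than the 1,500 input tokens, which is why the output count deserves the most care.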

All models at this workload

Same token counts and volume across every catalog model, cheapest pay-as-you-go first.

Model | Per request | PaYG / mo | PTU floor / mo | Cheaper option
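
A comparison like this one could be produced roughly as follows. This is a sketch under stated assumptions: the catalog entries, model names, rates, and PTU floors are invented for illustration, rates are assumed to be per 1M tokens, and treating an exact tie as PaYG is a choice made here, not something the page specifies.

```python
# Hypothetical catalog: USD per 1M tokens and a monthly PTU floor per model.
CATALOG = {
    "model-a": {"input": 2.0, "output": 8.0, "ptu_floor": 5000.0},
    "model-b": {"input": 0.5, "output": 1.5, "ptu_floor": 3000.0},
}


def compare_models(input_tokens, output_tokens, requests_per_month, catalog):
    """Price the same workload on every model, cheapest pay-as-you-go first."""
    rows = []
    for name, r in catalog.items():
        per_request = (
            input_tokens * r["input"] + output_tokens * r["output"]
        ) / 1_000_000
        payg = per_request * requests_per_month
        rows.append({
            "model": name,
            "per_request": per_request,
            "payg_monthly": payg,
            "ptu_floor": r["ptu_floor"],
            # PTU is recommended only when PaYG spend exceeds the floor.
            "cheaper": "PTU" if payg > r["ptu_floor"] else "PaYG",
        })
    return sorted(rows, key=lambda row: row["payg_monthly"])


for row in compare_models(1_500, 500, 1_000_000, CATALOG):
    print(f'{row["model"]:<10} ${row["per_request"]:.4f}  '
          f'${row["payg_monthly"]:>10,.2f}  ${row["ptu_floor"]:>10,.2f}  '
          f'{row["cheaper"]}')
```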

How to read this

PaYG vs PTU, in practice.

Pay-as-you-go: You pay per token, so cost scales linearly with volume. It suits low or variable traffic; once traffic is sustained and predictable, committing to provisioned capacity can be cheaper.
Provisioned Throughput (PTU): You reserve dedicated capacity at a fixed monthly price. While pay-as-you-go spend stays below the cost of the minimum deployment, PTU is poor value; above the break-even it costs less and gives predictable latency.
Break-even: The monthly request count at which PaYG token spend equals the PTU floor. Above it, the tool recommends PTU. It is a cost comparison only; real PTU sizing depends on your tokens-per-minute throughput.
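
The break-even definition above reduces to one division: the PTU floor divided by the cost of a single request. A minimal sketch, in which the function name and the example figures are hypothetical, and rounding up to a whole request is a choice made here:

```python
import math


def break_even_requests(ptu_floor_monthly: float, cost_per_request: float) -> int:
    """Smallest monthly request count at which PaYG spend reaches the PTU floor."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    return math.ceil(ptu_floor_monthly / cost_per_request)


# Illustrative figures only: a $5,000 monthly floor and $0.007 per request.
print(break_even_requests(5000.0, 0.007))
```

This says nothing about whether the minimum deployment can actually serve that many requests; that is the throughput sizing question the calculator leaves out.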

Costs this does not include yet

Why your real bill runs higher.

1. Retry & overhead tokens: Failed calls, re-prompts, and function-calling round-trips add tokens the napkin math misses.
2. Fine-tuning hosting: A fine-tuned model carries a separate hourly hosting charge on top of token cost, and it is often the biggest surprise line.
3. Prompt caching & batch discounts: These reduce cost; this calculator deliberately ignores them, so it errs on the conservative (higher) side.
4. Pinning output too low: Output tokens dominate. Underestimating completion length is the most common reason an estimate undershoots reality.
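
Until the calculator covers these, items 1 and 2 can be roughed in by hand. The sketch below is an assumption-laden illustration, not part of the tool: the 10% retry overhead, the hourly hosting rate, and the 730-hour month are placeholders to replace with your own measurements and current Azure pricing.

```python
HOURS_PER_MONTH = 730  # common billing approximation; an assumption here


def adjusted_monthly(payg_monthly: float,
                     retry_overhead: float = 0.10,
                     hosting_per_hour: float = 0.0) -> float:
    """Inflate token spend for retries, then add fixed fine-tuning hosting.

    retry_overhead is the extra fraction of tokens spent on failed calls,
    re-prompts, and tool round-trips (hypothetical default of 10%).
    hosting_per_hour is the hourly charge for a fine-tuned deployment,
    left at 0 for base models.
    """
    return payg_monthly * (1 + retry_overhead) + hosting_per_hour * HOURS_PER_MONTH


# A $700 token estimate with 10% overhead and a hypothetical $1.70/hour hosting fee.
print(f"${adjusted_monthly(700.0, 0.10, 1.70):,.2f}")
```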