LLM model

GPT-4o

Multimodal workhorse with text, vision, and audio.

Illustrative specs. Context window, modalities, output cost, and EU availability for GPT-4o are representative, last verified 1 Jun 2026. Verify against the provider before committing.

GPT-4o at a glance

The audio modality is its differentiator in this catalog.

Hosting: Foundry (Azure)
Context window: 128,000 tokens
Modalities: text, vision, and audio input
Output cost: $10 / 1M tokens

Best for

  • Audio-in/audio-out workloads
  • Multimodal chat
  • Mixed-content RAG

Is it the right model?

Match it against your requirements.

The selection wizard ranks GPT-4o against every other model for your latency, context, modality, cost, and residency needs — and shows where it wins and where something else fits better.

Open the LLM Selection Wizard

Frequently asked questions

GPT-4o specs and cost.

What is GPT-4o best for? GPT-4o is a multimodal workhorse covering text, vision, and audio. It fits audio-in/audio-out workloads, multimodal chat, and mixed-content RAG.
What context window and modalities does GPT-4o support? GPT-4o handles up to 128,000 tokens of context and supports text, vision, and audio input. It runs on Foundry (Azure).
How much does GPT-4o cost? Around $10 per 1M output tokens (illustrative, verified 1 Jun 2026). Output tokens usually dominate the bill, but verify input and cached pricing against the provider before budgeting.
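
For a rough sense of what the output rate means in practice, the arithmetic can be sketched as below. This is a minimal illustration, not a billing tool: it uses only the illustrative $10 per 1M output tokens quoted above and ignores input and cached-token pricing. The function name `estimate_output_cost` and the workload figures (2,000 requests per day, 500 output tokens each, 30 days) are hypothetical values chosen for the example.

```python
# Illustrative rate from this page: $10 per 1M output tokens.
# Assumption: verify the real rate with the provider before budgeting.
OUTPUT_USD_PER_MILLION = 10.0


def estimate_output_cost(output_tokens: int,
                         usd_per_million: float = OUTPUT_USD_PER_MILLION) -> float:
    """Return the estimated output-token cost in USD (input tokens excluded)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / 1_000_000 * usd_per_million


# Hypothetical workload: 2,000 requests/day, 500 output tokens each, 30 days.
monthly_tokens = 2_000 * 500 * 30
monthly_cost = estimate_output_cost(monthly_tokens)
print(f"{monthly_tokens:,} output tokens -> ${monthly_cost:,.2f}")
# prints "30,000,000 output tokens -> $300.00"
```

Swapping in your own request volume and average response length gives a first-order output budget. Input and cached-token charges would need to be added on top.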