Cloud Strategy

1M Context at Standard Pricing: What Claude on Azure Foundry Means for Enterprise AI

Leon Godwin
19 March 2026

The Challenge

Every enterprise team building with large language models has hit the same wall. You load a codebase, a contract archive, or a month's worth of agent session logs — and the model runs out of context. So you chunk. You summarise. You build elaborate retrieval pipelines to feed the model pieces of what it needs, hoping it can reconstruct the full picture from fragments.

That chunking has a cost. Not just in engineering time, but in quality. Context fragmentation means the model never sees the whole picture at once. Cross-document reasoning breaks down. Multi-turn agent workflows lose the thread of earlier decisions. And teams spend more time managing context windows than solving actual problems.

Longer context windows have existed for a while, but until now they came with premium pricing multipliers that made them impractical for production workloads. The economics pushed teams back toward chunking whether they wanted to or not.

What's Changed

Anthropic has taken Claude Opus 4.6 and Sonnet 4.6 to general availability with a full 1 million token context window at standard pricing. No beta headers. No long-context surcharge. A 900,000-token request costs the same per-token rate as a 9,000-token one.

Here are the numbers that matter:

  • Opus 4.6: $5 input / $25 output per million tokens — across the full 1M window
  • Sonnet 4.6: $3 input / $15 output per million tokens — same flat rate
  • 600 images or PDF pages per request (up from 100) — a 6x expansion in media processing
  • 128K maximum output on Sonnet 4.6
  • No code changes required — if you were using the beta header, it's now ignored

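To make the flat pricing concrete, here is the per-request arithmetic using the rates above. The 900K-input / 20K-output split is an illustrative workload, not a benchmark:

```python
# Back-of-envelope cost for a single large request at the flat rates
# quoted above (USD per million tokens, same rate across the full 1M window).
RATES = {
    "opus-4.6":   {"input": 5.00,  "output": 25.00},
    "sonnet-4.6": {"input": 3.00,  "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for one request; no long-context surcharge applies."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A 900K-token prompt with a 20K-token response:
opus_cost = request_cost("opus-4.6", 900_000, 20_000)      # 4.50 input + 0.50 output
sonnet_cost = request_cost("sonnet-4.6", 900_000, 20_000)  # 2.70 input + 0.30 output
```

At these rates, a near-full-window Opus request lands at $5.00 and the same request on Sonnet at $3.00, which is the number to weigh against the engineering cost of a chunking pipeline.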
Both models are available today on Microsoft Foundry, the Claude Platform, and Google Cloud Vertex AI. For Azure teams, this means deploying within Foundry's existing enterprise governance, compliance, and operational tooling — no separate infrastructure needed.

The recall quality holds up, too. Opus 4.6 scores 78.3% on MRCR v2, the highest among frontier models at 1M context length. That's the difference between a model that can theoretically accept a million tokens and one that can actually reason across them.

Getting Started

If you're already using Claude models in Microsoft Foundry, the upgrade path is straightforward. Requests over 200K tokens now work automatically — no opt-in, no configuration changes.
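A sketch of what "no opt-in" means in practice: a request well past 200K tokens is just an ordinary Messages API body. The model id and the rough 4-characters-per-token heuristic below are illustrative assumptions, not exact values:

```python
# Sketch: a >200K-token request needs no beta header and no opt-in field.
# The model id and chars-per-token heuristic are illustrative assumptions.
def build_request(document: str, question: str,
                  model: str = "claude-sonnet-4-6") -> dict:
    return {
        "model": model,
        "max_tokens": 4096,
        "messages": [
            {"role": "user",
             "content": f"{document}\n\nQuestion: {question}"},
        ],
        # Note what is absent: no "anthropic-beta" long-context header,
        # no special configuration of any kind.
    }

corpus = "clause text " * 100_000        # ~1.2M chars, roughly 300K tokens
estimated_tokens = len(corpus) // 4
request = build_request(corpus, "Which clauses conflict with each other?")
```

If you previously sent the long-context beta header, leaving it in place is harmless; as noted above, it is now simply ignored.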

For teams evaluating this for the first time, start here:

  1. Browse the model catalog: Claude Opus 4.6 on Azure AI or Claude Sonnet 4.6 on Microsoft Foundry
  2. Deploy via Foundry: standard serverless deployment, same as any other model in the catalog
  3. Test with real workloads: load an actual codebase, a full contract set, or an extended agent conversation and compare results against your current chunked approach
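Step 3 can be sketched as: walk a repository, concatenate its source files into a single prompt, and check the estimated token count against the 1M window before sending. File extensions and the chars-per-token heuristic here are illustrative assumptions:

```python
import os
import tempfile

WINDOW = 1_000_000        # 1M-token context window
CHARS_PER_TOKEN = 4       # rough heuristic; use a real tokenizer for precision

def load_codebase(root: str, exts=(".py", ".ts", ".md")) -> str:
    """Concatenate matching files under `root` into one prompt, with path markers."""
    parts = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    parts.append(f"### FILE: {path}\n{f.read()}")
    return "\n\n".join(parts)

def fits_in_window(prompt: str) -> bool:
    """Cheap pre-flight check before sending a full-codebase request."""
    return len(prompt) // CHARS_PER_TOKEN <= WINDOW

# Demo on a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "app.py"), "w") as f:
        f.write("print('hello')\n")
    prompt = load_codebase(tmp)
```

Running the same questions against this single prompt and against your existing chunked pipeline is the quickest way to see whether cross-file reasoning actually improves for your workload.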

Sonnet 4.6 also introduces adaptive thinking and effort parameters. Rather than always invoking extended reasoning, the model decides when deeper thinking is actually needed. You can tune the effort level to control quality-latency-cost trade-offs for your specific use case.
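As a hypothetical sketch of what tuning that trade-off could look like in a request builder: the `effort` field name and its placement below are assumptions for illustration only, not a confirmed API shape; check the current Foundry and Anthropic documentation for the real parameter:

```python
# Hypothetical request builder; "effort" is an assumed field name and shape.
VALID_EFFORT = ("low", "medium", "high")

def build_tuned_request(prompt: str, effort: str = "medium") -> dict:
    """Lower effort trades some answer quality for latency and cost;
    higher effort allows deeper reasoning on hard inputs."""
    if effort not in VALID_EFFORT:
        raise ValueError(f"effort must be one of {VALID_EFFORT}")
    return {
        "model": "claude-sonnet-4-6",   # illustrative model id
        "max_tokens": 8192,
        "effort": effort,               # assumed parameter name
        "messages": [{"role": "user", "content": prompt}],
    }
```

The practical pattern is to default to medium, drop to low for high-volume routine calls, and reserve high for the complex reasoning tasks where the extra latency pays for itself.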

For development teams using Claude Code, 1M context is now included for Max, Team, and Enterprise tiers. This means fewer compaction events — Anthropic reports a 15% decrease in context compactions — so long debugging sessions and codebase explorations maintain continuity.

What This Means

The pricing move is the real story here. Long context has existed in various forms for over a year, but premium pricing pushed most enterprise teams toward RAG and chunking strategies. Removing the surcharge changes the calculus for a specific set of workloads: legal document review, full-codebase reasoning, extended agentic workflows, and any scenario where context fragmentation was degrading output quality.

That said, this isn't a reason to abandon retrieval-augmented generation entirely. RAG remains more cost-efficient when you need targeted information from massive corpora. The 1M window is for workloads that genuinely need the full picture in a single pass — not a replacement for every information retrieval pattern.
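The cost intuition behind that trade-off, using the Sonnet 4.6 input rate quoted earlier ($3 per million tokens) and illustrative prompt sizes:

```python
# Per-query input cost: whole corpus in the window vs. retrieved chunks.
# Prompt sizes are illustrative; the rate is the Sonnet 4.6 input rate above.
INPUT_RATE = 3.00 / 1_000_000   # USD per input token

def input_cost(tokens: int) -> float:
    return tokens * INPUT_RATE

full_context_cost = input_cost(900_000)  # whole corpus every query
rag_cost = input_cost(5_000)             # top-k retrieved chunks per query
```

At these sizes that is roughly $2.70 per query against full context versus about $0.015 with retrieval, a gap that compounds at high query volume. Full context earns its cost when the answer depends on relationships across the whole corpus that retrieval would fragment.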

In my experience working with enterprise customers, the teams that will benefit most are those already struggling with context management: chunking contracts across multiple calls, losing agent state between turns, or building complex orchestration just to keep the model informed. For those teams, this is a direct reduction in engineering complexity and an improvement in output quality.

The competitive implications are worth watching. OpenAI's GPT-5.4 sits at 200K context with no announced expansion at standard pricing. Gemini 2.5 Pro offers 1M context but with a different pricing structure. Anthropic is first to remove the premium entirely — and making it available through Microsoft Foundry means Azure-native teams get access without leaving their existing governance and compliance posture.

The question for most organisations isn't whether 1M context is impressive. It's whether their workloads actually need it. Start with the use cases where chunking is visibly hurting quality, and test from there.


Leon Godwin, Principal Cloud Evangelist at Cloud Direct