GPT-5.4 on Azure: The Model That Actually Finishes What It Starts
Every enterprise team building AI agents has hit the same wall. The prototype works brilliantly in a demo. Then you deploy it to production, hand it a 15-step workflow, and somewhere around step 9 it loses the plot. Instructions get fuzzy. Tools don't fire. Context drifts. You end up babysitting the thing you built to save you time.
GPT-5.4 in Microsoft Foundry is OpenAI's direct response to that problem. And for once, the pitch isn't about being smarter — it's about being more reliable.
What GPT-5.4 actually brings
The headline capabilities read like a list of things that broke in production with earlier models:
Consistent reasoning over time. The model maintains intent across multi-turn, multi-step interactions. If you give it a 20-step workflow, it remembers what step 1 was about when it reaches step 20. This sounds basic. In practice, it's been the single biggest failure mode for production agents.
Enhanced instruction alignment. Less prompt engineering to get the behaviour you want. The model follows instructions more closely, which means fewer iterations of "no, I meant the other thing" and less overhead on prompt tuning.
Integrated computer use capabilities. This is the structural shift. GPT-5.4 can orchestrate tools, access files, extract data, execute guarded code, and hand off between agents — all natively. Not through hacky function-calling workarounds, but as a core capability. Browser automation and MCP (Model Context Protocol) integrations are coming to Foundry Agent Service shortly after launch.
Dependable tool invocation. When the model decides to call a tool, it actually calls it correctly. Fewer hallucinated parameters, fewer missed invocations, fewer "I would have called the API but decided to summarise instead" moments.
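Reliable tool calling starts on your side of the contract: the tighter the tool's JSON Schema, the less room the model has to hallucinate parameters. Below is a sketch of an OpenAI-style function tool definition with a constrained parameter, plus a minimal validator. The tool name, fields, and ID pattern are hypothetical, and the validator is illustrative rather than a full JSON Schema implementation:

```python
import re

# An OpenAI-style function tool definition (the Chat Completions "tools"
# format). The tool name and fields are hypothetical; the point is that
# tight constraints (pattern, required, additionalProperties) narrow what
# the model is allowed to emit.
lookup_order = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Fetch an order's status by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "pattern": "^ORD-[0-9]{6}$",  # constrain the shape of valid IDs
                },
            },
            "required": ["order_id"],
            "additionalProperties": False,
        },
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Minimal check of a proposed tool call against the schema (illustrative,
    not a general JSON Schema validator)."""
    params = tool["function"]["parameters"]
    # Reject unknown arguments (additionalProperties: false).
    if set(args) - set(params["properties"]):
        return False
    # Reject missing required arguments.
    for field in params["required"]:
        if field not in args:
            return False
    # Enforce the declared pattern on the ID.
    pattern = params["properties"]["order_id"]["pattern"]
    return bool(re.fullmatch(pattern, args.get("order_id", "")))

print(validate_call(lookup_order, {"order_id": "ORD-123456"}))  # True
print(validate_call(lookup_order, {"order_id": "123456"}))      # False
```

Validating proposed calls before executing them is cheap insurance regardless of how dependable the model's invocation behaviour is.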
The latency improvements matter too. For real-time workflows — customer support agents, developer copilots, operational dashboards — response time is a feature, not a nice-to-have.
GPT-5.4 Pro: when depth beats speed
There's a premium variant, GPT-5.4 Pro, designed for scenarios where you'd rather wait longer for a better answer. It does multi-path reasoning evaluation — exploring alternative approaches before committing to a response. Think scientific research, complex financial analysis, or any problem where the first plausible answer isn't necessarily the right one.
The trade-off is explicit: Pro costs 12x as much on input tokens ($30 vs $2.50 per million) and on output ($180 vs $15 per million). You're paying for analytical rigour, not throughput. For most production workloads, standard GPT-5.4 is the right choice. Pro is for the cases where being thorough matters more than being fast.
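To make that trade-off concrete, here's a back-of-envelope calculation using the published per-million-token rates (the workload sizes are made up for illustration):

```python
# Cost comparison between standard GPT-5.4 and GPT-5.4 Pro using the
# published per-million-token rates. Workload sizes are illustrative.

RATES = {
    "gpt-5.4":     {"input": 2.50,  "output": 15.00},   # $ per 1M tokens
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a workload on the given model."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example: an agent run consuming 2M input tokens and 500k output tokens.
standard = workload_cost("gpt-5.4", 2_000_000, 500_000)
pro = workload_cost("gpt-5.4-pro", 2_000_000, 500_000)

print(f"standard: ${standard:,.2f}")  # standard: $12.50
print(f"pro:      ${pro:,.2f}")       # pro:      $150.00
```

The same run costs 12x as much on Pro, which is why it only makes sense where the extra reasoning depth pays for itself.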
The Foundry context matters
Dropping GPT-5.4 into Foundry isn't just about model access. The Model Router — powered by a fine-tuned SLM (small language model) — evaluates each prompt and routes it to the optimal GPT-5 family variant based on complexity, performance needs, and cost. Microsoft claims up to 60% savings on inference costs with no fidelity loss.
That's significant for enterprises running mixed workloads. Simple Q&A goes to GPT-5 nano (ultra-low latency). Conversational tasks go to GPT-5 chat. Complex agentic work goes to GPT-5.4. You use one endpoint and let the router decide. The cost optimisation happens automatically.
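Microsoft hasn't published the router's internals, but the single-endpoint idea is easy to picture as a dispatcher that maps a workload description to a model tier. The sketch below uses the tiers named above; the keyword-and-flag heuristic is entirely invented, standing in for the fine-tuned SLM that does the real scoring:

```python
# Toy sketch of single-endpoint routing: the caller describes the workload
# and a heuristic picks a GPT-5 family tier. The real Foundry Model Router
# uses a fine-tuned SLM; this heuristic is purely illustrative.

from dataclasses import dataclass

@dataclass
class Workload:
    prompt: str
    needs_tools: bool = False       # agentic / tool-calling work
    latency_sensitive: bool = False

def route(w: Workload) -> str:
    """Pick a model tier for a workload (invented heuristic)."""
    if w.needs_tools:
        return "gpt-5.4"      # complex agentic work
    if w.latency_sensitive or len(w.prompt) < 80:
        return "gpt-5-nano"   # simple Q&A, ultra-low latency
    return "gpt-5-chat"       # general conversational tasks

print(route(Workload("What's our VPN policy?", latency_sensitive=True)))
# gpt-5-nano
print(route(Workload("Reconcile these invoices against the ERP export", needs_tools=True)))
# gpt-5.4
```

The design point is that callers never name a model; they describe the job, and routing (and therefore cost optimisation) happens behind the endpoint.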
The governance layer is equally important. Azure AI Content Safety screens every prompt and completion. The AI Red Teaming Agent runs alignment, bias, and security tests. Metrics stream into Azure Monitor and Application Insights. Security signals feed into Defender for Cloud. Audit trails go to Purview. For regulated industries, this is the difference between "we're experimenting with AI" and "we have a production AI system with appropriate controls."
Getting started
GPT-5.4 is available now in Standard Global and Standard Data Zone (US) deployments through Microsoft Foundry. Head to the Foundry Model Catalog to deploy.
Pricing is $2.50 per million input tokens, $0.25 per million cached input tokens, and $15.00 per million output tokens. Pro runs at $30.00 per million input tokens and $180.00 per million output tokens. Additional deployment regions are coming, but no specific timeline has been shared.
One caveat worth flagging: computer use capabilities aren't available at launch. They're coming "shortly after," which in Microsoft release cadence could mean days or weeks. If browser automation is your primary use case, check the Foundry changelog before committing.
SAP, Relativity, and Hebbia are already building on it. SAP is integrating through their AI Foundation generative AI hub. Relativity is applying it to legal data intelligence. Hebbia is using it for structured financial analysis across thousands of documents.
What this means for your AI strategy
The GPT-5 family in Foundry now spans five models — from nano (fast, cheap, simple) to 5.4 Pro (slow, expensive, thorough). With the Model Router handling selection automatically, the deployment model has shifted from "pick a model" to "describe your workload."
That's a meaningful change for IT leaders. You stop evaluating individual models and start evaluating platforms. The question isn't "should we use GPT-5.4 or Claude Opus 4.6?" — both are available in Foundry. The question is whether your governance, monitoring, and deployment infrastructure can support production AI agents at scale.
GPT-5.4 doesn't solve that organisational challenge. But it removes the excuse that the models aren't ready. They are. The question now is whether you are.
Leon Godwin is Principal Cloud Evangelist at Cloud Direct, helping organisations navigate cloud strategy with clarity and technical honesty.