Cloud Strategy

Why Fireworks AI on Microsoft Foundry Changes the Open Model Equation for Enterprise Teams

Leon Godwin
15 March 2026

The Challenge

Most enterprise teams I talk to have the same story about open models. They love the flexibility. They love the control. They love not being locked into a single provider's pricing and roadmap. And then they try to run open models in production and hit a wall.

The wall isn't the model itself. Open models like DeepSeek V3.2 and Kimi K2.5 are genuinely capable. The wall is everything around the model: the inference infrastructure, the deployment pipelines, the governance layer, the observability, the cost management. Teams end up stitching together three or four different tools and vendors just to get an open model serving traffic reliably.

That fragmentation has a real cost. Every additional tool is another contract, another API surface, another thing to monitor. It slows teams down when they should be iterating. And it makes the "should we just use GPT-5.4 on Azure?" conversation a lot more tempting — not because the proprietary model is better for the job, but because the operational overhead of open models is higher than it should be.

What's Changed

Microsoft has announced the public preview of Fireworks AI on Microsoft Foundry, and this is a significant step toward solving the operational gap.

Fireworks AI runs inference at serious scale — over 13 trillion tokens processed daily, around 180,000 requests per second, and over 1,000 tokens per second on large models. They're consistently at the top of the Artificial Analysis benchmarks for inference speed. That performance engine is now available directly through Microsoft Foundry, with Azure-grade governance wrapped around it.

Here's what's available at launch:

  • DeepSeek V3.2 — the open reasoning model that's been turning heads
  • OpenAI gpt-oss-120b — OpenAI's open-weight 120B-parameter model
  • Kimi K2.5 — Moonshot AI's strong multilingual model
  • MiniMax M2.5 — a new addition to Foundry's serverless catalogue

The deployment options are practical. You can use serverless pay-per-token through Data Zone Standard for experimentation and variable workloads, or provisioned throughput units (PTUs) for steady-state production traffic where you need predictable latency and cost.
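
If you prefer to script deployments rather than click through the portal, the Azure ML Python SDK already supports serverless endpoints. Here's a minimal sketch, assuming the azure-ai-ml package; the registry path and model name for the Fireworks collection are placeholders, so look up the real identifiers in the catalogue:

    # pip install azure-ai-ml azure-identity
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import MarketplaceSubscription, ServerlessEndpoint
    from azure.identity import DefaultAzureCredential

    # Connect to the workspace behind your Foundry project
    client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<project-name>",
    )

    # Placeholder id -- find the real one in the Fireworks AI collection
    model_id = "azureml://registries/<fireworks-registry>/models/<model-name>"

    # Third-party models need a marketplace subscription before deployment
    client.marketplace_subscriptions.begin_create_or_update(
        MarketplaceSubscription(model_id=model_id, name="fireworks-sub")
    ).result()

    # Create the serverless (pay-per-token) endpoint
    endpoint = client.serverless_endpoints.begin_create_or_update(
        ServerlessEndpoint(name="my-fireworks-endpoint", model_id=model_id)
    ).result()
    print(endpoint.scoring_uri)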

But the real value isn't just "fast open model inference on Azure." It's that Fireworks AI inference now lives inside the same Foundry control plane as your proprietary models, your agents, your evaluation pipelines, and your governance policies. You're not managing a separate stack for open models anymore. One workspace. One set of controls. One bill.

The bring-your-own-weights (BYOW) capability is worth highlighting. If you've fine-tuned a model elsewhere — quantised it, trained it on your domain data — you can upload those weights and serve them through Fireworks' inference stack on Foundry. No need to rebuild your serving infrastructure or migrate to a different framework. That's a practical acknowledgement of how enterprise teams actually work: they don't start from scratch, they bring existing assets.
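
The exact upload flow will follow Microsoft's BYOW guide (linked below), but registering custom weights as a workspace model asset is the familiar starting point. A minimal sketch, assuming the standard azure-ai-ml registration flow; the names and paths are illustrative:

    from azure.ai.ml import MLClient
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Model
    from azure.identity import DefaultAzureCredential

    client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<project-name>",
    )

    # Register a locally stored, fine-tuned checkpoint as a custom model asset.
    # The Fireworks-specific serving configuration comes from Microsoft's
    # BYOW guide, referenced below.
    custom_model = Model(
        name="my-domain-llm",                # illustrative name
        version="1",
        path="./checkpoints/my-domain-llm",  # directory containing your weights
        type=AssetTypes.CUSTOM_MODEL,
    )
    client.models.create_or_update(custom_model)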

Getting Started

The setup is straightforward:

  1. Go to Microsoft Foundry and browse the model catalogue
  2. Filter by the Fireworks AI collection
  3. Select your model and review the model card
  4. Choose your deployment type — serverless or PTU — and deploy
  5. Call the standard Azure OpenAI-compatible API endpoint (see the sketch below)
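
Because the endpoint speaks the OpenAI-compatible API, the standard openai Python package works against it unchanged. A minimal sketch; the base URL, key, and deployment name are placeholders you copy from the deployment's details page in the portal:

    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        base_url="https://<your-resource>.services.ai.azure.com/openai/v1",  # placeholder
        api_key="<your-api-key>",
    )

    response = client.chat.completions.create(
        model="<your-deployment-name>",  # the name you gave the deployment
        messages=[
            {"role": "user", "content": "Summarise the trade-offs between serverless and PTU deployments."}
        ],
    )
    print(response.choices[0].message.content)

Since this is the same API surface as your proprietary Azure OpenAI deployments, swapping a Fireworks-served open model in or out of an application is a one-line change.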

For custom weights, Microsoft has published a guide to uploading and registering your own models for inference with Fireworks on Foundry.

If you want to see the integration in action, there's a Model Mondays livestream on 23 March covering Fireworks on Foundry in detail.

Key consideration: pricing. Serverless pay-per-token is ideal for experimentation, but run the numbers on PTUs before committing to production workloads. The economics of open model inference can be significantly better than proprietary models at scale, but only if you right-size your provisioning.
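
To make "run the numbers" concrete, here's a back-of-envelope break-even sketch. Every figure is a placeholder, not a published price; substitute the current Azure price sheet rates for your model, region, and any negotiated discounts:

    import math

    # All values below are placeholders, not published prices.
    monthly_tokens = 4_000_000_000            # projected monthly volume
    serverless_usd_per_1m_tokens = 1.20       # pay-per-token rate
    ptu_usd_per_month = 3_000.00              # cost of one provisioned unit
    tokens_per_ptu_per_month = 1_500_000_000  # sustained throughput of one unit

    serverless_cost = monthly_tokens / 1_000_000 * serverless_usd_per_1m_tokens
    ptus_needed = math.ceil(monthly_tokens / tokens_per_ptu_per_month)
    ptu_cost = ptus_needed * ptu_usd_per_month

    print(f"Serverless: ${serverless_cost:,.0f}/month")
    print(f"PTU x {ptus_needed}: ${ptu_cost:,.0f}/month")

Note what the rounding does with these placeholder numbers: you'd pay for three PTUs (4.5 billion tokens of capacity) to serve 4 billion tokens. That idle headroom is the right-sizing trap: provisioned throughput only wins when your traffic is steady enough to keep the units busy.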

What This Means

This integration represents a broader shift in how Microsoft positions Foundry. It's not just a place to consume Microsoft and OpenAI models anymore. It's becoming a genuine model marketplace with enterprise operations built in — a control plane where you pick the best model for each workload, regardless of who built it, and manage them all through the same tooling.

For enterprise teams evaluating their AI model strategy, this changes the calculus. The operational argument against open models — "it's too much infrastructure overhead" — gets considerably weaker when high-performance inference, deployment management, and governance all come through the platform you're already using.

The competitive pressure this puts on other inference providers is real. If you can get Fireworks-grade inference speed inside the same Azure environment where your data and applications already live, the incentive to run a separate inference platform drops significantly.

Open models are moving from "technically capable but operationally painful" to "technically capable and operationally straightforward." That's the transition that drives real adoption.


Leon Godwin, Principal Cloud Evangelist at Cloud Direct