OpenAI missed its own user and revenue targets this week. The story matters, but not for the reason most coverage is pointing to. It's supplier-side confirmation of what's already happening on the buyer side: capability is racing ahead, ROI is stalling, and the gap between them is widening.
Take capability first. What the models can do in a lab keeps climbing. Stanford's 2026 AI Index, published earlier this month, shows software engineering benchmarks jumping from roughly 60% to nearly 100% in a single year, with the report's authors noting that capability is now improving faster than benchmarks can measure. One quoted researcher said he's "stunned that this technology continues to improve." Fair enough. It does.
Take enterprise AI value next. What the models actually deliver inside a company is flattening. MIT Sloan, in its 2026 action items for AI decision-makers, called this year a "level-set year." The hype is cooling. The pilots aren't graduating. And the ROI cases marketing leaders are being asked to defend are getting harder to defend, not easier.
That OpenAI story has more in it than the headline suggests. The same WSJ piece reports that the CFO has raised concerns about whether revenue can keep pace with the company's data-center commitments, and that the goal of a billion weekly active ChatGPT users by year-end has quietly slipped. Oracle, with a $300 billion compute deal at stake, dropped more than 3% on the news. OpenAI pushed back on the framing, and the specific numbers deserve some skepticism. But the directional signal is harder to dismiss: when the lab that defines the capability frontier is missing its own deployment-side numbers, the gap isn't theoretical anymore.
For B2C marketers, that gap is the actual story. Not what GPT-7 can do. What your team can ship reliably, at consumer scale, in production, this year.
The capability curve is real. It's also misleading.
I want to be careful not to wave away the model advances. They're real, and Stanford's index makes that case clearly.
The misleading part is that capability gains don't translate one-to-one into operational gains. Capability is what's possible in a sandbox. Operational value is what survives integration with your CRM, your analytics stack, your compliance review, your brand voice, and the long tail of things consumers actually do that your data doesn't anticipate.
I've seen this movie before. The CDP wave promised a unified customer view. What enterprises got was a unified data warehouse and a list of integration projects. The marketing automation wave promised one-to-one personalization at scale. What enterprises got was better email triggering and a long argument about identity resolution. Capability arrived. Operational maturity took another five years.
AI is on the same arc, just compressed. The capability is here. The operational reality is somewhere behind it, walking.
Where the friction actually lives
Here's where the data gets concrete. MIT Sloan's 2026 outlook reports that only 38% of large enterprises have appointed a chief AI officer. The number itself isn't quite the point. I'd want to see a breakdown of what "chief AI officer" means in practice: at some companies it's a senior strategy hire, at others a relabeled head of analytics, at others a marketing leader with a new title. The data conflates all three.
What the figure does signal is that more than three years into the generative AI cycle, most enterprises still don't have a clear owner. No owner means no roadmap. No roadmap means projects get justified one ROI case at a time, by whoever happens to champion them.
That structural problem shows up in marketing more than in most functions. Marketing AI projects sprawl across content, paid media, CRM, analytics, and customer service. The cross-functional surface area is massive. Without an owner, every team builds its own thing. The result is a stack with seven AI tools that don't talk to each other and a CMO who can't tell the board which ones are actually working.
The other piece of the deployment story is what researchers are calling jagged intelligence: the unevenness of where AI works versus where it fails. Stanford's index notes that productivity gains are concentrated in narrow domains such as customer service, software development, and basic content drafting. Tasks involving real-world judgment, physical action, or context-sensitive consumer interaction lag badly.
Robots succeed on only 12% of household tasks, per the same index. I'm honestly not sure how directly that translates to marketing AI; the failure modes are very different. But the underlying point about uneven performance holds in any consumer-facing context, and that's where most B2C marketing decisions live.
Why B2C marketing feels this gap differently
B2B marketers can usually absorb the gap. Their volumes are lower, their reliability tolerance is higher, and a 95%-accurate AI tool that occasionally misfires on a sales email is a survivable problem.
Consumer marketing doesn't have that cushion. When you're running acquisition for tens of millions of consumers, a 5% error rate isn't a rounding error. Across ten million consumers, it's 500,000 bad experiences. It's a class action. It's a brand crisis. It's a regulatory inquiry, especially in regulated categories where the consumer base trusts the institution to get it right every time.
This is the part vendor demos never engage with. Demo accuracy is not production accuracy. The gap between "this looks great in the room" and "this works at member-level scale on Tuesday morning" is the gap that quietly kills most marketing AI deployments.
A martech stack is plumbing. Nobody notices it when it works. Everyone notices it when a generative feature shipped to one segment writes something off-brand, or when a personalization model trained on stale data starts recommending the wrong product to the wrong consumer at the wrong moment. The reliability bar for a B2C marketing AI tool is much higher than for most other enterprise AI use cases. And the frontier capability gains we keep reading about aren't doing much to clear that bar, because reliability isn't really what the frontier is improving.
What smart teams are doing
I was at Adobe Summit last week, and the most useful thing wasn't the product roadmap. It was how consistent the customer stories were on what's actually working and what isn't. Across Ulta, Intuit, IBM, Cleveland Clinic, and Dick's Sporting Goods, the customers landed on the same answer, independently, and it wasn't about the tech. It was that the operating model has to come first. Before the data work. Before the agent strategy. Before any of it.
These marketing teams are quietly recalibrating. Not retreating from AI. Recalibrating. The ones that pulled their projects forward in 2024 and 2025 spent most of last year cleaning up disappointments. The ones moving carefully now are doing a few things differently.
They're starting with the operating model, not with the data. This sounds backwards. Every vendor pitch deck I've sat through opens with some version of "your data is the foundation," and that's not wrong; it's just incomplete. Without a cross-functional structure that actually owns the AI work end-to-end (marketing, IT, data, analytics, and in regulated categories, compliance and clinical at the same table from the start), you can't prioritize fixing the data in the first place.
You can't secure budget against competing priorities. You can't hold anyone accountable when the work slips. Marketing builds one foundation. IT builds a different one. A year later you have two foundations and an integration project.
None of the customers I heard at Summit framed this as a tech problem. All of them framed it as an org problem they had to solve before the tech work could even start.
After the operating model, they're decoupling capability from deployment. IBM's framing was the cleanest version of this I've heard: "AI without IA is vanity." IA here means information architecture. Pile capability on broken IA and you get nothing back.
Worth noting where some vendors are starting to land. Adobe's CX stack (models trained on the customer's own data, embedded inside purpose-built tools like Journey Optimizer rather than bolted on as a generic LLM) is the kind of architectural answer that takes the IA constraint seriously. It's harder to demo than a frontier-model showcase. It's also more likely to actually work in production.
But the order matters: you only earn the right to fix the IA after the operating model is in place. Most marketing AI deployments stall in the gap between those two steps. That's also why model selection, the part vendors want to talk about, should be treated as a relatively cheap, frequently revisited decision, while integration architecture should be treated as the expensive, slow-changing one.
They're narrowing the scope of what AI is allowed to touch in production. Customer service draft assistance, internal content drafting, analyst-level data summarization, fine. Outbound consumer-facing copy at scale, dynamic personalization touching the actual consumer experience, anything regulated, those are getting a much more conservative review. Not because the model can't do it, but because the operational risk is real and the upside is often smaller than expected.
Intuit's case study at Summit was the most useful version of this I saw. Their first attempt, a stitched-together generative AI setup meant to produce campaign creative, lost a head-to-head against their existing agency. They've since pivoted: the agency creates and leads the original concept, the work goes through normal brand and approval processes, and only then does AI come in. It handles variant testing, format adaptation, and channel-specific optimization, which means faster iteration and less creative fatigue across the cycle. That version is delivering real ROI.
But look at what the success pattern actually is: AI as an iteration layer on top of human-approved creative, not generation from scratch. And look at the meta-point: this was among the strongest gen AI marketing success cases I saw across the entire conference. If the technology could meaningfully generate from-scratch creative for a brand at Intuit's scale, the industry would already have a headline case for it.
It doesn't. Not yet, anyway.
And they're separating the AI ROI question from the AI cost question. A lot of "AI ROI" math right now compares the cost of the tool to the labor it might displace, while ignoring the integration cost, the governance cost, the model-drift monitoring cost, and the reputational cost of getting it wrong. Load all of that in and the math doesn't work in most pitches I've seen. The wins that hold up under inspection aren't in headcount displacement. They're in workflow compression: turning a 90-minute task into a 20-minute task without any reduction in headcount, because the team needed that capacity for higher-value work anyway.
That's a less exciting story than the one the vendors are telling. It's also the one that's actually showing up in P&Ls.
The honest answer
If you're running marketing in 2026, the question isn't whether AI capability is going to keep improving. It is. The question is whether your operational reality can keep up. For most teams I've talked to, the bottleneck isn't model access. It's everything around the model, and in a specific order: the operating model first, then data quality and integration, then governance and brand controls, then measurement. Skip the first one and the rest of the list never gets prioritized.
That's an unsexy answer. It's the one I think is true.
The vendor pitch decks are going to keep showing benchmark charts going up and to the right. The AI Index will keep publishing capability gains. None of that is wrong. It just isn't the line you should be staring at.
Stare at the other one. The deployment line. The actual ROI on what you've already shipped. The number of AI projects that have moved from pilot to production in the last twelve months at your company. That's the line that tells you whether the gap is closing or widening.
For most marketing teams, it's still widening.