0.4 watt-hours. That’s what a single GPT-4o query costs in electricity, according to a benchmarking study published on arXiv last year that tested 30 commercial models. Scale that against ChatGPT’s reported 2.5 billion daily prompts and the math gets ugly fast. The researchers modeled a conservative scenario (700 million daily queries, well below actual volume) and found annual energy consumption on par with 35,000 American homes. The freshwater evaporated to cool those servers? Enough to supply drinking water for 1.2 million people.
One product. One company. Conservative estimate.
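If you want to pressure-test numbers like these for the tools you actually run, the arithmetic fits in a few lines. Here’s a minimal sketch; every input is an assumption you should swap for your own figures, and the output scales linearly with whatever per-query energy number you plug in (the researchers’ modeled scenario evidently used a heavier per-query figure than the short-prompt number I opened with, which is why their homes estimate lands higher than this sketch will give you).

```python
# Back-of-envelope scale math. Every constant here is an illustrative
# assumption, not a figure pulled from the study -- replace with your own.
WH_PER_QUERY = 0.4          # assumed energy per query, watt-hours (short prompts)
QUERIES_PER_DAY = 700e6     # the "conservative" daily volume modeled above
HOME_KWH_PER_YEAR = 10_500  # rough annual electricity use of a U.S. home, kWh

annual_kwh = WH_PER_QUERY * QUERIES_PER_DAY * 365 / 1_000  # Wh -> kWh
homes_equivalent = annual_kwh / HOME_KWH_PER_YEAR

print(f"Annual energy: ~{annual_kwh / 1e6:,.0f} GWh")
print(f"Equivalent to ~{homes_equivalent:,.0f} U.S. homes")
```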
I’ve spent the last two months going back and forth on whether to write this piece. The argument that massive general-purpose AI is wasteful isn’t new. Karen Hao made it powerfully in Empire of AI, tracing how the pursuit of scale has driven a resource grab that mirrors extractive industries we thought we’d learned from. Her reporting genuinely unsettled me. But what kept nagging at me wasn’t the environmental story alone. It was that the environmental critique and the effectiveness debate never seem to occupy the same room.
Marketers hear one conversation about the planet. They hear a separate conversation about which AI tools perform best. Those two conversations almost never merge, and so the people making actual purchasing decisions on AI tools every quarter miss the complete argument.
The complete argument is this: smaller, purpose-built AI isn’t just the more responsible choice. The math works better too.
The Performance Myth We Keep Buying
I used to assume bigger models delivered better results. That assumption was convenient for the vendors selling them, and I’ll admit I bought it longer than I should have.
IBM Fellow Kush Varshney changed my thinking when he said publicly earlier this year: “You can get a small language model performing at the same level, or even better, than much larger models.” IBM’s Granite 3.0 series (models at 2 and 8 billion parameters) demonstrated performance rivaling leading open models of comparable size. Fine-tune one on your enterprise data and you get task-specific output that competes with models 10x larger at a fraction of the cost. The math works.
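For readers with an engineer nearby who want to see what “fine-tune a small model on your data” actually looks like, here’s a minimal sketch using the Hugging Face transformers and peft libraries. The model ID, attention-module names, and dataset path are assumptions on my part, not a recipe IBM publishes, and a production run would need real evaluation and data governance wrapped around it.

```python
# Minimal LoRA fine-tuning sketch for a small open model.
# Model ID, module names, and data file are placeholder assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

MODEL_ID = "ibm-granite/granite-3.0-2b-instruct"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Attach a small LoRA adapter so only a few million weights are trained,
# which is what keeps the job feasible outside a data center.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM"))

# Hypothetical in-house dataset: one JSON object per line, {"text": "..."}.
dataset = load_dataset("json", data_files="campaign_examples.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="granite-marketing-lora",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point isn’t the specific libraries. The point is that the whole job can run on a single GPU you control, against data that never leaves your infrastructure.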
Then there’s the research out of Virginia Tech’s Sanghani Center that landed this month. The team showed that purpose-built small language models can avoid data-center-scale computing entirely, running on institutional servers or edge devices instead. Here’s the part that stuck with me: a fine-tuned SLM outperformed large GPT models for emergency department triage at Children’s National Hospital. Better clinical outcomes. Radically less infrastructure. I keep coming back to that example because it collapses the whole “bigger is better” argument into a single data point.
The cost side is even more lopsided. Serving a 7-billion-parameter model runs 10 to 30 times cheaper than a 70-to-175-billion-parameter LLM. For a marketing team managing seven-figure annual tool budgets, that’s not a rounding error. That’s headcount. That’s the difference between funding an optimization hire and funding another enterprise license nobody fully uses.
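To see what that ratio does to a budget line, the arithmetic is short. The spend figure below is a placeholder, not anyone’s actual contract; only the 10-to-30x range comes from the estimates above.

```python
# Illustrative only: plug in your own annual LLM inference line item.
llm_inference_spend = 250_000  # assumed annual spend on frontier-scale inference, USD

for ratio in (10, 30):  # the published 10-30x serving-cost range
    slm_spend = llm_inference_spend / ratio
    print(f"At {ratio}x cheaper: ~${slm_spend:,.0f}/yr, "
          f"freeing ~${llm_inference_spend - slm_spend:,.0f}/yr")
```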
What This Means If You Run a Marketing Org
I’ve seen this movie before. Twenty years of building consumer acquisition programs taught me to recognize the pattern now repeating in AI adoption, the same one I watched play out with marketing automation platforms in the early 2010s and with CDPs five years ago: teams over-invest in general-purpose platforms because the brand name feels safe. A narrower, purpose-built tool would outperform at a fraction of the cost, but nobody gets fired for buying the big vendor.
Think about what most marketing teams actually need AI to do. Segment audiences. Generate copy variants. Predict churn. Score leads. Optimize send times. None of that requires a model trained on the entire internet. It requires a model trained well on your data, your customers, your vertical.
ActiveCampaign’s new agent-to-user AI, announced at their Spring Innovation Keynote on April 8, is a useful example. It’s not trying to be a general intelligence. It watches campaign performance signals and generates specific recommendations. Purpose-built system. Defined problem. Contrast that with the enterprise pitch from the major LLM providers, which amounts to: give us your data, we’ll figure out something useful to do with it.
And if you’re in healthcare, financial services, or any regulated vertical, that pitch should stop you cold.
The Compliance Problem Nobody Wants to Talk About
I spent years building consumer acquisition at a Fortune 50 health system, so this one is personal. Every piece of customer data we touched had HIPAA implications. Every vendor integration required a BAA and a security review that could take months.
The idea of piping patient-adjacent behavioral data into a general-purpose LLM hosted on shared infrastructure, where your prompts might train the next version of someone else’s model, isn’t just risky. In many cases, it’s a compliance violation waiting for an auditor to notice. Purpose-built AI running on dedicated infrastructure, trained on a defined dataset with clear data governance, isn’t a nice-to-have in regulated industries. It’s the only architecture that survives an audit. I could be wrong about how many health systems see it this way yet. But give it 18 months.
The martech integration numbers support the broader point. Research published this month found that 90% of marketing organizations claim they’re using AI agents, while only 6.3% have achieved actual full-stack integration. That 84-point gap isn’t a technology failure. It’s what happens when you buy a tool built to do everything and try to retrofit it into a workflow that needs it to do three things. The mismatch is structural. This is a seven-figure infrastructure decision disguised as a software subscription.
The Cost Nobody Puts in the Pitch Deck
When a vendor demos their AI capabilities, they don’t mention that U.S. data centers consumed 66 billion liters of water directly in 2023, and the IEA projects that number could reach 1.2 trillion liters annually by 2030 as AI workloads scale. They skip over the University of California, Riverside study estimating AI-related water withdrawals could hit 6.6 billion cubic meters by 2027. They definitely don’t mention that Google’s own water consumption jumped 20% in a single year as it scaled AI infrastructure.
I’m not suggesting every marketing leader needs to become an environmentalist. I am suggesting that resource consumption is becoming a real business risk that most tool buyers are sleepwalking past. Data center energy costs are rising. Municipalities are pushing back on water allocations. Regulation is coming, slowly, and the brands running leaner AI infrastructure will have a cost advantage when those pressures arrive.
Your customers are paying attention, too. Aerie expanded its “100% Real” pledge to ban AI-generated people and bodies from its marketing, then launched a Pamela Anderson campaign built around the tagline “AI could never.” Instagram engagement surged 75%. I’m genuinely uncertain whether “no AI” becomes a mainstream consumer expectation or stays a niche positioning play. But the direction of consumer sentiment is clear enough that I’d want my brand on the right side of it.
The environmental footprint of your martech stack is going to become a brand question, not just an IT question. The consumer doesn’t care about your vendor’s parameter count. They’re starting to care about what it costs the planet.
Pick the Scalpel
Every pitch deck from the major AI vendors carries the same implicit message: general-purpose equals capable, more parameters equals safer bet, scale equals quality.
Twenty years of building marketing systems taught me the opposite. The best tools are the ones scoped precisely to the problem. Faster to implement, cheaper to run, easier to integrate, more reliable in production. They also happen to consume a fraction of the energy and water.
That 6.3% of marketing teams who actually integrated AI into their stack? They didn’t get there by buying the biggest model available. They got there by matching specific capabilities to specific workflows. Boring. Effective.
If you’re evaluating AI tools this quarter, ask your vendor what model powers the feature, what the energy cost per inference looks like, and why this task needs a general-purpose model instead of a fine-tuned one.
Run the experiment. The silence will tell you everything.