At the vendor level, SaaS companies must move beyond bundled token limits hidden inside seat tiers and expose token consumption as a first-class metric in their analytics and billing interfaces. Some are already moving in this direction — Anthropic's Claude, OpenAI's API, and Google's Gemini all have token-level billing in their developer tiers, even if their enterprise seat licenses still obscure this data. The next step is dashboards that give enterprise buyers real-time visibility into TPP across their organization, broken down by department, use case, and model tier.
At the enterprise level, technology leaders need to treat AI token budgets the same way they treat cloud compute budgets — with governance frameworks, cost allocation, and optimization disciplines. The average monthly AI spend per organization rose from $63,000 in 2024 to $85,500 in 2025, a 36% increase. Nearly half of companies now spend over $100,000 per month on AI. These are material expenditures that demand the same scrutiny as any other IT investment, and TPP is the instrument that makes that scrutiny possible.
At the infrastructure level, data center operators and hyperscalers need to publish TPW benchmarks alongside PUE scores, enabling buyers to assess not just the overhead efficiency of a facility but its actual AI productivity. This would create a market dynamic analogous to what happened with PUE itself after The Green Grid began publishing benchmarks in the late 2000s: operators competed to improve their scores, driving industry-wide efficiency gains. The same dynamic, applied to TPW, would drive competition on AI output efficiency — exactly the right optimization target for the energy-constrained decade ahead.
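To illustrate why a throughput figure published next to PUE changes the comparison, here is a minimal sketch. It assumes tokens per kilowatt-hour as a stand-in unit for TPW, and the `Facility` class, its field names, and all figures are hypothetical. The only fixed relationship it relies on is the definition of PUE as total facility energy divided by IT energy.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    pue: float            # total facility energy / IT energy
    it_energy_kwh: float  # energy delivered to IT equipment in the period
    tokens_served: float  # tokens generated in the same period (made-up figure)

    def tokens_per_it_kwh(self) -> float:
        # Throughput efficiency of the IT load alone.
        return self.tokens_served / self.it_energy_kwh

    def tokens_per_facility_kwh(self) -> float:
        # Total facility energy = IT energy * PUE, by the definition of PUE.
        return self.tokens_served / (self.it_energy_kwh * self.pue)


# Facility A has the better PUE; facility B produces more tokens per unit of energy.
a = Facility("A", pue=1.2, it_energy_kwh=1000.0, tokens_served=2.0e9)
b = Facility("B", pue=1.5, it_energy_kwh=1000.0, tokens_served=4.0e9)

for f in (a, b):
    print(f.name, f.pue, round(f.tokens_per_facility_kwh()))
```

In this invented comparison, ranking by PUE alone would favor facility A, while ranking by tokens per facility kilowatt-hour would favor facility B.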
The Regulatory Catalyst
History suggests that voluntary industry adoption of new metrics is slow. PUE took years to become standard even after The Green Grid published it. The adoption of TPW and TPP at scale will likely require a combination of regulatory pressure and customer demand.
On the regulatory side, the EU's Energy Efficiency Directive is already creating the conditions. European regulators are mandating IT efficiency reporting for data centers and are actively developing a "work per energy" standard. Once that standard is adopted (likely by 2026 or 2027), it will become the de facto global benchmark for AI data center efficiency, much as European energy regulations have historically shaped global markets. A TPW-aligned metric that the EU mandates in Frankfurt and Amsterdam will quickly become standard in Northern Virginia and Singapore.
On the customer side, enterprise CFOs are increasingly aware that their AI bills are growing faster than their AI outcomes. The move from seat licenses to token budgets is a natural response to this pressure — and once large enterprises begin demanding TPP-based billing and benchmarking, vendors will have no choice but to provide it.
Conclusion: Measuring Intelligence, Not Just Electricity
The history of technology is partly a history of metrics — the numbers that shape investment decisions, competitive strategies, and industrial standards. Miles per gallon transformed the automotive industry. The cost per transistor drove Moore's Law. Uptime percentages defined the SLA economy. In each case, the right metric did not just measure progress — it created the conditions for progress by aligning incentives and making performance legible.
PUE was the right metric for the server farm era. It drove real efficiency gains: the industry improved average PUE from 2.5 in 2007 to 1.55 in 2022, a remarkable achievement. But those efficiency gains are now being swamped by demand growth, and the metric itself is blind to the dimension that matters most: how much intelligence are we producing for the energy we are consuming?
The data center of 2026 is an intelligence factory. The SaaS platform of 2026 is an intelligence delivery vehicle. The enterprise of 2026 is an intelligence consumer. None of these actors is well served by a metric designed for a world where the question was simply "how cold is your building?"
Tokens Per Watt tells us how productive our intelligence factories are. Tokens Per Person tells us how efficiently our organizations are consuming the intelligence delivered to them. Together, they form the foundation of a measurement framework fit for the AI economy — one that connects electrons to outcomes, infrastructure to value, and investment to return.
The question is not whether these metrics will replace PUE. The question is whether the industry moves fast enough to adopt them before the energy, cost, and governance crises of AI scale force the issue. The organizations that get ahead of this — that build TPW into their procurement criteria and TPP into their AI governance frameworks today — will have a structural advantage in managing the intelligence economy of tomorrow.
PUE was a good metric for a different world. It is time to build the metrics for this one.
Next Steps: A Call to Action for Every Stakeholder
Frameworks without action are just theory. The AI Value Stack will only become the industry standard if specific actors move deliberately to adopt, instrument, and advocate for it. Here is what each stakeholder group should do next.
Data center operators and hyperscalers should begin publishing Tokens Per Watt benchmarks alongside PUE scores in their sustainability and transparency reports — even as an experimental metric. The investment in instrumentation is real but achievable, and the organizations that establish TPW baselines now will set the reference points that others are measured against. Waiting for a standard to emerge before measuring is how industries sleepwalk into regulatory mandates.
Cloud and AI platform vendors should expose token consumption as a first-class metric in their customer dashboards — broken down by user, department, use case, and model tier. This is a competitive differentiator, not a concession. The vendors that make AI costs legible will win the trust of enterprise procurement teams currently flying blind on their largest and fastest-growing IT line item.
SaaS vendors with AI-embedded products should begin the transition from pure seat licensing to hybrid models that include token-based components. This does not require abandoning seat pricing overnight — a base seat fee with a transparent token allowance and visible overage reporting is a commercially viable first step that aligns vendor costs with customer value.
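A hybrid model of this kind can be sketched as a simple invoice calculation. The function name, the parameters, the prices, and the choice to pool the token allowance across all seats are assumptions made for illustration, not a description of any vendor's actual billing.

```python
def hybrid_invoice(seats, seat_fee, allowance_per_seat, tokens_used, overage_rate_per_1k):
    """Base seat fee plus overage on tokens beyond a pooled allowance.

    Assumption: the allowance is pooled across seats rather than enforced per user.
    """
    pooled_allowance = seats * allowance_per_seat
    overage_tokens = max(0, tokens_used - pooled_allowance)
    base_charge = seats * seat_fee
    overage_charge = overage_tokens / 1000 * overage_rate_per_1k
    return {
        "base_charge": base_charge,
        "pooled_allowance": pooled_allowance,
        "overage_tokens": overage_tokens,  # reported explicitly so overage stays visible
        "overage_charge": overage_charge,
        "total": base_charge + overage_charge,
    }


# Hypothetical contract: 100 seats, 500k tokens per seat, 60M tokens consumed.
invoice = hybrid_invoice(
    seats=100,
    seat_fee=30.0,
    allowance_per_seat=500_000,
    tokens_used=60_000_000,
    overage_rate_per_1k=0.01,
)
print(invoice)
```

Returning the allowance and overage as separate fields, rather than only a total, is what makes the token component visible to the buyer.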
Enterprise technology leaders — CIOs, CTOs, and AI platform owners — should treat AI token budgets with the same rigor they apply to cloud compute budgets. That means establishing a Tokens Per Person baseline for every AI-enabled tool in the estate, identifying the adoption gap between licensed seats and active token consumers, and building TPP targets into AI program governance for 2026 and beyond.
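As a rough sketch of what a Tokens Per Person baseline might involve, the snippet below computes TPP over licensed seats, TPP over active users, and the adoption gap for one tool. The function `tpp_report`, the `active_threshold` cutoff, and the usage figures are hypothetical. How "active" is defined is a policy choice the article does not specify.

```python
from statistics import median


def tpp_report(licensed_seats, tokens_by_user, active_threshold=1):
    """Summarize token consumption for one AI-enabled tool over one period."""
    if licensed_seats <= 0:
        raise ValueError("licensed_seats must be positive")
    # A user counts as active if they consumed at least `active_threshold` tokens.
    active = {u: t for u, t in tokens_by_user.items() if t >= active_threshold}
    total = sum(tokens_by_user.values())
    return {
        "tpp_licensed": total / licensed_seats,               # tokens per paid seat
        "tpp_active": total / len(active) if active else 0.0,  # tokens per active user
        "median_active": median(active.values()) if active else 0.0,
        "adoption_gap": 1 - len(active) / licensed_seats,      # share of seats with no use
    }


# Ten licensed seats, four provisioned users, three of whom used the tool.
usage = {"u1": 120_000, "u2": 30_000, "u3": 0, "u4": 50_000}
report = tpp_report(licensed_seats=10, tokens_by_user=usage)
print(report)
```

Reporting both denominators matters: a healthy TPP among active users can coexist with a large share of paid seats that consume nothing.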
Enterprise procurement and finance teams should begin renegotiating AI contracts with token transparency as a non-negotiable requirement. Vendors who cannot or will not provide TPP-level reporting should be treated as a governance risk. The average organization spending $85,500 per month on AI in 2025 has every right to know what it is getting per token.
Edge AI manufacturers and OEM hardware vendors should publish tokens-per-watt specifications for inference accelerators as a standard part of product datasheets — alongside existing benchmarks for TOPS (tera-operations per second) and thermal design power. Tokens per millisecond should be added for real-time application classes. These specs will become purchasing criteria as enterprise edge deployments scale.
Standards bodies and regulators — particularly the EU's Energy Efficiency Directive working groups and organizations like The Green Grid and Uptime Institute — should accelerate alignment between the emerging "work per energy" standard and the Tokens Per Watt metric proposed here. A single, auditable, internationally recognized TPW standard would do for AI infrastructure efficiency what PUE did for data center overhead efficiency in the 2010s — but faster, and with far higher stakes.
Investors and ESG analysts should begin incorporating TPW and TPP into their assessment of AI infrastructure companies and enterprise AI programs. A company that cannot articulate its tokens-per-watt trajectory or its enterprise tokens-per-person adoption rate is not managing its AI efficiency — and in an energy-constrained decade, that is a material risk, not a technical footnote.
The Edge Computing Extension: From Core to Continuum
The AI Value Stack was introduced here as a data center framework. But the most powerful version of this argument is broader: it is a framework for the entire AI compute continuum — from hyperscale cloud to the edge device in your factory, your vehicle, or your hospital ward.
Edge AI is not a footnote. The edge AI market is projected to exceed $70 billion by 2030, and the majority of real-world AI deployment — in manufacturing, retail, healthcare, logistics, and transport — happens at or near the edge, not in a Northern Virginia hyperscale campus. Any efficiency framework that applies only to centralized data centers is already describing yesterday's infrastructure.
The good news is that the AI Value Stack transfers to the edge — but with important modifications at each layer that actually strengthen the overall argument.
PUE is even less useful at the edge than in the data center. A GPU inference card running inside a 5G base station, a retail kiosk, or an autonomous vehicle shares power infrastructure with non-IT systems in ways that make PUE essentially unmeasurable. There is no clean separation between "IT load" and "total facility load" when the facility is the device. If PUE was already inadequate for AI data centers, it is completely irrelevant at the edge — which makes the case for successor metrics even more urgent.
Tokens Per Watt becomes an operational ceiling, not just a benchmark. Data center GPU racks draw tens of kilowatts. Edge AI accelerators draw 10 to 25 watts. The absolute numbers are different, and device-class benchmarks are needed rather than facility-level ones — but the concept is more powerful, not less. At the edge, energy budgets are hard constraints: battery life, thermal envelopes, and remote-site grid capacity are not negotiable. Tokens per watt at the edge is not an efficiency aspiration — it is an engineering requirement.
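The "ceiling" idea can be made concrete with a small calculation. The sketch assumes TPW is expressed as tokens per second per watt, which is dimensionally the same as tokens per joule. The function names, the 2.0 tokens-per-joule figure, and the battery size are invented for illustration. Only the 15 W power draw is chosen to sit inside the 10 to 25 watt range cited above.

```python
JOULES_PER_WATT_HOUR = 3600.0


def sustained_tokens_per_second(power_budget_w, tokens_per_joule):
    # Watts are joules per second, so watts * tokens/joule = tokens/second.
    return power_budget_w * tokens_per_joule


def max_tokens_on_battery(battery_wh, tokens_per_joule, reserve_fraction=0.0):
    """Upper bound on tokens from one charge, ignoring non-inference loads."""
    usable_joules = battery_wh * JOULES_PER_WATT_HOUR * (1.0 - reserve_fraction)
    return usable_joules * tokens_per_joule


# Hypothetical edge device: 15 W accelerator, 2.0 tokens per joule, 50 Wh battery.
rate = sustained_tokens_per_second(power_budget_w=15.0, tokens_per_joule=2.0)
ceiling = max_tokens_on_battery(battery_wh=50.0, tokens_per_joule=2.0)
print(rate, ceiling)
```

Under these assumptions, doubling tokens per joule doubles both the sustained rate and the per-charge ceiling without touching the power budget, which is why the metric behaves as a design constraint at the edge.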
Tokens Per Person maps naturally to Tokens Per Device. Most edge AI deployments are not licensed per SaaS seat. They are embedded in hardware, sold per device, or tied to specific physical assets. The right reframing is Tokens Per Device per watt-hour: a metric that directly captures the AI productivity of a constrained embedded system and creates a basis for comparing generations of edge hardware, optimizing model selection for deployment, and managing fleet-wide AI costs in industries like automotive, industrial automation, and smart infrastructure.
Latency becomes a co-equal efficiency dimension. Edge AI exists largely because round-trip latency to a centralized data center is unacceptable for real-time applications. This means the AI Value Stack needs a sixth dimension at the edge: a latency-adjusted efficiency score, or what might be called Tokens Per Millisecond — capturing not just how much intelligence a device produces per watt, but how quickly it delivers that intelligence when it matters. A model producing 80 tokens per watt at 200ms latency may deliver less real-world value than one at 40 tokens per watt and 20ms, depending on the application.
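One way to read the trade-off in that last sentence is as a constrained choice: meet the application's latency deadline first, then maximize tokens per watt among the options that qualify. The sketch below uses the two figures from the paragraph above. The `Candidate` class, the `pick` function, and the deadlines are hypothetical, and a deadline filter is only one possible form a latency-adjusted score could take.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    tokens_per_watt: float
    latency_ms: float


def pick(candidates: Sequence[Candidate], deadline_ms: float) -> Optional[Candidate]:
    """Most energy-efficient candidate that still meets the latency deadline."""
    eligible = [c for c in candidates if c.latency_ms <= deadline_ms]
    # Returns None when nothing meets the deadline.
    return max(eligible, key=lambda c: c.tokens_per_watt, default=None)


options = [
    Candidate("efficient", tokens_per_watt=80.0, latency_ms=200.0),
    Candidate("fast", tokens_per_watt=40.0, latency_ms=20.0),
]

realtime_choice = pick(options, deadline_ms=50.0)   # e.g. an inline inspection task
batch_choice = pick(options, deadline_ms=500.0)     # e.g. a background summary task
print(realtime_choice, batch_choice)
```

With a 50 ms deadline only the 40 tokens-per-watt option qualifies, while a 500 ms deadline admits both and the 80 tokens-per-watt option wins, which is the "depending on the application" caveat in executable form.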
Outcomes Per Token is arguably more legible at the edge than anywhere else. Edge AI is typically deployed for specific, measurable tasks: defect detection rates on a production line, customer dwell time at a retail display, collision avoidance response times in a vehicle. The business outcome — the numerator in the OPT calculation — is often far more directly observable at the edge than in a general-purpose cloud deployment. This makes edge AI one of the first environments where the full AI Value Stack, from TPW at the bottom to OPT at the top, can be implemented end-to-end with real data.
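For a task like defect detection, an Outcomes Per Token figure reduces to a simple ratio. The snippet below is a sketch under assumed numbers. The function name, the per-million normalization, and both deployments are hypothetical, and it treats a confirmed defect caught as the outcome.

```python
def outcomes_per_token(outcomes, tokens, per=1_000_000):
    """Outcomes delivered per `per` tokens consumed (default: per million)."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return outcomes / tokens * per


# Two hypothetical inspection deployments on the same production line.
large_model = outcomes_per_token(outcomes=450, tokens=30_000_000)
small_model = outcomes_per_token(outcomes=420, tokens=12_000_000)
print(large_model, small_model)
```

In this invented case the larger model catches more defects in absolute terms, but the smaller one delivers more than twice as many per million tokens. Which figure matters more depends on the cost of a missed defect, something the ratio alone does not capture.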
The unified continuum, then, looks like this:
| Layer | Data center | Edge |
| --- | --- | --- |
| Facility efficiency | PUE | Power budget per device (W) |
| AI throughput efficiency | Tokens Per Watt | Tokens Per Watt (device-class) |
| Latency efficiency | n/a | Tokens Per Millisecond |
| Enterprise value | Tokens Per Person | Tokens Per Device |
| Business outcomes | Outcomes Per Token | Outcomes Per Token |
Framed this way, the AI Value Stack is not a data center operations story. It is the measurement architecture for the entire intelligent infrastructure economy — from the hyperscale GPU cluster training foundation models to the embedded inference chip deciding, in 15 milliseconds, whether a weld is defective. The metrics change in scale; the logic does not.