Cognizant Blog

The Metric That Built the Modern Internet Is Failing Us

In 2006, a handful of engineers from Microsoft, Intel, and the newly formed Green Grid, drawing on IT hardware forecast data from IDC, sat down in California and invented a number that would quietly govern the entire data center industry for the next two decades. Power Usage Effectiveness (PUE) is elegantly simple: divide total facility power consumption by the power consumed by IT equipment alone. A perfect score of 1.0 means every watt entering the building goes directly into computation. A PUE of 2.0 means half of all energy goes to cooling, lighting, and other overhead. For the era of email servers, file storage, and web hosting, it was exactly the right instrument for exactly the right problem.
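As a minimal sketch of the definition above, the ratio and the overhead share it implies can be computed directly. The function names and the sample meter readings are illustrative assumptions, not part of any Green Grid specification.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be less than IT power")
    return total_facility_kw / it_equipment_kw


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility energy that never reaches IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical facility: 2,000 kW at the meter, 1,000 kW reaching IT equipment.
example_pue = pue(2000.0, 1000.0)
print(f"PUE = {example_pue:.2f}, overhead share = {overhead_share(example_pue):.0%}")
```

Note that nothing in the calculation refers to what the IT equipment does with its share of the power.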


That era is over!

The data centers we are building today are not the data centers PUE was designed to measure. They are factories for intelligence — warehouses of GPUs running at 700 to 1,200 watts per chip, clustered into racks drawing 50 kilowatts or more, cooled by liquid rather than air, and executing a fundamentally different kind of workload than anything the Green Grid's founders imagined. And yet, the industry continues to report PUE as its primary efficiency benchmark, as if the only question worth asking about a Formula 1 car is whether the air conditioning works.

The consequences of this blind spot are not academic. Global data centers consumed around 415 terawatt-hours of electricity in 2024. The IEA projects that figure will roughly double to 945 TWh by 2030 — growing at approximately 15% per year, more than four times faster than total global electricity demand. AI workloads are the primary driver, with electricity consumed by accelerated servers (GPUs and TPUs) projected to grow at 30% annually. Training a single large model like an early ChatGPT version is estimated to have generated over 550 tons of carbon dioxide — equivalent to the annual footprint of more than 120 U.S. households. The energy stakes have never been higher. The metric we use to manage them has never been less adequate.

It is time to bury PUE — or at least demote it — and replace it with a framework centered on a single, powerful idea: how much intelligence does this infrastructure produce per person, per dollar, and per watt?

What PUE Gets Wrong in the Age of AI

To be fair to PUE's defenders — and there are many, including Equinix, who argue its death has been exaggerated — the metric does what it was designed to do. It measures the overhead efficiency of a data center's physical plant: how much energy is wasted getting power to the servers and keeping them cool. On that narrow question, it remains useful. A facility with a PUE of 1.2 genuinely runs its cooling and power distribution more efficiently than one running at 1.5.

But PUE has a fatal flaw that AI has exposed: it is entirely silent on what the IT equipment is actually doing with the power it receives.

Consider two hypothetical AI data centers, each with a PUE of 1.3. Facility A is running aging GPU clusters at 40% utilization, processing batch inference jobs slowly and inefficiently. Facility B has the same physical footprint but runs next-generation accelerators at 85% utilization on optimized inference workloads. By PUE, they are identical. In reality, Facility B may be delivering ten times the useful AI output per watt of total energy consumed. PUE cannot see this. PUE does not care.
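The two-facility comparison can be sketched in a few lines. The PUE of 1.3 comes from the paragraph above; the IT power and token throughput figures are hypothetical values chosen only to mirror the "ten times" illustration, and the class and method names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    pue: float
    it_power_kw: float        # hypothetical IT load
    tokens_per_second: float  # hypothetical fleet-wide useful output

    def total_power_kw(self) -> float:
        # Total facility power follows from the PUE definition.
        return self.it_power_kw * self.pue

    def tokens_per_joule_total(self) -> float:
        # Useful output per joule of *total* facility energy (1 kW = 1,000 J/s).
        return self.tokens_per_second / (self.total_power_kw() * 1000.0)


facility_a = Facility("A", pue=1.3, it_power_kw=1000.0, tokens_per_second=50_000.0)
facility_b = Facility("B", pue=1.3, it_power_kw=1000.0, tokens_per_second=500_000.0)

ratio = facility_b.tokens_per_joule_total() / facility_a.tokens_per_joule_total()
print(f"Same PUE: {facility_a.pue == facility_b.pue}, output-per-joule ratio: {ratio:.1f}x")
```

Under these assumptions the PUE values are identical while the output per joule differs by an order of magnitude, which is the gap the metric cannot express.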

NVIDIA made this argument publicly in 2024. Jeremy Rodriguez, NVIDIA's senior director of data center engineering, noted that when modern GPU systems report rising input power in watts, that does not mean they are less efficient — in fact, they often do vastly more work per unit of energy consumed. The metric, he argued, needs to reflect "useful output." The Uptime Institute has concurred, proposing a "work per energy" metric that it has been advocating since the very same 2006 Green Grid meetings where PUE was born — and which, sixteen years later, still had not been widely adopted.

The EU's Energy Efficiency Directive is now forcing the issue. European regulators are developing a mandatory data center labeling scheme set to launch in 2026, and officials are actively struggling to define meaningful IT efficiency metrics that link performance to energy use. The Uptime Institute has proposed a "work per energy" standard, but even this faces challenges: it requires reporting three separate figures for CPU workloads, AI inference, and AI training — because combining them into a single number distorts reality almost as badly as PUE does.

There is also a subtler distortion that AI makes worse. PUE can actually worsen when data centers turn off idle servers: IT load falls while much of the cooling and power-distribution overhead stays fixed, so the facility looks less efficient even though it is wasting less energy. Conversely, keeping hardware powered on regardless of useful output flatters the score. This inverse relationship between IT utilization and PUE creates perverse incentives in an age when maximizing GPU throughput is the entire point of the enterprise.

The verdict is clear: PUE is a facility metric masquerading as a performance metric. For the AI era, we need something better at every layer of the stack — from the data center floor all the way up to the enterprise desktop.

Tokens Per Watt: The Infrastructure Efficiency Metric We Actually Need

If PUE measures how well a building delivers power to computers, we need a companion metric that measures how well those computers deliver intelligence to users. The answer already exists, even if it has not yet achieved the institutional momentum of PUE.

Tokens Per Watt (TPW), the rate at which AI tokens are generated or processed per watt of total power drawn, is the natural successor to PUE as the primary efficiency benchmark for AI infrastructure. Schneider Electric has been advocating for this. Fluix AI describes it as shifting the conversation "from consumption to productivity." It reframes data centers not as power-hungry facilities to be managed, but as factories for intelligence to be optimized.


The math is intuitive. A large language model inference workload that generates 1,000 tokens per second across a GPU cluster drawing 10 kilowatts delivers 100 tokens per second per kilowatt, or 0.1 tokens per joule. The same workload on older, less efficient hardware drawing 30 kilowatts delivers only about 33 tokens per second per kilowatt: a threefold efficiency gap that PUE would never reveal if the overhead cooling in both facilities were equally well designed.
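Because a watt is a rate (one joule per second), the comparison is easiest to check as throughput divided by power. The sketch below uses the figures from the paragraph above; the 100 and 33 values correspond to tokens per second per kilowatt. The function names are illustrative assumptions.

```python
def tokens_per_second_per_kw(tokens_per_second: float, power_kw: float) -> float:
    """Throughput delivered for each kilowatt of power drawn."""
    if power_kw <= 0:
        raise ValueError("power must be positive")
    return tokens_per_second / power_kw


def tokens_per_joule(tokens_per_second: float, power_kw: float) -> float:
    """Same quantity in energy terms: 1 kW is 1,000 joules per second."""
    return tokens_per_second_per_kw(tokens_per_second, power_kw) / 1000.0


newer = tokens_per_second_per_kw(1000.0, 10.0)  # cluster drawing 10 kW
older = tokens_per_second_per_kw(1000.0, 30.0)  # same workload at 30 kW

print(f"newer: {newer:.0f} tokens/s per kW ({tokens_per_joule(1000.0, 10.0):.3f} tokens/J)")
print(f"older: {older:.0f} tokens/s per kW ({tokens_per_joule(1000.0, 30.0):.3f} tokens/J)")
print(f"efficiency gap: {newer / older:.1f}x")
```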

TPW creates the right incentives at the infrastructure level:

  • GPU procurement decisions shift from raw performance benchmarks to efficiency-per-task
  • Cooling investments are justified by AI output gains, not just PUE score improvements
  • Workload scheduling is optimized to maximize throughput per joule, not just minimize idle power draw
  • Grid partnerships become negotiations over energy productivity, not just capacity

Critically, TPW also captures the extraordinary improvements that hardware generations are delivering. GPU computational performance per watt has improved by an estimated 4,000-fold over the last decade, according to NVIDIA. A metric like PUE that ignores compute throughput entirely cannot capture — or incentivize — this kind of progress. If we only measure the building's overhead, we only optimize the building's overhead.

The SaaS Reckoning: From Seats to Tokens Per Person

The infrastructure layer is only half the story. The more immediate — and commercially disruptive — implication of the AI era is what it does to the fundamental unit of enterprise software pricing.

For three decades, the SaaS industry has operated on a simple model: count the users, multiply by a monthly fee. Seat licensing worked because the marginal cost of serving one more user was essentially zero. Whether you used Microsoft Word once a week or forty hours a week, the cost to Microsoft was the same. Seat count was a reasonable proxy for value delivered.

AI has shattered this assumption entirely.

When an enterprise deploys a large language model — whether through Microsoft Copilot, Salesforce Einstein, ServiceNow AI, or dozens of other AI-embedded SaaS platforms — the cost per user is not fixed. It is radically variable, driven by how many tokens that user consumes. A knowledge worker who uses AI to draft a single email per day consumes perhaps 2,000 tokens. A power user who runs AI-assisted research, document generation, code review, and customer analysis across their entire workday might consume 500,000 tokens or more. Under a seat license, both pay the same amount. The economics are broken.


NVIDIA's Jensen Huang has predicted a $100 trillion AI token economy in which computing power is eventually traded like electricity — metered, priced, and consumed on demand. That vision is not a distant abstraction. It is already arriving. The 2025 Monetization Monitor found that 59% of software companies expect usage-based approaches to grow as a percentage of revenue — an 18% increase from just two years prior. IDC research confirms that usage-based pricing is now the preferred choice among SaaS buyers. Software companies relying entirely on traditional seat-based models, as NetLicensing has noted, face existential pressure to evolve.

The transition, however, is not simply "charge per token instead of per seat." The right model is more nuanced, and it demands a new enterprise KPI that does not yet have a universal name. I propose we call it Tokens Per Person (TPP): the average number of AI tokens consumed per active user per month, across an organization's entire AI-enabled software estate.


TPP is not a billing metric. It is a value and efficiency metric for the enterprise technology function. Here is why it matters:

As a value indicator, TPP tells you whether your workforce is actually using AI or just paying for it. An organization spending $500 per seat per year on AI-embedded SaaS but averaging only 5,000 tokens per person per month has an adoption problem. An organization at 200,000 tokens per person per month is extracting real productivity value and can justify investment in higher tiers. TPP makes AI ROI legible.

As an efficiency indicator, TPP — benchmarked against outcomes like tickets resolved, documents produced, or decisions accelerated — tells you whether your AI deployment architecture is efficient. Are you using the right model sizes for the right tasks? Are you caching repeated prompts to reduce token consumption? Are you routing low-complexity queries to smaller, cheaper models? These are the optimization questions that drive real cost management, and they only become visible when TPP is measured.

As a procurement lever, TPP transforms the buyer-vendor conversation. Instead of negotiating seat counts, enterprise procurement teams can negotiate token budgets — monthly token allowances per user, with overage pricing, volume discounts, and model-tier flexibility. This aligns incentives correctly: the vendor's infrastructure costs scale with actual usage, the buyer's costs scale with actual value delivered.
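To make the definition concrete, here is a minimal sketch of how TPP and the adoption gap might be computed from one month of usage logs. The record layout of (user_id, tokens) pairs, the function name, and the sample data are all hypothetical; real vendor exports will differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tokens_per_person(
    usage_records: Iterable[Tuple[str, int]], licensed_seats: int
) -> Dict[str, float]:
    """Summarise one month of (user_id, tokens) records.

    TPP is averaged over *active* users (anyone with non-zero usage);
    the adoption rate compares active users with licensed seats.
    """
    totals: Dict[str, int] = defaultdict(int)
    for user_id, tokens in usage_records:
        totals[user_id] += tokens

    active_users = [user for user, total in totals.items() if total > 0]
    total_tokens = sum(totals.values())

    tpp = total_tokens / len(active_users) if active_users else 0.0
    adoption = len(active_users) / licensed_seats if licensed_seats else 0.0
    return {
        "tpp": tpp,
        "active_users": float(len(active_users)),
        "adoption_rate": adoption,
    }


# Hypothetical month: one light user, one power user, one seat with zero usage,
# and a fourth licensed seat that never appears in the logs.
records = [("ana", 2_000), ("ana", 3_000), ("ben", 495_000), ("cy", 0)]
summary = tokens_per_person(records, licensed_seats=4)
print(summary)
```

Averaging over active users rather than licensed seats is a design choice: it keeps TPP a measure of usage intensity, while the adoption rate separately exposes seats that are paid for but idle.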

Introducing the AI Value Stack: A New KPI Framework

PUE, TPW, and TPP are not competing metrics. They are complementary layers of a coherent framework that creates the AI Value Stack — a set of efficiency and productivity KPIs that spans the entire chain from electrons to enterprise outcomes.

 

| Layer | Metric | What It Measures | Who Owns It |
| --- | --- | --- | --- |
| Facility | PUE | Overhead energy efficiency of physical plant | Data center operators |
| Infrastructure | Tokens Per Watt (TPW) | AI output per unit of total energy | Data center + cloud operators |
| Platform | Cost Per Million Tokens (CPMT) | Efficiency of AI model deployment | Cloud & AI platform vendors |
| Enterprise | Tokens Per Person (TPP) | AI consumption per active user | CIOs, IT leaders |
| Business | Outcomes Per Token (OPT) | Business value generated per AI unit | Business line leaders, CFOs |


Each layer asks a different question, optimizes a different system, and is owned by a different stakeholder. PUE remains useful at its layer — facility overhead — and should not be abandoned so much as contextualized: it is a necessary but far-from-sufficient condition for AI infrastructure efficiency.

The truly transformative KPIs are TPW and TPP, because they connect what has historically been a purely physical metric (watts consumed) to what the enterprise cares about (intelligence produced and work accomplished). And Outcomes Per Token — measuring things like customer issues resolved per million tokens, documents generated per thousand tokens, or revenue influenced per token budget — is the long-term destination: a KPI that makes AI expenditure directly comparable to any other business investment.
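The platform and business layers reduce to simple ratios. The sketch below is one possible formulation, not a standard: the function names are assumptions, the $85,500 spend is the average monthly figure cited later in this article, and the token and ticket counts are hypothetical.

```python
def cost_per_million_tokens(spend_usd: float, tokens: int) -> float:
    """CPMT: what the organization pays for each million tokens consumed."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return spend_usd * 1_000_000 / tokens


def outcomes_per_million_tokens(outcomes: int, tokens: int) -> float:
    """OPT, scaled per million tokens so the figure is readable."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return outcomes * 1_000_000 / tokens


# Hypothetical month: $85,500 of AI spend across 10 billion tokens, of which
# 4 billion went to a support assistant that resolved 12,000 tickets.
cpmt = cost_per_million_tokens(85_500, 10_000_000_000)
opt = outcomes_per_million_tokens(12_000, 4_000_000_000)

# Dividing one by the other gives a cost per outcome, which is directly
# comparable to any other business investment.
cost_per_ticket = cpmt / opt
print(f"CPMT = ${cpmt:.2f}, OPT = {opt:.1f} tickets per million tokens, "
      f"cost per ticket = ${cost_per_ticket:.2f}")
```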

The Road to Tokens Per Person: What Needs to Change

Adopting TPP as a standard enterprise KPI is not simply a matter of will. It requires structural changes at three levels.


At the vendor level, SaaS companies must move beyond bundled token limits hidden inside seat tiers and expose token consumption as a first-class metric in their analytics and billing interfaces. Some are already moving in this direction: Anthropic's Claude, OpenAI's API, and Google's Gemini all have token-level billing in their developer tiers, even if their enterprise seat licenses still obscure this data. The next step is dashboards that give enterprise buyers real-time visibility into TPP across their organization, broken down by department, use case, and model tier.

At the enterprise level, technology leaders need to treat AI token budgets the same way they treat cloud compute budgets — with governance frameworks, cost allocation, and optimization disciplines. The average monthly AI spend per organization rose from $63,000 in 2024 to $85,500 in 2025, a 36% increase. Nearly half of companies now spend over $100,000 per month on AI. These are material expenditures that demand the same scrutiny as any other IT investment, and TPP is the instrument that makes that scrutiny possible.

At the infrastructure level, data center operators and hyperscalers need to publish TPW benchmarks alongside PUE scores, enabling buyers to assess not just the overhead efficiency of a facility but its actual AI productivity. This would create a market dynamic analogous to what happened with PUE itself after The Green Grid began publishing benchmarks in the late 2000s: operators competed to improve their scores, driving industry-wide efficiency gains. The same dynamic, applied to TPW, would drive competition on AI output efficiency — exactly the right optimization target for the energy-constrained decade ahead.

The Regulatory Catalyst

History suggests that voluntary industry adoption of new metrics is slow. PUE took years to become standard even after the Green Grid published it. The adoption of TPW and TPP at scale will likely require a combination of regulatory pressure and customer demand.

On the regulatory side, the EU's Energy Efficiency Directive is already creating the conditions. European regulators are mandating IT efficiency reporting for data centers and are actively developing a "work per energy" standard. Once that standard is adopted (likely by 2026 or 2027), it will become the de facto global benchmark for AI data center efficiency, much as European energy regulations have historically shaped global markets. A TPW-aligned metric that the EU mandates in Frankfurt and Amsterdam will quickly become standard in Northern Virginia and Singapore.

On the customer side, enterprise CFOs are increasingly aware that their AI bills are growing faster than their AI outcomes. The move from seat licenses to token budgets is a natural response to this pressure — and once large enterprises begin demanding TPP-based billing and benchmarking, vendors will have no choice but to provide it.

Conclusion: Measuring Intelligence, Not Just Electricity

The history of technology is partly a history of metrics — the numbers that shape investment decisions, competitive strategies, and industrial standards. Miles per gallon transformed the automotive industry. The cost per transistor drove Moore's Law. Uptime percentages defined the SLA economy. In each case, the right metric did not just measure progress — it created the conditions for progress by aligning incentives and making performance legible.

PUE was the right metric for the server farm era. It drove real efficiency gains: the industry improved average PUE from 2.5 in 2007 to 1.55 in 2022, a remarkable achievement. But those efficiency gains are now being swamped by demand growth, and the metric itself is blind to the dimension that matters most: how much intelligence are we producing for the energy we are consuming?

The data center of 2026 is an intelligence factory. The SaaS platform of 2026 is an intelligence delivery vehicle. The enterprise of 2026 is an intelligence consumer. None of these actors are well-served by a metric designed for a world where the question was simply "how cold is your building?"

Tokens Per Watt tells us how productive our intelligence factories are. Tokens Per Person tells us how efficiently our organizations are consuming the intelligence delivered to them. Together, they form the foundation of a measurement framework fit for the AI economy — one that connects electrons to outcomes, infrastructure to value, and investment to return.

The question is not whether these metrics will replace PUE. The question is whether the industry moves fast enough to adopt them before the energy, cost, and governance crises of AI scale force the issue. The organizations that get ahead of this — that build TPW into their procurement criteria and TPP into their AI governance frameworks today — will have a structural advantage in managing the intelligence economy of tomorrow.

PUE was a good metric for a different world. It is time to build the metrics for this one.

Next Steps: A Call to Action for Every Stakeholder

Frameworks without action are just theory. The AI Value Stack will only become the industry standard if specific actors move deliberately to adopt, instrument, and advocate for it. Here is what each stakeholder group should do next.

Data center operators and hyperscalers should begin publishing Tokens Per Watt benchmarks alongside PUE scores in their sustainability and transparency reports — even as an experimental metric. The investment in instrumentation is real but achievable, and the organizations that establish TPW baselines now will set the reference points that others are measured against. Waiting for a standard to emerge before measuring is how industries sleepwalk into regulatory mandates.

Cloud and AI platform vendors should expose token consumption as a first-class metric in their customer dashboards — broken down by user, department, use case, and model tier. This is a competitive differentiator, not a concession. The vendors that make AI costs legible will win the trust of enterprise procurement teams currently flying blind on their largest and fastest-growing IT line item.

SaaS vendors with AI-embedded products should begin the transition from pure seat licensing to hybrid models that include token-based components. This does not require abandoning seat pricing overnight — a base seat fee with a transparent token allowance and visible overage reporting is a commercially viable first step that aligns vendor costs with customer value.
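A hybrid model of this kind can be sketched as a small invoice calculation. Everything here is a hypothetical illustration: the function and parameter names, the prices, and in particular the assumption that the token allowance is pooled across all seats rather than enforced per user.

```python
from typing import Dict


def hybrid_monthly_invoice(
    seats: int,
    seat_fee_usd: float,
    included_tokens_per_seat: int,
    tokens_used: int,
    overage_usd_per_million: float,
) -> Dict[str, float]:
    """Base seat fee plus metered overage above a pooled token allowance."""
    allowance = seats * included_tokens_per_seat  # assumed pooled across seats
    overage_tokens = max(0, tokens_used - allowance)
    base_usd = seats * seat_fee_usd
    overage_usd = overage_tokens / 1_000_000 * overage_usd_per_million
    return {
        "base_usd": base_usd,
        "allowance_tokens": float(allowance),
        "overage_tokens": float(overage_tokens),
        "overage_usd": overage_usd,
        "total_usd": base_usd + overage_usd,
    }


# Hypothetical contract: 100 seats at $30, 100,000 tokens included per seat,
# $4 per million tokens beyond the pooled allowance.
invoice = hybrid_monthly_invoice(
    seats=100,
    seat_fee_usd=30.0,
    included_tokens_per_seat=100_000,
    tokens_used=12_500_000,
    overage_usd_per_million=4.0,
)
print(invoice)
```

Returning the allowance and overage alongside the total is what makes the overage reporting visible to the buyer rather than buried in a single line item.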

Enterprise technology leaders — CIOs, CTOs, and AI platform owners — should treat AI token budgets with the same rigor they apply to cloud compute budgets. That means establishing a Tokens Per Person baseline for every AI-enabled tool in the estate, identifying the adoption gap between licensed seats and active token consumers, and building TPP targets into AI program governance for 2026 and beyond.

Enterprise procurement and finance teams should begin renegotiating AI contracts with token transparency as a non-negotiable requirement. Vendors who cannot or will not provide TPP-level reporting should be treated as a governance risk. The average organization spending $85,500 per month on AI in 2025 has every right to know what it is getting per token.

Edge AI manufacturers and OEM hardware vendors should publish tokens-per-watt specifications for inference accelerators as a standard part of product datasheets — alongside existing benchmarks for TOPS (tera-operations per second) and thermal design power. Tokens per millisecond should be added for real-time application classes. These specs will become purchasing criteria as enterprise edge deployments scale.

Standards bodies and regulators — particularly the EU's Energy Efficiency Directive working groups and organizations like The Green Grid and Uptime Institute — should accelerate alignment between the emerging "work per energy" standard and the Tokens Per Watt metric proposed here. A single, auditable, internationally recognized TPW standard would do for AI infrastructure efficiency what PUE did for data center overhead efficiency in the 2010s — but faster, and with far higher stakes.

Investors and ESG analysts should begin incorporating TPW and TPP into their assessment of AI infrastructure companies and enterprise AI programs. A company that cannot articulate its tokens-per-watt trajectory or its enterprise tokens-per-person adoption rate is not managing its AI efficiency — and in an energy-constrained decade, that is a material risk, not a technical footnote.

The Edge Computing Extension: From Core to Continuum

The AI Value Stack was introduced here as a data center framework. But the most powerful version of this argument is broader: it is a framework for the entire AI compute continuum — from hyperscale cloud to the edge device in your factory, your vehicle, or your hospital ward.

Edge AI is not a footnote. The edge AI market is projected to exceed $70 billion by 2030, and the majority of real-world AI deployment — in manufacturing, retail, healthcare, logistics, and transport — happens at or near the edge, not in a Northern Virginia hyperscale campus. Any efficiency framework that applies only to centralized data centers is already describing yesterday's infrastructure.

The good news is that the AI Value Stack transfers to the edge — but with important modifications at each layer that actually strengthen the overall argument.

PUE is even less useful at the edge than in the data center. A GPU inference card running inside a 5G base station, a retail kiosk, or an autonomous vehicle shares power infrastructure with non-IT systems in ways that make PUE essentially unmeasurable. There is no clean separation between "IT load" and "total facility load" when the facility is the device. If PUE was already inadequate for AI data centers, it is completely irrelevant at the edge — which makes the case for successor metrics even more urgent.

Tokens Per Watt becomes an operational ceiling, not just a benchmark. Data center GPU racks draw tens of kilowatts. Edge AI accelerators draw 10 to 25 watts. The absolute numbers are different, and device-class benchmarks are needed rather than facility-level ones — but the concept is more powerful, not less. At the edge, energy budgets are hard constraints: battery life, thermal envelopes, and remote-site grid capacity are not negotiable. Tokens per watt at the edge is not an efficiency aspiration — it is an engineering requirement.

Tokens Per Person maps naturally to Tokens Per Device. Most edge AI deployments are not SaaS seat licensed. They are embedded in hardware, sold per-device, or tied to specific physical assets. The right reframing is Tokens Per Device per watt-hour — a metric that directly captures the AI productivity of a constrained embedded system and creates a basis for comparing generations of edge hardware, optimizing model selection for deployment, and managing fleet-wide AI costs in industries like automotive, industrial automation, and smart infrastructure.
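As a rough sketch, tokens per device per watt-hour can be derived from three telemetry values per device. The device names, field layout, and readings below are hypothetical; the power figures are simply chosen from within the 10 to 25 watt range mentioned above.

```python
from typing import Dict, List


def tokens_per_watt_hour(tokens: int, avg_power_w: float, hours: float) -> float:
    """Tokens produced per watt-hour of energy drawn by one device."""
    energy_wh = avg_power_w * hours
    if energy_wh <= 0:
        raise ValueError("energy must be positive")
    return tokens / energy_wh


# Hypothetical 24-hour telemetry from two edge devices.
fleet: List[Dict[str, float]] = [
    {"tokens": 360_000, "avg_power_w": 15.0, "hours": 24.0},
    {"tokens": 300_000, "avg_power_w": 25.0, "hours": 24.0},
]
device_names = ["cam-1", "cam-2"]

per_device = {
    name: tokens_per_watt_hour(int(d["tokens"]), d["avg_power_w"], d["hours"])
    for name, d in zip(device_names, fleet)
}

# Fleet-wide figure: total tokens over total energy, not a mean of the ratios.
fleet_tokens = sum(d["tokens"] for d in fleet)
fleet_energy_wh = sum(d["avg_power_w"] * d["hours"] for d in fleet)
fleet_tokens_per_wh = fleet_tokens / fleet_energy_wh

print(per_device, fleet_tokens_per_wh)
```

Computing the fleet figure from totals rather than averaging per-device ratios keeps heavily used devices from being underweighted.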

Latency becomes a co-equal efficiency dimension. Edge AI exists largely because round-trip latency to a centralized data center is unacceptable for real-time applications. This means the AI Value Stack needs a sixth dimension at the edge: a latency-adjusted efficiency score, or what might be called Tokens Per Millisecond — capturing not just how much intelligence a device produces per watt, but how quickly it delivers that intelligence when it matters. A model producing 80 tokens per watt at 200ms latency may deliver less real-world value than one at 40 tokens per watt and 20ms, depending on the application.
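One simple way to read the trade-off above is to treat latency as a hard constraint and efficiency as the objective within it. This is an illustrative interpretation, not a definition of the Tokens Per Millisecond metric; the class name, function name, and latency budgets are assumptions, while the two candidate profiles reuse the figures from the paragraph.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    tokens_per_watt: float
    latency_ms: float


def pick_within_latency_budget(
    candidates: Sequence[Candidate], budget_ms: float
) -> Optional[Candidate]:
    """Most energy-efficient candidate that still meets the latency budget."""
    eligible = [c for c in candidates if c.latency_ms <= budget_ms]
    return max(eligible, key=lambda c: c.tokens_per_watt, default=None)


candidates = [
    Candidate("efficient-but-slow", tokens_per_watt=80.0, latency_ms=200.0),
    Candidate("fast-but-hungrier", tokens_per_watt=40.0, latency_ms=20.0),
]

# A hypothetical real-time control loop with a 50 ms budget versus a
# batch-style task that tolerates 500 ms.
realtime_choice = pick_within_latency_budget(candidates, budget_ms=50.0)
relaxed_choice = pick_within_latency_budget(candidates, budget_ms=500.0)
print(realtime_choice, relaxed_choice)
```

With a tight budget the less efficient device is the only valid choice, which is the sense in which latency is a co-equal dimension rather than a tiebreaker.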

Outcomes Per Token is arguably more legible at the edge than anywhere else. Edge AI is typically deployed for specific, measurable tasks: defect detection rates on a production line, customer dwell time at a retail display, collision avoidance response times in a vehicle. The business outcome — the numerator in the OPT calculation — is often far more directly observable at the edge than in a general-purpose cloud deployment. This makes edge AI one of the first environments where the full AI Value Stack, from TPW at the bottom to OPT at the top, can be implemented end-to-end with real data.

The unified continuum, then, looks like this:

| Layer | Data center | Edge |
| --- | --- | --- |
| Facility efficiency | PUE | Power budget per device (W) |
| AI throughput efficiency | Tokens Per Watt | Tokens Per Watt (device-class) |
| Latency efficiency | | Tokens Per Millisecond |
| Enterprise value | Tokens Per Person | Tokens Per Device |
| Business outcomes | Outcomes Per Token | Outcomes Per Token |


Framed this way, the AI Value Stack is not a data center operations story. It is the measurement architecture for the entire intelligent infrastructure economy — from the hyperscale GPU cluster training foundation models to the embedded inference chip deciding, in 15 milliseconds, whether a weld is defective. The metrics change in scale; the logic does not.

 


Vernon Turner

Advisor - Sustainability, IoT, Cognizant
