$NVDA $MSFT $CEG $VST The new AI Energy Score data set shows that chain-of-thought “reasoning” models increase per-query energy consumption by orders of magnitude and are emerging as a first-order driver of AI infrastructure demand, unit economics, and power-system risk. Benchmarking by Hugging Face and Salesforce across 40 open models from OpenAI, Google, Microsoft, DeepSeek and others finds that enabling reasoning increases GPU energy use per 1,000 prompts by roughly 100x on average versus the same or similar models with reasoning disabled, with individual models showing deltas from 500x to more than 6,000x. The most extreme example cited is a slimmed-down DeepSeek R1 model: energy use rises from 49.53 Wh to 308,185.51 Wh per 1,000 prompts when reasoning is turned on, an increase of roughly 6,200x that takes per-query energy from approximately 0.05 Wh to approximately 308 Wh. The primary driver is not model size but output length: reasoning modes generate 300–800x more tokens through explicit internal “monologue.” The benchmarking uses standardized hardware and CodeCarbon to measure GPU watt-hours, so absolute facility-level energy is understated but the on/off delta for a given model is robust. By comparison, Google’s internal Gemini data put the median production text prompt at 0.24 Wh and claim a 44x reduction in per-prompt energy over one year, illustrating the scope for optimization but also highlighting the enormous dispersion across model types and workloads. Critically, the AI Energy Score work finds that newer models are not systematically more energy-efficient on a per-task basis; in matched comparisons, some newer non-reasoning models use only 3% of the energy of prior cohorts while others use up to 4x more, contradicting the assumption that algorithmic progress alone will offset rising usage.
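The per-query figures for the DeepSeek example follow directly from the per-1,000-prompt numbers quoted above. A minimal sketch of that arithmetic is below; the variable and function names are illustrative and not taken from the AI Energy Score tooling.

```python
# Figures quoted above for the slimmed-down DeepSeek R1 model,
# in watt-hours of GPU energy per 1,000 prompts.
WH_PER_1000_OFF = 49.53        # reasoning disabled
WH_PER_1000_ON = 308_185.51    # reasoning enabled
PROMPTS = 1_000


def per_query_wh(wh_per_batch: float, prompts: int = PROMPTS) -> float:
    """Convert energy per batch of prompts into energy per single query."""
    return wh_per_batch / prompts


off_wh = per_query_wh(WH_PER_1000_OFF)
on_wh = per_query_wh(WH_PER_1000_ON)

# Ratio of reasoning-on to reasoning-off energy for the same model.
multiplier = WH_PER_1000_ON / WH_PER_1000_OFF

print(f"reasoning off: {off_wh:.3f} Wh/query")
print(f"reasoning on:  {on_wh:.1f} Wh/query")
print(f"multiplier:    {multiplier:,.0f}x")
```

These are GPU-only figures, so they show the on/off delta for one model and do not represent total facility energy.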
These micro-level findings sit within a macro context of rapidly rising data center power demand and a structural shift of AI energy use toward inference. Data centers consumed roughly 415 TWh of electricity in 2024 (about 1.5% of global demand), a figure the IEA projects will approach 945 TWh by 2030, just under 3% of global consumption and comparable to Japan’s total electricity use today. In the US, data center consumption is estimated at about 183 TWh in 2024 (more than 4% of national use) and is projected to reach approximately 426 TWh by 2030, a 133% increase, with AI workloads as the main incremental driver. Available evidence from Meta, Google and independent analyses suggests that 60–90% of ML lifecycle energy is already attributable to inference rather than training, and the rise of reasoning models pushes the mix further toward inference because the incremental energy cost is incurred at query time. In several regions with dense data center buildout, wholesale electricity prices have risen by 200–270% over 5 years, grid operators are facing capacity constraints, and communities are raising concerns about water use and land impacts. This combination makes the energy intensity of deployed AI workloads a central determinant of both system-wide load growth and localized power and environmental stress.
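The growth figures above can be cross-checked, and converted into an implied annual growth rate, with a few lines of arithmetic. The sketch below assumes a six-year span (2024 to 2030) and smooth compounding, which is a simplification and not part of the cited projections.

```python
def growth(start_twh: float, end_twh: float, years: int) -> tuple[float, float]:
    """Return (total growth, compound annual growth rate) as fractions."""
    ratio = end_twh / start_twh
    total = ratio - 1
    cagr = ratio ** (1 / years) - 1
    return total, cagr


YEARS = 2030 - 2024  # assumed compounding period

# Global data center electricity use, TWh (2024 actual vs. 2030 projection).
global_total, global_cagr = growth(415, 945, YEARS)

# US data center electricity use, TWh (2024 estimate vs. 2030 projection).
us_total, us_cagr = growth(183, 426, YEARS)

print(f"global: +{global_total:.0%} total, {global_cagr:.1%} per year")
print(f"US:     +{us_total:.0%} total, {us_cagr:.1%} per year")
```

Both series imply compound growth in the mid-teens percent per year, far faster than overall electricity demand typically grows.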
At the level of unit economics, the AI Energy Score results imply that energy, historically a small component of AI inference COGS, can become material for reasoning-heavy workloads. Using the DeepSeek example, with an assumed delivered electricity cost of $0.07–$0.10/kWh and PUE of 1.2, per-query electricity cost rises from effectively negligible in non-reasoning mode to approximately $0.026–$0.037 in reasoning mode. For high-value enterprise applications this is manageable, but for mass-market products priced at fractions of a cent per 1,000 tokens, such as search augmentation or low-end APIs, energy costs of this magnitude consume a significant share of gross margin once cooling and overhead are fully accounted for. At the same time, many commercial AI offerings are priced with flat per-seat fees and generous usage caps that implicitly assume low average cost per query. If a meaningful subset of users shifts toward very heavy reasoning usage, cost per user can rise nonlinearly while competitive pressure keeps pricing low, compressing margins. This dynamic increases the importance of “energy-aware inference”: routing simple queries to small non-reasoning models and escalating only complex tasks to reasoning LLMs, gating access to expensive modes behind higher-priced tiers or explicit usage limits, and optimizing prompts and outputs to minimize unnecessary token generation.
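The per-query cost estimate, and the effect of routing only part of the traffic to reasoning mode, can be sketched as follows. The energy figures are the DeepSeek per-query values derived earlier; the electricity prices and PUE are the assumptions stated in the text, and the 5% reasoning share is a purely hypothetical illustration.

```python
REASONING_WH = 308.18551   # GPU Wh per query, reasoning on (DeepSeek example)
BASELINE_WH = 0.04953      # GPU Wh per query, reasoning off


def energy_cost_usd(gpu_wh: float, price_per_kwh: float, pue: float = 1.2) -> float:
    """Electricity cost of one query: GPU energy scaled by PUE, priced per kWh."""
    return gpu_wh / 1000 * pue * price_per_kwh


def blended_cost_usd(reasoning_share: float, price_per_kwh: float, pue: float = 1.2) -> float:
    """Average cost per query when only a share of queries uses reasoning mode."""
    blended_wh = reasoning_share * REASONING_WH + (1 - reasoning_share) * BASELINE_WH
    return energy_cost_usd(blended_wh, price_per_kwh, pue)


low = energy_cost_usd(REASONING_WH, 0.07)
high = energy_cost_usd(REASONING_WH, 0.10)
print(f"reasoning query: ${low:.4f} to ${high:.4f}")

# Hypothetical routing policy: 5% of queries escalate to reasoning mode.
routed = blended_cost_usd(0.05, 0.10)
print(f"blended at 5% reasoning share: ${routed:.4f} per query")
```

Under these assumptions a reasoning query costs roughly $0.026 to $0.037 in electricity, and because the baseline cost is negligible, the blended cost scales almost linearly with the share of queries that reach the reasoning model.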
The investment implications are most immediate for hyperscale cloud and AI platforms, but they cascade across utilities, infrastructure, semiconductors, commodities and device ecosystems. For Microsoft, Alphabet, Amazon and Meta, reasoning-heavy workloads reinforce a sustained high-capex cycle in AI data centers, accelerators, networking and power, increasing capital intensity and pushing parts of the business model closer to a utility profile. These firms are responding by locking in long-duration PPAs, investing directly in renewables and nuclear, and in some cases co-locating generation with data centers. This deepens their moat versus smaller AI vendors that lack the balance sheet to underwrite power infrastructure, but it also raises the risk that regulators and investors begin valuing segments of the cloud franchise on infrastructure-like return and multiple frameworks.
Data center REITs and colocation providers benefit from surging AI demand and rising rack power densities toward 240–1,000 kW, with power availability becoming the primary bottleneck and a key source of pricing power for operators with expandable grid connections. Regulated utilities and independent power producers gain from high load-factor, creditworthy hyperscale customers that justify large new generation and transmission projects; some projections suggest that several hundred GW of new capacity may be required by 2035, with AI data centers a major contributor. This supports rate base growth for regulated utilities and strengthens the outlook for gas, renewables and nuclear, especially in regions with abundant fuel and supportive policy. Copper and other grid and data center materials face additional demand: AI data centers are estimated to require roughly 27–33 t of copper per MW, compounding existing deficits driven by electrification and tightening supply. Suppliers of power and cooling equipment such as advanced switchgear, UPS, and liquid cooling systems are levered to the move toward very high-density racks and gigawatt-scale campuses, and may see improving pricing power as legacy infrastructure becomes inadequate.
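To put the copper intensity estimate in scale, the 27–33 t per MW range can be applied to a gigawatt-scale campus. The sketch below uses a hypothetical 1 GW (1,000 MW) site; the campus size is an illustrative assumption, not a figure from the source.

```python
# Estimated copper intensity of AI data centers, tonnes per MW (range quoted above).
COPPER_T_PER_MW = (27, 33)


def copper_tonnes(capacity_mw: float) -> tuple[float, float]:
    """Return the (low, high) copper requirement in tonnes for a given capacity."""
    low_rate, high_rate = COPPER_T_PER_MW
    return capacity_mw * low_rate, capacity_mw * high_rate


# Hypothetical 1 GW campus.
low_t, high_t = copper_tonnes(1_000)
print(f"1 GW campus: {low_t:,.0f} to {high_t:,.0f} t of copper")
```

On these assumptions a single 1 GW campus would require 27,000 to 33,000 t of copper.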
By contrast, AI software companies and vertical AI vendors that do not control infrastructure are more exposed. These businesses are often valued on SaaS-style multiples but rely on cloud inference services whose COGS embed both chip depreciation and increasingly significant energy costs. As they adopt reasoning models to remain competitive on quality, their per-query cost structure escalates while pricing is constrained by competition from hyperscalers and open-source offerings. Without control over power procurement or data center efficiency, they are price-takers on both compute and energy and are at risk of structurally low or negative gross margins unless they implement sophisticated model routing, workload engineering and contract structures.
At a system level, AI’s rising energy demand is creating tension with tech companies’ climate commitments. Critics argue that net-zero pledges at major platforms are becoming less credible as data center emissions rise faster than decarbonization progress, and internal activism plus external NGO pressure are intensifying. Policymakers are starting to respond with calls for standardized energy benchmarking and disclosure. The AI Energy Score project positions itself as an “ENERGY STAR for AI,” providing 1–5 star ratings and a public leaderboard now covering more than 160 models, while the EU AI Act, IEEE and industry groups encourage inclusion of per-inference energy and carbon metrics in procurement. These developments increase the probability of tighter regulation around data center siting, 24/7 carbon-free energy sourcing, and potentially energy- or emissions-based pricing of AI workloads, raising regulatory and ESG risk premia for the most power-intensive AI business models.
Several mitigating factors and uncertainties must be recognized. Rapid improvements in hardware and software efficiency, as illustrated by Google’s reported 44x per-prompt energy reduction for Gemini in one year, could significantly lower energy per reasoning query over a 3–5 year horizon. Product design and model routing can constrain the share of workloads that invoke full reasoning, focusing those capabilities on high-value tasks while simpler queries are handled by lightweight models or on-device NPUs. Inference migration to edge devices can offload some demand from centralized data centers and create incremental opportunities for smartphone and PC chipmakers and device OEMs, although heavy reasoning and very large context windows will remain data center–centric. Finally, current measurement of AI energy use is fragmented and methodologically inconsistent; facility boundaries, idle capacity accounting, and inclusion or exclusion of training all vary across sources. As a result, point estimates for AI-related energy and emissions should be treated as directional rather than precise. The central conclusion, however, is robust: widespread adoption of reasoning models materially increases the energy intensity and cost of AI inference, shifts long-run value capture toward infrastructure and energy suppliers, and amplifies regulatory, climate and margin risks for AI platforms and application-layer vendors. Integrating power availability, performance-per-watt, and workload mix into AI investment theses is now a necessary condition for accurate valuation and risk assessment across the TMT, utilities, infrastructure and commodity complex.