Microsoft quantifies environmental impacts of datacenter cooling from ‘cradle to grave’ in new Nature study


Microsoft researchers have published a paper in Nature that quantifies, for the first time, how much energy and water four datacenter cooling techniques consume, and how much greenhouse gas they produce, across the entire lifespan of the datacenters. This so-called life cycle assessment evaluates more than which resources are consumed during datacenter operations – it also digs into what was required to produce the chips, servers, cooling systems and other support equipment, from extracting raw materials and manufacturing components to transportation at different points in the process and even eventual disposal. That is the type of data that researchers say can help companies design datacenters that use less energy and water and emit less carbon.
“A lot of people do life cycle assessments after the fact,” to understand a datacenter’s environmental impact after it’s built, said Husam Alissa, director of systems technology in Cloud Operations and Innovation at Microsoft and leader of the life cycle assessment study. “When making future design decisions, we typically look at total cost of ownership, performance, sustainability, and other factors. In this paper, we advocate the use of life cycle assessment tools to guide engineering decisions early on and share the tool with the industry to make adoption easier.”

Microsoft plans to use the findings from life cycle assessments that account for datacenters’ carbon, water, and energy impacts to inform new datacenter designs and cloud operations and to help meet its broader sustainability goals.
For instance, the study found that switching from air cooling to cold plates that cool datacenter chips more directly – a newer technology that Microsoft is deploying in its datacenters – could reduce greenhouse gas emissions and energy demand by roughly 15 percent and water consumption by 30 to 50 percent across the datacenters’ entire life spans. This is not just water used for cooling, but also in power generation and the manufacturing of components. This level of detail is rare and hard to uncover; it took the team more than two years to complete the analysis.
That’s why Microsoft is also making the methodology available to others in the industry through an open research repository. The researchers also presented preliminary results at the Open Compute Project (OCP) Global Summit, where the industry shares hardware designs and best practices to support growing demand for compute infrastructure. The work builds on Microsoft’s continued efforts to construct unified life cycle assessment methods and tools for cloud providers.
The paper is the first to detail how to construct a life cycle assessment for cooling in cloud operations, taking into account software, chips, servers, buildings, and the energy grid. It also introduces a new approach to help others build apples-to-apples comparisons of environmental burden. Through the open research repository, anyone in the industry will be able to plug in their data and scenarios to conduct a life cycle assessment of their operations.
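As a rough illustration of what such an assessment aggregates, the sketch below sums per-stage impacts for two cooling scenarios and compares the totals. Every number, name, and stage label here is invented for illustration; none of it comes from the paper or its repository.

```python
# Hypothetical cradle-to-grave comparison of two cooling scenarios.
# All stage values are made up for illustration, not taken from the study.

def lifecycle_total(stages: dict[str, float]) -> float:
    """Sum one impact metric (e.g. kg CO2e) across all life cycle stages."""
    return sum(stages.values())

# Illustrative-only figures, kg CO2e per server over its service life.
air_cooling = {
    "raw_materials": 120.0,
    "manufacturing": 380.0,
    "transport": 25.0,
    "operation": 2600.0,   # dominated by electricity use during operation
    "disposal": 15.0,
}
cold_plates = {
    "raw_materials": 130.0,  # slightly higher embodied impact from extra hardware
    "manufacturing": 400.0,
    "transport": 25.0,
    "operation": 2100.0,     # direct-to-chip cooling lowers operational energy
    "disposal": 18.0,
}

baseline = lifecycle_total(air_cooling)
alternative = lifecycle_total(cold_plates)
reduction = 100 * (baseline - alternative) / baseline
print(f"Life cycle reduction: {reduction:.1f}%")
```

The point of the structure, not the numbers: a cooling choice that adds a little embodied impact can still win overall if it cuts the much larger operational term.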
“Our intention is not to say, ‘this is the right technology.’ They all could be. Different circumstances make you use a technology,” Alissa said. “What we’re trying to do here is tell the industry, ‘Here’s how you build an end-to-end life cycle assessment that takes cooling into account. And here is a tool for you that you can customize to your specific needs and then make a decision.’”
The published Nature study covers cooling chips for general computing, or CPUs, not the specialized chips designed to handle AI workloads. The team is working on a follow-up to examine the life cycle impacts of AI chips and expects to see similar improvements with advanced cooling methods.

Of course, datacenter operations depend on external factors such as local energy grids. The Nature paper also quantified how much energy, water, and greenhouse gas emissions could be saved by switching from a typical energy grid to 100 percent renewable sources of energy, finding that greenhouse gas emissions could be reduced by 85 to 90 percent regardless of what cooling technologies were used.
Microsoft aims to match 100 percent of its energy load with renewable energy. In locations where fully renewable energy isn’t available through the local grid, it purchases a comparable amount of renewable energy available elsewhere.
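The arithmetic behind the grid finding is simple to sketch: operational emissions scale with the grid's carbon intensity, so lowering that intensity cuts emissions no matter which cooling technology is in place. The load and intensity figures below are placeholders, not values from the study.

```python
# Illustrative only: operational emissions = energy consumed x grid carbon
# intensity, so greening the grid reduces GHG regardless of cooling choice.
# These numbers are placeholders, not data from the Nature paper.

annual_energy_mwh = 10_000.0   # hypothetical facility load
typical_grid = 0.40            # t CO2e per MWh, placeholder grid mix
renewable_grid = 0.05          # placeholder residual intensity for renewables

baseline = annual_energy_mwh * typical_grid
renewable = annual_energy_mwh * renewable_grid
print(f"Operational GHG reduction: {100 * (baseline - renewable) / baseline:.0f}%")
```

Because the grid intensity multiplies the whole operational energy term, the percentage reduction is the same whatever cooling hardware sits behind the meter.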

Four types of cooling technology

For the study, the team spent more than two years studying four cooling technologies: air cooling, cold plates, one-phase immersion, and two-phase immersion for servers. Air cooling has been the standard approach for cooling datacenters, but the industry has more recently been exploring technologies that rely on liquids, which can dissipate heat much more directly and efficiently than air. Cold plates are a type of direct-to-chip cooling: a coolant is pumped in a loop through a flat container that sits right on top of the chips in a server rack. In one-phase immersion, servers operate in a tank with cooling fluid circulated through it. In two-phase immersion, server racks sit in a tank filled with a different fluid that boils at low temperatures; the vapor rises, condenses – carrying heat away – and returns to the tank as liquid.
The study found that cold plates and the two immersion cooling technologies reduce greenhouse gas emissions 15 to 21 percent over their entire life cycles, energy demand 15 to 20 percent, and water consumption 31 to 52 percent in datacenters, compared with air cooling. The team expected the three liquid methods to outperform air cooling on carbon emissions and water and energy consumption, but those benefits hadn’t been quantified across the entire life cycle of those technologies.

Two-phase immersion has potential for reductions in all areas, but it currently relies on fluids containing per- and polyfluoroalkyl substances, or PFAS, which are under regulatory scrutiny in the European Union and the U.S., putting them at odds with pollution-reduction goals and possibly making them unavailable in the future. Microsoft has investigated but is not currently using immersion cooling technologies in datacenter operations.
Microsoft is already installing cold plates in its datacenters while also exploring other next-generation cooling techniques. For instance, Microsoft has begun deploying rack-scale cold plate cooling technology using heat exchanger units, or “sidekicks,” alongside AI infrastructure servers powered by the latest GPUs.
“It was interesting to see that cold plates could be as good as the two immersion cooling methods,” said Teresa Nick, director, natural systems and sustainability for Cloud Operations and Innovation at Microsoft, and co-author of the paper in Nature.

Choosing the right method: It’s complicated

Design factors such as total cost of ownership, availability, time to market, and even reliability are mostly straightforward to quantify and compare. That isn’t the case with sustainability impacts, which can be hard to define and calculate across an entire supply chain and datacenter ecosystem.

Getting information about how raw materials were obtained, as well as the carbon, water, and energy involved in manufacturing, can be difficult. The Microsoft authors pressed suppliers to divulge such data, though not all of them participated, then produced formulas so they – and others – could estimate the figures in the future. “Having the embodied emissions known, public and shared in databases could help accelerate life cycle assessment efforts,” Alissa added.
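A minimal sketch of the kind of estimating formula involved, assuming a simple mass-times-emission-factor model over a bill of materials. The materials, masses, and factors shown are placeholders invented for illustration, not the formulas or data from the study.

```python
# Hedged sketch: estimating embodied carbon for a component from its bill
# of materials when suppliers don't report figures directly.
# Emission factors below are placeholders, not values from the study.

EMISSION_FACTORS = {  # kg CO2e per kg of material (illustrative)
    "aluminum": 8.6,
    "copper": 4.0,
    "steel": 1.9,
}

def embodied_emissions(bill_of_materials: dict[str, float]) -> float:
    """Sum mass (kg) x emission factor over each material in the BOM."""
    return sum(mass * EMISSION_FACTORS[material]
               for material, mass in bill_of_materials.items())

# Example: a hypothetical cold plate assembly (masses in kg).
cold_plate_bom = {"copper": 1.2, "aluminum": 0.5, "steel": 0.3}
print(f"Embodied: {embodied_emissions(cold_plate_bom):.2f} kg CO2e")
```

Publishing agreed-upon factors and formulas like these is what would let different operators produce comparable estimates when supplier data is missing.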
Life cycle assessment can also be used to inform all aspects of datacenter structure and function, including how to optimize and run a datacenter most efficiently. One technology might do better on one measure and another on a different one, with no single technology outperforming in every area.

 
