The information technology industry has experienced exponential growth in the performance of computer systems over many decades, which has been mainly driven by the scaling of the transistor to near-atomic dimensions. This performance improvement has led to the “Information Age” with the integration of computer technology into every major industrial sector and the creation of an information network supported by data centers throughout the world.
What are HPC and Data Centers?
High-Performance Computing (HPC) involves the use of supercomputers and computer clusters to solve advanced computational problems. These systems are essential for scientific research, financial modeling, climate forecasting, and many other fields that require immense processing power.
Data centers, on the other hand, are facilities that house vast amounts of computing and storage resources. They enable the operation of internet services, cloud computing, and large-scale data storage. Both HPC and data centers generate significant amounts of heat due to the high density of electronic components and the intensive processing tasks they perform.
Data centers vary in size and power usage. A significant fraction of the data center energy usage is the cooling energy required to remove the heat generated by computer hardware while meeting operating temperature margins in the data center. The computer semiconductor chips in data centers operate at temperatures well above the outdoor ambient environment. Reducing the thermal impedance between the chips and the coolant enables the use of the outdoor ambient environment to provide effective cooling. This eliminates the need for energy-intensive refrigeration, which can use up to 30% of data center energy.
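The link between thermal resistance and refrigeration-free cooling follows from the steady-state relation T_chip = T_coolant + P × R. The sketch below illustrates it. The function name and every numeric value (chip limit, chip power, resistances, outdoor temperature) are assumptions chosen for illustration, not figures from this article, and the model ignores the temperature difference between outdoor air and the coolant loop.

```python
def max_coolant_temp_c(chip_limit_c: float, power_w: float, r_chip_to_coolant: float) -> float:
    """Warmest coolant (deg C) that keeps the chip at or below its limit.

    Steady state: T_chip = T_coolant + power * R, so
    T_coolant_max = T_chip_limit - power * R.
    """
    return chip_limit_c - power_w * r_chip_to_coolant


# Hypothetical values chosen only for illustration.
CHIP_LIMIT_C = 85.0  # assumed maximum chip temperature
POWER_W = 200.0      # assumed chip power
OUTDOOR_C = 35.0     # assumed hot-day outdoor ambient temperature

for label, r_k_per_w in [("higher-resistance path", 0.30), ("lower-resistance path", 0.15)]:
    t_max = max_coolant_temp_c(CHIP_LIMIT_C, POWER_W, r_k_per_w)
    needs_refrigeration = t_max < OUTDOOR_C
    print(f"{label}: coolant must be <= {t_max:.1f} C, "
          f"refrigeration needed: {needs_refrigeration}")
```

With these assumed numbers, halving the chip-to-coolant resistance raises the allowable coolant temperature above the outdoor ambient, which is the condition under which refrigeration can be avoided.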
HPC and Data Center Cooling Systems
As shown in Fig. 1(a), a typical air-cooled server would incorporate a microprocessor chip mounted on a package substrate to provide electric connections for power and signals. Heat from the chip is removed by attaching a thermally conductive lid and an air-cooled heat sink. These are connected to the back of the chip using Thermal Interface Materials (TIMs). The temperature rise within the chip is affected by the thermal resistance between the active semiconductor devices of the microprocessor and the coolant being used to remove the generated heat.
The cooling performance and energy efficiency of a server can be significantly improved by using liquid cooling to remove heat from high-power components within the server. In this case, as shown in Fig. 1(b), the air-cooled heat sink on the microprocessor chip package is replaced with a liquid-cooled cold plate. The package lid and both thermal interface materials (TIM1 and TIM2) remain the same. Other high-power components in a server, such as memory modules, graphics cards, and application-specific integrated circuits (ASICs), can also be liquid cooled.
In a more advanced liquid-cooled microprocessor package, as illustrated in Fig. 1(c), the lid and both TIMs can be replaced with a single high-performance liquid metal TIM (LMTIM) that has a thermal conductivity an order of magnitude higher than that of commonly used TIMs. In this direct-attach approach, the cold plate is attached directly to the backside of the die. This eliminates the thermal resistance of the lid and the second TIM (TIM2), significantly lowering the thermal resistance of the path to the coolant.
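To see why removing the lid and TIM2 matters, the package can be treated as layers in series, each contributing a one-dimensional conduction resistance R = t / (k × A). The following is a minimal sketch under that simplification. The `Layer` class, the die area, and all thicknesses and conductivities are placeholder assumptions rather than measured data. It also ignores contact resistance, heat spreading in the lid, and the resistance of the cold plate itself.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    thickness_m: float        # layer thickness in metres
    conductivity_w_mk: float  # bulk thermal conductivity in W/(m*K)


def stack_resistance(layers, area_m2):
    """Sum of one-dimensional conduction resistances t / (k * A), in K/W."""
    return sum(l.thickness_m / (l.conductivity_w_mk * area_m2) for l in layers)


DIE_AREA_M2 = 4e-4  # assumed 20 mm x 20 mm die
POWER_W = 200.0     # assumed chip power

# All thicknesses and conductivities below are placeholder values.
lidded = [
    Layer("TIM1", 50e-6, 5.0),
    Layer("copper lid", 2e-3, 400.0),
    Layer("TIM2", 100e-6, 5.0),
]
direct_attach = [Layer("liquid metal TIM", 50e-6, 50.0)]

for label, layers in [("lidded", lidded), ("direct attach", direct_attach)]:
    r = stack_resistance(layers, DIE_AREA_M2)
    print(f"{label}: R = {r:.4f} K/W, temperature rise = {POWER_W * r:.1f} K")
```

In this simplified model the two conventional TIM layers dominate the lidded stack, so replacing them with a single, more conductive layer removes most of the conduction resistance.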
Types of Cooling Materials for High-Density/Demanding Computing Applications
The traditional cooling methods shown in Fig. 1 are often insufficient for high-density environments, prompting the need for advanced cooling technologies. Innovative cooling materials are at the forefront of this transformation, offering enhanced thermal management solutions that significantly improve data center efficiency. These materials include phase change materials (PCMs), advanced thermal interface materials (TIMs), and cutting-edge heat spreaders.
Phase Change Materials (PCMs)
Phase change materials absorb and release large amounts of heat during reversible phase transitions, for example from solid to liquid. PCMs can store thermal energy and provide efficient cooling by holding a nearly constant temperature during the phase change. They are engineered to melt and solidify within specific temperature ranges. For example, Honeywell’s phase change material transitions at approximately 45 °C. This helps HPC systems and data centers maintain peak performance even under heavy workloads.
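The heat a PCM takes up while warming through its melting point is the sum of sensible heat in the solid, latent heat of fusion, and sensible heat in the liquid: Q = m·c_solid·(T_melt − T_start) + m·L + m·c_liquid·(T_end − T_melt). The sketch below evaluates that expression. The function name and all material properties are hypothetical and are not data for any Honeywell product. Only the 45 °C transition temperature is taken from the text.

```python
def heat_absorbed_j(mass_kg, cp_solid, cp_liquid, latent_j_per_kg,
                    t_start_c, t_melt_c, t_end_c):
    """Heat (J) absorbed by a PCM warmed from t_start_c to t_end_c across its melting point."""
    if not t_start_c <= t_melt_c <= t_end_c:
        raise ValueError("expected t_start_c <= t_melt_c <= t_end_c")
    sensible_solid = mass_kg * cp_solid * (t_melt_c - t_start_c)
    latent = mass_kg * latent_j_per_kg
    sensible_liquid = mass_kg * cp_liquid * (t_end_c - t_melt_c)
    return sensible_solid + latent + sensible_liquid


# Hypothetical material: 10 g of PCM, specific heat 2000 J/(kg*K) in both phases,
# latent heat 200 kJ/kg, warmed from 35 C to 55 C with melting at 45 C.
total_j = heat_absorbed_j(0.01, 2000.0, 2000.0, 200_000.0, 35.0, 45.0, 55.0)
print(f"heat absorbed: {total_j:.0f} J")
```

With these assumed properties the latent term is several times larger than the two sensible terms combined, which is why the material temperature stays close to the transition point while it melts.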
PTM7950 is the most advanced product in Honeywell’s phase change material line, with a thermal conductivity of around 8.5 W/m·K. The material has undergone thermal shock testing for up to 1000 cycles between −40 °C and 200 °C and remained reliable and stable. This testing confirms that the material can handle demanding thermal conditions and helps in selecting materials that offer reliability, durability, and effective thermal management, ultimately contributing to the overall efficiency and performance of HPC systems and data centers.
Hybrid Thermal Gels – Gap Fillers
One innovative solution that has gained attention is Honeywell’s hybrid gel, a TIM designed to address the specific thermal management needs of HPC and data centers. Honeywell Hybrid Gel is an advanced thermal interface material that combines the benefits of traditional thermal greases and gels with enhanced performance characteristics. This hybrid gel is engineered to provide superior thermal conductivity, excellent conformability, and long-term reliability. It is specifically designed to fill the microscopic air gaps between heat-generating components and their heat sinks or cooling solutions.
The key attribute of the hybrid gel is its low thermal resistance. This minimizes the temperature gradient across the interface, which makes heat dissipation more uniform and effective and improves the overall efficiency of the thermal management system. The gel’s flexibility and conformability further contribute to its efficiency. Moreover, the reliability and stability of the hybrid gel maintain its performance over the long term, even under extreme thermal cycling, which is essential for the demanding computing and processing operations of HPC systems and data centers. For this application, we recommend the one-part HT7000 and two-part HLT7000 hybrid gels, which have high thermal conductivity (7.0 W/m⋅K) and a thermal impedance of 1.41 °C⋅cm²/W.
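A thermal impedance figure translates into a temperature drop across the interface through ΔT = Z × (P / A), where P / A is the heat flux through the contact area. The sketch below applies that relation to the 1.41 °C⋅cm²/W value quoted above. The function name, component power, and contact area are assumptions for illustration, and the quoted impedance depends on test conditions (such as bond line thickness and pressure) that are not stated here.

```python
def interface_temp_drop_c(thermal_impedance_c_cm2_per_w: float,
                          power_w: float,
                          contact_area_cm2: float) -> float:
    """Temperature drop (deg C) across a TIM layer: impedance times heat flux."""
    heat_flux_w_per_cm2 = power_w / contact_area_cm2
    return thermal_impedance_c_cm2_per_w * heat_flux_w_per_cm2


GEL_IMPEDANCE = 1.41  # deg C * cm^2 / W, value quoted in the text
POWER_W = 30.0        # assumed component power
AREA_CM2 = 6.0        # assumed contact area

drop = interface_temp_drop_c(GEL_IMPEDANCE, POWER_W, AREA_CM2)
print(f"temperature drop across the gel: {drop:.2f} C")
```

Because the drop scales with heat flux, the same impedance costs more temperature margin on a small, high-power component than on a large, low-power one.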
Heat Spreaders
Heat spreaders are designed to evenly distribute heat generated by high-power processors and other critical components, preventing localized hot spots that can lead to thermal throttling or hardware failure. By spreading the heat over a larger surface area, heat spreaders facilitate more efficient heat transfer to cooling systems, such as heatsinks, liquid cooling plates, or other thermal management solutions. This ensures that the temperature of key components remains within safe operating limits, thereby enhancing system stability and longevity.
Heat spreaders are commonly made from highly conductive materials such as graphite (for example, the COOLINQ SG-25 and COOLINQ SG-90 heat spreaders) and advanced composites. Spreaders that use more common materials, such as nickel and copper plating, are cheaper options; examples include HW-HSPS and HW-HSTD. These materials help distribute the heat generated by electronic components such as CPUs and GPUs.
Key Takeaways
In high-performance computing (HPC) and data center environments, selecting the right cooling material within the system is crucial for maintaining optimal performance and reliability. Phase change materials (PCMs) offer several advantages, including high energy storage capacity and temperature regulation. They can absorb and release large amounts of thermal energy during phase transitions, which is beneficial for managing peak thermal loads and maintaining stable temperatures during operation. However, PCMs often require complex integration due to the need for encapsulation to prevent leakage during phase changes.
Hybrid gels provide high thermal conductivity and conformability, making them excellent for environments with high-power-density components. These gels can easily fill gaps and ensure good thermal contact with irregular surfaces, contributing to efficient heat transfer. They are also reliable over long-term thermal cycling, making them suitable for continuous use. However, applying hybrid gels can be complex, requiring careful handling to avoid air bubbles, and high-performance gels can be costly.
Heat spreaders, on the other hand, excel at distributing heat efficiently over a larger area, improving the performance of downstream cooling systems. They can be made from various high-conductivity materials such as copper, aluminum, or advanced composites, and are relatively simple to integrate into existing thermal management systems. Despite these advantages, heat spreaders do not store thermal energy as PCMs do; they only redistribute it. The effectiveness of a heat spreader can also be limited by the thermal resistance of the interface materials used to attach it to heat sources and sinks. Additionally, depending on the material and design, heat spreaders can add significant size and weight to the system. Heat spreaders are best suited to applications where uniform heat distribution is needed to enhance the efficiency of attached cooling systems.
Effective thermal management in HPC systems and data centers relies on understanding and optimizing key thermal properties such as thermal conductivity, thermal impedance, and phase transition temperature. By carefully selecting and integrating materials based on these properties, HPC and data center operators can maintain optimal operating conditions. Contact us, and our application engineers and in-house thermal experts will help you select the right products for your application requirements.