NVIDIA Rubin Aims for Zero Water Use With 45C Liquid Cooling

Walk into any traditional data center and the first thing that hits you is the sensory assault: a deafening roar of industrial fans and a biting, artificial chill. For decades, the industry has operated under a singular, rigid dogma that the colder the server room, the more efficient the compute. This freezer-like environment became the gold standard for maintaining uptime, but as the era of generative AI pushes power densities to their physical limits, the old way of moving air is no longer just inefficient—it is an obstacle. The industry is currently witnessing a pivot from the data center as a storage warehouse to the AI factory as a power plant, and the thermal requirements of these factories are forcing a total rewrite of infrastructure physics.

The Architecture of the 100% Liquid-Cooled Factory

NVIDIA is addressing this thermal wall with the Rubin generation infrastructure, marking a departure from hybrid cooling to a world-first 100% liquid cooling implementation. In previous generations, liquid cooling was a surgical application, reserved for the hottest components like GPUs or CPUs via cold plates, while the rest of the chassis relied on air. Rubin eliminates this duality. Every computational chip and network component is now integrated into a comprehensive liquid cooling envelope. By removing cooling fans entirely, NVIDIA has replaced the chaotic turbulence of air with a precise, closed-loop liquid circulation system.

The technical cornerstone of this design is the operating temperature threshold. NVIDIA has engineered the system to function with a coolant inlet temperature of up to 45 degrees Celsius (113 degrees Fahrenheit). The choice rests on the observation that modern processors can sustain peak performance with coolant significantly warmer than operators have traditionally supplied. To make this efficiency actionable for operators, NVIDIA introduced the NVIDIA DSX AI factory reference design. This framework serves as a blueprint for the entire infrastructure stack, providing guidelines for designing, building, and operating the cooling paths from the individual chip to the facility-wide network.

The chemistry of the cooling loop is equally specific. The system uses a mixture of 75% water and 25% propylene glycol, the latter acting as antifreeze and corrosion inhibitor to protect the hardware over long lifecycles. This fluid enters the cold plates (metal plates in direct contact with the chips) at 45 degrees Celsius, absorbs the thermal load, and exits at approximately 55 degrees Celsius. At the tray level, a single inlet and outlet distribute liquid to multiple high-power chips, which simplifies the cooling structure. The heated fluid then travels through the closed loop to external dry coolers, which are essentially massive radiator coils that dissipate heat into the ambient air.
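The 45-to-55 degree rise across the cold plates fixes how much coolant a given heat load needs, through the standard heat balance Q = m * cp * dT. The sketch below is illustrative only: the specific heat of 3,900 J/(kg K) is an assumed round figure for a 25% propylene glycol mixture, and the 100 kW load is a hypothetical example, not a figure from NVIDIA.

```python
def coolant_mass_flow_kg_s(heat_load_w, inlet_c=45.0, outlet_c=55.0,
                           cp_j_per_kg_k=3900.0):
    """Mass flow (kg/s) needed to carry heat_load_w at the given temperature rise.

    cp_j_per_kg_k is an ASSUMED approximate specific heat for a
    75% water / 25% propylene glycol mixture; check the fluid datasheet.
    """
    delta_t = outlet_c - inlet_c
    if delta_t <= 0:
        raise ValueError("outlet temperature must exceed inlet temperature")
    # Steady-state heat balance: Q = m_dot * cp * delta_T
    return heat_load_w / (cp_j_per_kg_k * delta_t)


# Hypothetical 100 kW load, used only to show the arithmetic.
flow = coolant_mass_flow_kg_s(100_000.0)
print(f"Required coolant flow: {flow:.2f} kg/s")
```

A wider temperature rise would lower the required flow for the same load, which is one reason the inlet and outlet temperatures are specified together.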

The Economic Pivot from Chillers to Dry Coolers

The true innovation of the Rubin architecture is not just that it uses liquid, but the temperature at which it operates. In the traditional data center model, the goal was to keep the liquid as cold as possible, which required the constant operation of chillers, energy-intensive refrigeration units that act like giant air conditioners. By raising the inlet threshold to 45 degrees Celsius, NVIDIA has created a scenario where the chiller becomes optional. As long as the outside ambient temperature stays far enough below 45 degrees Celsius for the dry coolers to bring the returning fluid back down to the inlet temperature, the system can rely on them entirely to shed heat. This shift toward chiller-less operation is the primary driver of the system's efficiency, as it removes the most power-hungry component of the cooling chain.
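To make the chiller-optional condition concrete, the sketch below estimates what fraction of hours a site could run on dry coolers alone. It assumes a dry cooler can only cool the fluid to within some margin of ambient (the "approach" temperature); the 8 degree approach and the sample readings are hypothetical placeholders, not values from the reference design.

```python
def chiller_free_fraction(ambient_temps_c, supply_limit_c=45.0, approach_c=8.0):
    """Fraction of readings at which dry coolers alone can meet the supply limit.

    approach_c is an ASSUMED gap between ambient air and the coolest fluid
    a dry cooler can deliver; real values depend on the equipment.
    """
    if not ambient_temps_c:
        raise ValueError("need at least one temperature reading")
    max_ambient_c = supply_limit_c - approach_c
    free_hours = sum(1 for t in ambient_temps_c if t <= max_ambient_c)
    return free_hours / len(ambient_temps_c)


# Hypothetical hourly ambient readings in degrees Celsius.
sample_hours = [12.0, 25.0, 33.5, 36.9, 38.0, 41.0]
fraction = chiller_free_fraction(sample_hours)
print(f"Chiller-free share of hours: {fraction:.0%}")
```

In practice the input would be a full year of hourly weather data for the site, and the approach temperature would come from the dry cooler vendor's performance curves.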

The financial implications of this shift are significant at hyperscale. For a 50 MW facility, transitioning to this liquid cooling infrastructure can result in annual savings of over $4 million in energy and water costs. The logic rests on a simple ratio: in environments where cooling accounts for up to 40% of total power consumption, raising the chiller setpoint by just 1 degree Celsius can reduce cooling energy costs by approximately 4%. By pushing the threshold to 45 degrees Celsius, NVIDIA allows operators to reduce chiller runtime to less than 1% of the year in many climates, which lowers hardware wear and extends maintenance cycles.
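The per-degree ratio can be worked through in a few lines. The sketch below takes the 40% cooling share and 4% per degree figures from the text; treating the 4% as compounding across several degrees is an assumption made here for illustration, since the rule of thumb is only stated for a single degree.

```python
def facility_saving_per_degree(cooling_share=0.40, saving_per_degree=0.04):
    """Share of TOTAL facility power saved per degree of setpoint increase."""
    return cooling_share * saving_per_degree


def cooling_energy_remaining(degrees_raised, saving_per_degree=0.04):
    """Fraction of original cooling energy left after raising the setpoint.

    ASSUMPTION: the per-degree saving compounds; the source only gives
    the one-degree figure.
    """
    if degrees_raised < 0:
        raise ValueError("degrees_raised must be non-negative")
    return (1.0 - saving_per_degree) ** degrees_raised


per_degree = facility_saving_per_degree()
print(f"Total-power saving per degree: {per_degree:.1%}")
print(f"Cooling energy left after +10 degrees: {cooling_energy_remaining(10):.1%}")
```

Under these assumptions, each degree is worth about 1.6% of total facility power, which is why small setpoint changes add up quickly at tens of megawatts.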

Beyond electricity, the design addresses water consumption, one of the AI industry's biggest public-relations and operational liabilities. Traditional cooling tower systems are notoriously thirsty, consuming roughly 2.6 million gallons of water per megawatt annually. Because the Rubin generation's liquid loop is closed and rejects heat through dry coolers rather than evaporation, it cuts cooling water consumption to near zero, a reduction of up to 100%. This transforms the economic geography of AI, allowing companies to build high-density factories in water-stressed regions where traditional data centers would be environmentally or legally impossible to sustain.
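Scaling the cooling tower figure to a whole facility is simple arithmetic. The sketch below applies the 2.6 million gallons per megawatt per year from the text to the 50 MW facility size used earlier in this article; the function name and the reduction parameter are illustrative.

```python
GALLONS_PER_MW_YEAR = 2_600_000  # cooling tower figure quoted in the text


def annual_water_avoided_gallons(facility_mw, reduction_fraction=1.0):
    """Gallons of water per year avoided relative to a cooling tower design.

    reduction_fraction=1.0 models the "up to 100%" best case.
    """
    if not 0.0 <= reduction_fraction <= 1.0:
        raise ValueError("reduction_fraction must be between 0 and 1")
    return facility_mw * GALLONS_PER_MW_YEAR * reduction_fraction


avoided = annual_water_avoided_gallons(50)
print(f"Water avoided at 50 MW: {avoided:,.0f} gallons per year")
```

At 50 MW the best case works out to 130 million gallons per year; a site that still uses some evaporative assistance would pass a smaller reduction fraction.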

Physical density also improves markedly. In traditional setups, the cooling infrastructure often occupied 6U of rack space. The Rubin design shrinks this footprint to 2U. By using only one-third of the previous space to achieve superior cooling performance, operators can pack more GPU servers into the same physical footprint. For urban data centers in space-constrained hubs like Seoul's Sangam or Gasan districts, the reduction from 6U to 2U is not just a marginal gain; it is a decisive factor that allows for the expansion of compute power without the need for costly real estate acquisition or building expansion.

For operators implementing this in diverse climates, the strategy shifts to a data-driven analysis of local ambient temperatures. In regions with distinct seasonal shifts, the goal is to maximize the number of days the facility can run in chiller-less mode. This involves calculating the capacity of dry coolers based on annual temperature peaks and determining the exact point where the thermal density of the chips exceeds the physical capacity of air cooling. When the wattage per chip reaches the limit where heatsinks become too bulky for the rack or fan noise exceeds safety thresholds, the transition to liquid cooling becomes a technical necessity rather than an optimization choice. This transition is often managed through specialized partnerships, such as those with Motivair, to ensure the infrastructure can handle the extreme power densities of the next generation of AI.

Data centers are no longer required to be freezing, noisy vaults. By redefining the thermal threshold, NVIDIA has shifted the objective of cooling from simple temperature suppression to a sophisticated exercise in business efficiency and resource conservation.
