
TE Perspective

Power, Speed, Cooling: The AI Data Center Challenge

For artificial intelligence to generate revenue consistently, data centers need to get faster. A lot faster. AI models improve with each new generation of training, but that training still takes a long time. The delay comes from current limits on the amount of data that can be pushed through the graphics processing units (GPUs) that train AI models. The race to achieve higher speeds is already underway. As it unfolds, the industry will have to do more than upgrade its infrastructure to handle more data more quickly. It will also have to manage its growing demand for the power these faster, more complex computations require, as well as the heat that new infrastructure generates.


To make matters more complicated, system architects essentially have to build the plane while they’re flying it, working with equipment and component manufacturers to maximize the performance of today’s infrastructure while simultaneously preparing to upgrade and scale for the even higher speeds that will soon be required. 

Speed is the key to making AI profitable

AI end users may get answers in seconds, but training state-of-the-art models takes time: generally two to four months for large foundation models. That lag limits how quickly organizations can translate fresh data into improved models and business value. Shortening the train-to-deploy loop is more than a technical advantage; it is an economic imperative.


Consider a new automotive assembly line. Production systems generate rich datasets on throughput, quality and efficiency starting on day one. In theory, that data could be used immediately to optimize operations. In practice, however, the length of the AI training cycle means manufacturers have to wait months before retrained models can deliver actionable improvements.


With faster algorithm training, organizations could put AI-optimized processes in place much sooner, applying efficiency improvements and unlocking cost savings earlier. This is just one example of how the pace of innovation inside data center racks can have far-reaching impacts.


Higher speeds will eventually run through fiber-optic cable

Higher speeds are driving changes to data center architectures. Today, 800 gigabit-per-second modules have become widely available, with 1.6 terabit-per-second modules coming soon. The arrival of those higher-speed connections has reduced the maximum practical length of copper cable, pushing more traffic onto optical cables. This transition has raised interest in lower-power architecture solutions like linear pluggable optics, which are available now, and co-packaged optics, which are still evolving. Inside the rack, architectural changes include aggregating GPUs to reduce communication bottlenecks during training and large-batch inference operations.
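
To get a feel for why copper reach shrinks as speeds climb, consider a deliberately simplified model in which conductor (skin-effect) loss grows with the square root of the signaling frequency, so doubling the per-lane rate trims passive-cable reach by roughly a factor of 1/√2. The loss budget and per-meter loss figures in the sketch below are illustrative assumptions, not cable or standard specifications.

```python
import math

# Simplified, illustrative model of passive copper reach for a fixed loss budget,
# assuming conductor (skin-effect) loss dominates and scales with sqrt(frequency).
# Both constants below are assumptions chosen for illustration only.
LOSS_BUDGET_DB = 16.0           # assumed end-to-end passive channel budget (dB)
LOSS_DB_PER_M_AT_13GHZ = 4.0    # assumed cable loss at a 13.3 GHz Nyquist (dB/m)

def reach_m(nyquist_ghz: float) -> float:
    """Estimated reach in meters if loss per meter scales with sqrt(f)."""
    loss_per_m = LOSS_DB_PER_M_AT_13GHZ * math.sqrt(nyquist_ghz / 13.3)
    return LOSS_BUDGET_DB / loss_per_m

# Approximate Nyquist frequencies for 53, 106 and 212 Gb/s PAM4 lanes
# (eight lanes at these rates correspond roughly to 400G, 800G and 1.6T modules).
for lane_gbps, nyquist_ghz in [(53, 13.3), (106, 26.6), (212, 53.1)]:
    print(f"{lane_gbps:>3} Gb/s lane -> ~{reach_m(nyquist_ghz):.1f} m of passive copper")
```

Under these assumed numbers, each doubling of the lane rate shaves roughly 30 percent off the usable copper length, which is why more of the traffic ends up on optics.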


The timing for any given data center to make one or more of these architectural adjustments depends on how well its existing infrastructure can continue to meet customer needs. Partnering with component manufacturers to plan for these shifts can help data centers upgrade more efficiently. In the meantime, moving to modular, input/output-agnostic baseplates can give data centers a head start on the changeover. Managed well, the switchover also lets them keep the same fiber and baseplates as they upgrade module types in the future, increasing speed without having to redesign and replace the entire server chassis.

Scaling up (within the pod) and out (across pods)

Scaling for speed and volume doesn't take place only across connections. It also takes place within a server pod and across pods. To scale up within a pod, making it faster and more powerful, data centers need uniform, deterministic fabrics, expanded memory capacity and predictable latency across all the elements in the pod. Components that support those capabilities also support scale.
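
A toy model makes the "predictable latency" point concrete: a synchronous collective such as the all-reduce at the heart of distributed training finishes only when its slowest link does, so a single jittery path stalls every GPU in the pod. The hop counts and latencies below are made-up numbers for illustration, not measurements.

```python
# Toy illustration: a synchronous collective (e.g. an all-reduce) is gated by
# its slowest participant, which is why uniform, deterministic fabrics matter
# inside a pod. All latencies below are invented for illustration.

def collective_step_time_us(link_times_us: list[float]) -> float:
    """The step completes only when the slowest path has delivered its data."""
    return max(link_times_us)

uniform_pod = [10.0] * 8             # eight identical 10-microsecond paths
jittery_pod = [10.0] * 7 + [25.0]    # one path with a long latency tail

print(collective_step_time_us(uniform_pod))   # 10.0 -> predictable step time
print(collective_step_time_us(jittery_pod))   # 25.0 -> every GPU waits on one link
```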


Similarly, scaling out operations across pods to support workloads more dynamically requires a more elastic pod design, where networking, power and cooling can grow or shrink as workloads require, optimizing power consumption. That capability will rely on hot-swappable power, instrumented cooling manifolds, blind-mate interconnects and instrumented backplanes. It will also require telemetry at every layer to monitor IT loads, as well as standardized components that can handle the changes needed for such dynamic configurations.
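
As a hypothetical sketch of what "telemetry at every layer" could look like, the structure below has each layer of a pod (power shelf, cooling manifold, GPU sled) report its own readings, which the pod rolls up so orchestration can grow or shrink capacity safely. The field names, layer names and numbers are illustrative assumptions, not a TE or industry schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical per-layer telemetry for an elastic pod. Every layer reports its
# own readings; the pod aggregates them so workloads can be scaled out or back
# without exceeding the power and cooling envelope. Names and values are
# illustrative assumptions only.

@dataclass
class LayerTelemetry:
    layer: str                 # e.g. "power-shelf-1", "cooling-manifold-A"
    power_kw: float            # measured electrical load for this layer
    inlet_temp_c: float        # coolant or air inlet temperature
    alarms: List[str] = field(default_factory=list)

@dataclass
class PodTelemetry:
    pod_id: str
    layers: List[LayerTelemetry]

    def total_power_kw(self) -> float:
        return sum(layer.power_kw for layer in self.layers)

    def headroom_kw(self, rated_kw: float) -> float:
        # Capacity remaining before the pod hits its rated envelope.
        return rated_kw - self.total_power_kw()

pod = PodTelemetry("pod-07", [
    LayerTelemetry("power-shelf-1", power_kw=41.5, inlet_temp_c=32.0),
    LayerTelemetry("cooling-manifold-A", power_kw=3.2, inlet_temp_c=30.5),
    LayerTelemetry("gpu-sled-3", power_kw=68.0, inlet_temp_c=41.0),
])
print(f"{pod.pod_id}: {pod.total_power_kw():.1f} kW in use, "
      f"{pod.headroom_kw(rated_kw=132.0):.1f} kW of headroom")
```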

Power and cooling are now first-order design inputs

These upgrades require more power and throw off more heat, and dissipating that heat demands still more power. Today, TE Connectivity is working closely with companies throughout the industry to innovate and safely deliver increasing amounts of power to existing server racks, while also establishing a path toward future upgrades as data centers, particularly cloud hyperscalers, develop new standards and architectures to support higher-capacity racks. Average fleet rack densities today are still mostly 10 to 30 kilowatts, but AI training clusters are already pushing the outer edge of specifications at 120 to 132 kilowatts. By 2027, AI racks are expected to reach up to 600 kilowatts, with one-megawatt-class racks appearing in select deployments by the end of the decade. Meeting these densities calls for high-voltage DC architectures that reduce current, copper mass and distribution losses while supporting a very wide range of rack power.
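
The basic electrical relationship behind that last point is I = P / V: for the same rack power, a higher distribution voltage draws proportionally less current, and resistive losses fall with the square of the current. The bus voltages and distribution resistance below are illustrative assumptions, not a recommendation for any particular architecture.

```python
# Why high-voltage DC distribution helps: current scales as I = P / V, and
# resistive losses scale as I^2 * R. The distribution resistance below is an
# illustrative assumption, not a real design value.

RACK_POWER_W = 132_000                 # a 132 kW AI training rack
DISTRIBUTION_RESISTANCE_OHM = 0.0005   # assumed end-to-end busbar/cable resistance

for bus_voltage_v in (48, 400, 800):
    current_a = RACK_POWER_W / bus_voltage_v
    loss_w = current_a ** 2 * DISTRIBUTION_RESISTANCE_OHM
    print(f"{bus_voltage_v:>3} V bus: {current_a:7.0f} A, "
          f"~{loss_w / 1000:.2f} kW lost in distribution")
```

Under those assumptions, a 48-volt bus would carry roughly 2,750 amps into a single 132-kilowatt rack, while an 800-volt bus carries well under 200 amps, which is the motivation for reducing copper mass as well as losses.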


Thermally, traditional air alone won’t suffice at these loads. Direct-to-chip liquid cooling is rapidly becoming standard for high-TDP accelerators, with two-phase immersion and hybrid solutions used in select cases. Components like optics-ready interfaces that support next-generation liquid-cooling solutions will help keep temperatures under control.
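
A rough energy balance, Q = ṁ·c_p·ΔT, shows why air runs out of headroom: carrying rack-scale heat in air takes enormous airflow, while a modest water flow through cold plates moves the same load. The fluid properties below are standard textbook values; the temperature rises are illustrative assumptions rather than design targets.

```python
# Rough energy balance Q = m_dot * c_p * delta_T for a ~120 kW rack, comparing
# air cooling with a direct-to-chip water loop. Temperature rises are assumed
# for illustration; fluid properties are standard values.

HEAT_LOAD_W = 120_000

# Air: c_p ~ 1005 J/(kg*K), density ~ 1.2 kg/m^3, assumed 15 K temperature rise
air_mass_flow_kg_s = HEAT_LOAD_W / (1005 * 15)
air_volume_m3_s = air_mass_flow_kg_s / 1.2

# Water: c_p ~ 4186 J/(kg*K), assumed 10 K rise across the cold plates
water_mass_flow_kg_s = HEAT_LOAD_W / (4186 * 10)
water_volume_l_min = water_mass_flow_kg_s * 60   # ~1 kg of water per liter

print(f"Air:   ~{air_volume_m3_s:.1f} m^3/s of airflow to hold a 15 K rise")
print(f"Water: ~{water_volume_l_min:.0f} L/min through a direct-to-chip loop")
```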

Designing for optionality: build the envelope once, iterate inside it

The most capital-efficient strategy is to “lock the envelope” early:  define sensible, shared parameters for mechanics, power, cooling, input/output, and telemetry at the rack/pod boundary—and let silicon and software iterate inside that envelope without forcing a chassis redesign. The sooner data centers prepare their infrastructure for this higher-power, higher-bandwidth future, the better they will position themselves to support the AI industry’s need for continued advancement. Ideally, data centers need a serviceable, instrumented rack interface they can scale, monitor and maintain without completely renovating their campus — and if the industry works together, this transition can be more efficient.
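
As a hypothetical illustration of what "locking the envelope" might mean in practice, the sketch below pins down the kind of shared parameters that could be fixed at the rack/pod boundary while compute iterates inside it. Every field name and value here is an assumption for illustration, not an industry standard or a TE specification.

```python
from dataclasses import dataclass

# Hypothetical "rack envelope" contract: mechanics, power, cooling, I/O and
# telemetry parameters fixed at the rack/pod boundary so silicon and software
# can iterate inside without forcing a chassis redesign. All values are
# illustrative assumptions.

@dataclass(frozen=True)
class RackEnvelope:
    rack_units: int                # mechanical height committed to the facility
    max_power_kw: float            # power ceiling the envelope guarantees
    bus_voltage_vdc: int           # distribution voltage at the rack boundary
    liquid_cooling: bool           # direct-to-chip manifold provisioned?
    coolant_supply_temp_c: float   # facility coolant supply temperature
    io_interface: str              # input/output-agnostic boundary, e.g. a pluggable cage
    telemetry_endpoint: str        # where per-layer sensors report

ENVELOPE = RackEnvelope(
    rack_units=48,
    max_power_kw=132.0,
    bus_voltage_vdc=800,
    liquid_cooling=True,
    coolant_supply_temp_c=32.0,
    io_interface="1.6T-ready pluggable optics cage",
    telemetry_endpoint="https://pod-manager.example/telemetry",
)

def fits_envelope(requested_kw: float, envelope: RackEnvelope) -> bool:
    """A new compute generation is accepted as long as it stays inside the envelope."""
    return requested_kw <= envelope.max_power_kw

print(fits_envelope(118.0, ENVELOPE))   # True: iterate the compute, keep the rack
```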

TE Connectivity collaborates with hyperscalers, OEMs and integrators to engineer the mechanical, power, cooling and high-speed input/output elements that form this stable envelope—blind-mate, touch-safe power networks; liquid-cooling interfaces; and optics-ready interconnects—so customers can iterate compute rapidly without tearing out the rack.


The industry’s mission is clear: close the AI speed gap intelligently, with infrastructure that’s flexible, scalable and ready for the next wave of innovation.

About the author

Sajjad Ahmed

Sajjad Ahmed is the Director of R&D and Engineering for the Digital Data Network business, where he leads the Advanced Engineering and Solutions team in developing next-generation interconnect architectures in close partnership with global customers. With over two decades of experience in architecting and scaling engineering ecosystems, he combines a deep understanding of multidisciplinary engineering with a proven ability to overcome mass-production challenges. Sajjad has consistently introduced innovations that have advanced the compute industry and continues to drive technologies that shape the future of data centers worldwide.