As demand for AI continues to surge, data centers are being re-engineered to meet the unique networking and computational challenges these workloads create. The future of the AI cloud will rely on standards-based Ethernet to deliver the scale, performance, and flexibility required for next-generation workloads and applications.
Traditional data centers are undergoing a paradigm shift. Where once AI was an isolated function, today the expectation is that AI will be integrated everywhere — across the edge, core, and cloud, from personal devices to enterprise systems.
To keep pace, organizations must accelerate their AI infrastructure to support workloads anytime, anywhere, and at any scale. This requires purpose-built AI data center fabrics designed for high performance, massive scalability, and lossless operation.
AI and machine learning (ML) workloads span a wide range of tasks, from analyzing and interpreting data to generating predictions and automating decisions. These workloads sit at the heart of today’s most advanced technologies but place unprecedented demands on data center infrastructure, especially in terms of high-speed networking, storage, and compute.
Training large language models (LLMs), for example, requires massive, centralized datasets and sustained high-bandwidth connections. Meanwhile, the rise of inference workloads is driving a shift toward more distributed architectures, where traffic flows between devices and across the edge, core, and cloud.
Traditional data center traffic is primarily asynchronous — think of database calls or users making occasional requests to a web server. In contrast, AI workloads generate what’s known as “elephant flows”: massive, sustained streams of data moving east-west across the data center between machines. Very little of this traffic leaves the data center (north-south), with up to 90% circulating internally in machine-to-machine communication.
Within an AI cluster, the bulk of data is passed between GPUs over extended periods. Unlike traditional networks, where tasks can proceed in parallel, GPU clusters depend on having all the necessary data in place before moving forward. A delay or bottleneck affecting even a single GPU can trigger a cascading slowdown, making overall job completion time (JCT) critically dependent on the slowest path in the system. This makes the network a central performance factor that demands careful, specialized design to meet the unique needs of AI workloads.
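To make the straggler effect concrete, here is a minimal sketch in Python. The per-GPU step times are invented purely for illustration; the point is that a synchronized training step cannot complete until the slowest GPU (or the slowest network path feeding it) has finished.

```python
# Minimal sketch of the straggler effect in synchronous training.
# The per-GPU step times below are illustrative, not measured values.

def job_completion_time(step_times_ms):
    """A synchronized step (e.g. an all-reduce) cannot finish until every
    GPU has delivered its data, so the step is gated by the slowest one."""
    return max(step_times_ms)

# Eight GPUs finishing a step in roughly 100 ms each.
healthy = [101, 99, 102, 100, 98, 103, 100, 101]

# The same cluster where one GPU sits behind a congested link.
one_straggler = healthy[:-1] + [180]

print(job_completion_time(healthy))        # 103 ms
print(job_completion_time(one_straggler))  # 180 ms: one slow path sets the pace
```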
Many enterprises are exploring running AI workloads on-premises rather than in the cloud, driven by concerns over data privacy, regulatory compliance, security, latency, and rising cloud bandwidth costs.
An AI-enabled data center typically consists of three core components: the front-end and back-end networks, the storage systems, and the compute clusters. The size and configuration of an AI/ML cluster depend on several factors, including the models’ complexity, the datasets’ size, and the desired training or inference speed. These clusters can range from small, enterprise-scale deployments to massive hyperscale environments with thousands of compute, storage, and networking nodes.
The diagram shows the layered architecture of an AI data center, with separate leaf-and-spine (CLOS) fabrics for the front-end and back-end networks. The front-end fabric connects CPUs and external user traffic, while the back-end fabric interconnects GPUs using RoCEv2 NICs for high-speed, lossless communication. This dual-fabric approach ensures that storage and compute workloads remain isolated and efficiently scaled to meet AI demands.
The front-end network handles external connections to the AI cluster — managing tasks like orchestration, handling API calls for inference, and collecting telemetry data. Importantly, the front-end generally sees far less traffic than the back-end network, which supports the intensive data flows required for model training and storage operations.
Within the back-end, the design splits into two critical segments: the compute fabric, which interconnects the GPUs carrying training traffic, and the storage fabric, which connects the high-performance storage nodes feeding those GPUs.
In both cases, a fundamental design principle applies: avoid oversubscription. The links connecting storage and compute nodes to the network leaf switches must be provisioned with sufficient capacity to ensure that no component becomes a bottleneck. As cluster size grows, maintaining the right port density, bandwidth, and architecture becomes essential to preserving efficiency and performance.
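As a rough illustration of that principle, the sketch below (Python, with hypothetical port counts and speeds that are not taken from the article) computes a leaf switch's oversubscription ratio as downlink capacity divided by uplink capacity. A non-blocking AI fabric targets a ratio of 1:1 or better.

```python
# Rough oversubscription check for a leaf switch in a leaf-spine fabric.
# Port counts and speeds are hypothetical, for illustration only.

def oversubscription_ratio(down_ports, down_gbps, up_ports, up_gbps):
    """Downlink capacity (toward GPUs/storage) divided by uplink capacity
    (toward the spines). A value above 1.0 means traffic can be forced to
    queue or drop under full load."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)

# 32 x 400G ports down to GPU servers, 8 x 800G uplinks to the spines.
ratio = oversubscription_ratio(down_ports=32, down_gbps=400,
                               up_ports=8, up_gbps=800)
print(f"{ratio:.1f}:1")  # 2.0:1, oversubscribed; add uplinks or reduce downlinks

# 16 x 400G down with 8 x 800G up gives 1.0:1, i.e. non-blocking.
```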
Cloud computing promises boundless scalability, flexibility, and efficiency. But with the surge of generative AI, many enterprises are waking up to the steep costs of running intensive workloads in the cloud, especially as GPU prices climb. As a result, organizations are increasingly reconsidering whether some AI workloads should shift back to on-premises data centers, where they can better control expenses and performance.
Beyond cost, on-premises deployments offer significant advantages for data security and governance. AI and ML models typically rely on vast, often sensitive datasets. While cloud providers offer robust security, many enterprises prefer the tighter access controls and reduced risk of exposure that come with keeping data inside their own security domain.
At the heart of these high-performance on-premises architectures is Ethernet, the ubiquitous networking technology that has powered campuses and data centers for more than five decades. Today, Ethernet is reinventing itself to meet the unique demands of AI workloads, delivering lossless, high-speed connectivity at 400 and 800 gigabits per second, with 1.6 terabits per second on the horizon.
One of the challenges in achieving high-performance networking for AI workloads is that traditional TCP/IP stacks struggle at these speeds because of their high CPU overhead. Remote Direct Memory Access (RDMA) addresses this challenge: by offloading transport tasks from the CPU to specialized hardware, it gives applications direct access to remote memory and dramatically increases performance.
Specifically, RDMA over Converged Ethernet (RoCE), combined with techniques like Data Center Quantized Congestion Notification (DCQCN), Priority Flow Control (PFC), Explicit Congestion Notification (ECN), and dynamic load balancing, creates a lossless Ethernet fabric purpose-built for AI.
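The sketch below is a deliberately simplified illustration of that congestion-control loop, in the spirit of DCQCN: the switch marks packets with ECN once its queue crosses a threshold, the receiver reflects those marks back to the sender (as CNPs in RoCEv2), and the sender cuts its rate, then recovers. The real algorithm maintains an alpha estimator, timers, and byte counters; the constants and queue samples here are invented for illustration.

```python
# Highly simplified sketch of ECN-driven rate control in the spirit of DCQCN.
# Thresholds, factors, and queue samples are illustrative only.

ECN_THRESHOLD_PKTS = 100   # switch queue depth that triggers ECN marking
CUT_FACTOR = 0.5           # multiplicative decrease when marks are seen
RECOVER_STEP_GBPS = 5      # additive increase when no marks are seen
LINE_RATE_GBPS = 400

def switch_marks(queue_depth_pkts):
    """The switch marks packets with ECN once its egress queue exceeds the
    threshold, rather than dropping them (PFC keeps the fabric lossless)."""
    return queue_depth_pkts > ECN_THRESHOLD_PKTS

def sender_adjust(rate_gbps, saw_mark):
    """The sender reacts to congestion notifications (CNPs in RoCEv2) by
    cutting its sending rate, and slowly recovers otherwise."""
    if saw_mark:
        return rate_gbps * CUT_FACTOR
    return min(LINE_RATE_GBPS, rate_gbps + RECOVER_STEP_GBPS)

rate = LINE_RATE_GBPS
for queue_depth in [20, 80, 150, 220, 140, 60, 30]:  # illustrative samples
    rate = sender_adjust(rate, switch_marks(queue_depth))
    print(f"queue={queue_depth:>3} pkts -> rate={rate:.0f} Gbps")
```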
While InfiniBand has long been the gold standard for high-performance computing due to its low latency and efficiency, RoCE offers compelling advantages: it integrates more easily into existing Ethernet environments and typically comes at a lower cost, making it a strong choice for AI data centers.
Recognizing the limitations of RoCE, a group of vendors and operators has formed the Ultra Ethernet Consortium (UEC) to tackle the next generation of challenges. With growing concern that traditional network interconnects cannot provide the performance, scale, and bandwidth needed to keep up with AI demands, the UEC is working to extend and enhance the proven Ethernet standard. Its goal is to overcome the bottlenecks that arise when exchanging massive volumes of data between compute nodes in Ethernet-based clusters, such as those running AI workloads.
By adding new capabilities to the known and proven Ethernet specification, Ultra Ethernet aims to overcome the limitations that current Ethernet technology faces in AI and HPC data center clusters, improving how data is exchanged between compute nodes connected over an Ethernet network.
Ethernet is without question a ubiquitous technology, used as the backbone of data center networks, whether for traditional or AI workloads. With the rise of open-source AI models such as DeepSeek, we will likely see resurgent growth in on-premises enterprise data centers.
Because these models can run on less powerful, more affordable compute infrastructure, on-premises AI becomes not only feasible but also attractive, allowing organizations to avoid the recurring cost of running workloads in the cloud while still achieving high performance.
Running AI workloads on-premises offers tighter control for enterprises with sensitive data or strict requirements around compliance with privacy regulations and data sovereignty.
As AI models become increasingly lightweight and efficient, we can expect a shift toward edge computing, with small deployments closer to end users. This reduces latency, cuts cloud bandwidth costs, and brings AI-driven services and applications closer to the action.
Looking ahead, the convergence of high-performance Ethernet, innovative AI models, and evolving enterprise needs is reshaping the data center landscape. Organizations that act now to modernize their infrastructure by blending proven Ethernet technologies with next-generation fabrics like Ultra Ethernet will be best positioned to harness the full potential of AI, turning technical capability into real competitive advantage.