Data centers are at an inflection point. Requirements are changing faster than at any point in the last 20 years, and the next 5–10 years will bring more fundamental change than the previous two decades combined.
The driver of all this change is AI. The data center of the future will look very different from what we have been building and operating up until now. Understanding why requires us to look at what is actually changing under the hood.
To understand what’s going on, it helps to look at how data center traffic patterns have evolved. This isn’t the first time the industry has had to adapt to changing requirements.
Traditional data centers were “cloud era designs” built for web apps, microservices, databases, and storage-heavy workloads. This meant CPU-heavy racks, 10–25Gbps Ethernet, modest or no GPU presence, and air-cooled servers typically consuming 5–15kW per rack.
Traffic was predominantly North-South: users and the internet knocking on the front door, responses going back out. It was bursty by nature, latency did not matter all that much, throughput per flow was modest, and some oversubscription was fine because not everybody was talking at the same time. This is more or less how data centers were optimized for 20+ years.
The rise of microservices and distributed systems forced a significant rethink. Traffic became predominantly East-West, server-to-server, VM-to-VM, container-to-container, all inside the data center. The front door became less important than what was happening inside the building. Traffic became a mix of relatively small messages, a degree of latency was tolerated, and occasional packet loss was manageable. The industry adapted, but it turns out this was just the warm-up.
AI blows up the scale and latency requirements entirely. Training and inference workloads span multiple servers and accelerators across multiple racks, and sometimes multiple data centers. Instead of sending a request and getting a response, you are streaming tens or even hundreds of GBs per second, continuously, for minutes or even hours.
The mechanism driving this is collective communications operations like all-reduce, all-gather, and broadcast, where every GPU must exchange data with many others in a tightly synchronized way. With 1024 GPUs doing all-reduce at every training step, gradients are being pushed across the entire data center fabric thousands of times per job. If one GPU slows down, everyone needs to wait, and that directly impacts Job Completion Time (JCT).
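To get a feel for the volumes involved, here is a back-of-the-envelope sketch. In the common ring variant of all-reduce, each GPU transmits roughly 2·(n−1)/n times the gradient buffer size per step. The model size below is an illustrative assumption, not a figure from any specific job:

```python
# Back-of-the-envelope: bytes each GPU sends during one ring all-reduce.
# In a ring all-reduce over n GPUs, each GPU transmits 2*(n-1)/n times
# the gradient buffer size (reduce-scatter phase plus all-gather phase).

def ring_allreduce_bytes_per_gpu(n_gpus: int, grad_bytes: float) -> float:
    """Bytes transmitted by each GPU for one all-reduce of grad_bytes."""
    return 2 * (n_gpus - 1) / n_gpus * grad_bytes

# Illustrative assumption: 10B parameters in fp16 -> 20 GB of gradients.
grad_bytes = 10e9 * 2
per_gpu = ring_allreduce_bytes_per_gpu(1024, grad_bytes)
print(f"{per_gpu / 1e9:.1f} GB per GPU per training step")  # ~40 GB
```

Multiply that by thousands of steps and 1024 GPUs, and it becomes clear why the fabric is carrying this traffic continuously for the lifetime of the job.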
One way to think about it is that if the traditional data center was lots of people knocking on a building’s front door, the AI data center is thousands of people inside the building all trying to move a piano to each other at the exact same time. The network has to adapt.
We’ve already established that if one GPU slows down, everyone waits, but let’s follow that thread a little further, because the consequences are more significant than they might first appear.
Assume a few dropped packets. This causes a small queue buildup, which creates a slow connection between GPUs, which causes stalls across the entire job. The training time increases. And because you are running hundreds or thousands of expensive accelerators, every extra minute of training time is direct measurable cost. The network is no longer just infrastructure, it is in the critical path of your capital efficiency.
This is why the formula is simple: dropped packets = stalled GPUs = wasted money. So the network must provide low latency and low jitter, and it needs to be predictable and almost lossless. Best-effort networking, which was perfectly acceptable in the cloud era, is no longer sufficient.
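The economics are easy to sketch. The GPU hourly price, cluster size, and stall fraction below are invented for illustration, but the shape of the calculation is the point:

```python
# Illustrative cost of network-induced stalls on a large training job.
# The GPU hourly price and cluster size are assumptions, not quoted figures.

def stall_cost(n_gpus: int, usd_per_gpu_hour: float,
               job_hours: float, stall_fraction: float) -> float:
    """Dollars spent on GPU time lost to stalls."""
    return n_gpus * usd_per_gpu_hour * job_hours * stall_fraction

# 1024 GPUs at an assumed $2/hr, a 100-hour job, 5% of time stalled:
print(f"${stall_cost(1024, 2.0, 100, 0.05):,.0f}")  # $10,240
```

A few percent of stall time on a large cluster is real money on every single job, which is why the network lands in the capital-efficiency conversation at all.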
The requirements that follow from all of this are unambiguous.
First, oversubscription. Traditional data centers ran comfortably at 3:1 or even 5:1 oversubscription ratios, which was fine because not everyone was talking at the same time. AI workloads eliminate that assumption entirely. Every GPU is talking to every other GPU, continuously, in lockstep. The requirement is 1:1, no oversubscription.
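The ratio itself is just downlink capacity over uplink capacity at the leaf. The port counts below are illustrative, not a reference design:

```python
# Oversubscription ratio of a leaf switch: server-facing (downlink)
# capacity divided by fabric-facing (uplink) capacity.

def oversubscription(down_ports: int, down_gbps: int,
                     up_ports: int, up_gbps: int) -> float:
    return (down_ports * down_gbps) / (up_ports * up_gbps)

# Cloud-era leaf: 48 x 25G down, 4 x 100G up -> 3:1
print(oversubscription(48, 25, 4, 100))    # 3.0
# AI fabric leaf: 32 x 400G down, 32 x 400G up -> 1:1
print(oversubscription(32, 400, 32, 400))  # 1.0
```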
Second, bandwidth. The scale of AI elephant flows — where you are streaming hundreds of GBs per second across the fabric for hours at a time — is pushing the industry through successive bandwidth generations. 400G is becoming the baseline, 800G is arriving, and 1.6T is on the horizon.
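To see why each generation matters, consider how long a single per-GPU all-reduce transfer takes at each link speed, assuming (optimistically) that the NIC link is the only bottleneck and ignoring protocol overheads. The 40 GB payload is an illustrative figure:

```python
# How long a 40 GB per-GPU transfer takes at each Ethernet generation,
# assuming the NIC link is the bottleneck and ignoring overheads.

def transfer_seconds(payload_gb: float, link_gbps: int) -> float:
    return payload_gb * 8 / link_gbps  # GB -> Gb, then divide by Gb/s

for gbps in (400, 800, 1600):
    print(f"{gbps}G: {transfer_seconds(40, gbps):.2f} s")
```

When that transfer happens at every training step, halving it by moving from 400G to 800G translates directly into shorter job completion times.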
Third, as we discussed above, the fabric must be lossless to avoid direct economic impact. This is why RoCE (RDMA over Converged Ethernet) is becoming a minimum requirement rather than an advanced option. Low latency, low jitter, predictable and almost lossless: those are not nice-to-haves in an AI fabric, they are the foundation everything else is built on.
Now that we know the requirements, what does the architecture actually look like?
The fabric itself needs to be lossless end-to-end, which means advanced congestion control is not optional. The two key mechanisms here are ECN (Explicit Congestion Notification) and PFC (Priority Flow Control), both of which work together to keep queues shallow and flows moving without drops. The Ethernet vs. InfiniBand debate is ongoing, but regardless of which wins, the requirements are converging. The old term “DC network” is being replaced by “AI fabric” for good reason, it signals that the network is no longer neutral transport.
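The interplay between the two mechanisms can be caricatured in a few lines: ECN marks packets early so senders slow down before the queue fills, and PFC is the emergency brake that pauses the upstream link before anything is dropped. The thresholds below are made up; real switches configure them per queue in buffer cells:

```python
# Toy model of how ECN and PFC cooperate on a switch queue.
# Thresholds are illustrative, not vendor defaults.

ECN_MARK = 50    # above this depth, mark packets so senders slow down
PFC_PAUSE = 90   # above this depth, pause the upstream link to avoid drops

def enqueue(depth: int) -> tuple[str, int]:
    """Return (action, resulting queue depth) for a packet arriving
    at the given queue depth."""
    if depth >= PFC_PAUSE:
        return ("pfc-pause", depth)     # stop the upstream sender; no drop
    if depth >= ECN_MARK:
        return ("ecn-mark", depth + 1)  # accept the packet but mark the CE bit
    return ("forward", depth + 1)

print(enqueue(10))  # ('forward', 11)
print(enqueue(60))  # ('ecn-mark', 61)
print(enqueue(95))  # ('pfc-pause', 95)
```

The design goal is for ECN to do almost all the work, with PFC firing rarely; a fabric that leans on PFC constantly risks head-of-line blocking and pause storms.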
On topology, spine/leaf (CLOS) remains dominant, but variations are emerging. Rail-optimized topologies in particular are gaining traction because they align the physical fabric more closely with how collective communications actually flow between GPU groups.
The bigger architectural shift is in the center of gravity. Today, the CPU is the primary processor and the GPU is the accelerator. In the AI data center that relationship inverts, the GPU becomes the primary compute unit and the CPU becomes the coordinator. The network follows that shift, it gets optimized for accelerators, not CPUs, with tight coupling between NICs, switches, and the collective communications libraries like NCCL and RCCL that orchestrate GPU-to-GPU traffic.
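As a mental model for what libraries like NCCL and RCCL do across real GPUs and NICs, the ring all-reduce can be sketched in a single Python process. This is purely illustrative: each "GPU" is a list, the buffer is split into one-element chunks, and the two phases (reduce-scatter, then all-gather) circulate data around the ring:

```python
# Single-process sketch of the ring all-reduce. Each "GPU" holds a
# vector; after the collective, every GPU holds the element-wise sum.
# Buffer length must equal the number of GPUs (one-element chunks).

def ring_all_reduce(buffers: list[list[float]]) -> list[list[float]]:
    n = len(buffers)
    chunks = [list(b) for b in buffers]  # chunks[gpu][chunk_index]
    # Reduce-scatter: after n-1 steps, GPU g holds the full sum
    # of chunk (g+1) % n.
    for step in range(n - 1):
        snapshot = [list(c) for c in chunks]  # sends happen simultaneously
        for g in range(n):
            c = (g - step) % n                # chunk GPU g sends this step
            chunks[(g + 1) % n][c] += snapshot[g][c]
    # All-gather: circulate the fully reduced chunks around the ring
    # until every GPU has all of them.
    for step in range(n - 1):
        snapshot = [list(c) for c in chunks]
        for g in range(n):
            c = (g + 1 - step) % n            # chunk GPU g forwards this step
            chunks[(g + 1) % n][c] = snapshot[g][c]
    return chunks

# Four "GPUs", each holding a 4-element gradient; expect the element-wise sum.
result = ring_all_reduce([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
print(result[0])  # [10, 10, 10, 10] on every GPU
```

Notice that every step requires every GPU to send and receive at exactly the same time, which is precisely why a single slow link or dropped packet stalls the whole ring.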
The physical server itself is also being rethought. Instead of fixed servers where compute, memory, and storage are bundled together, the AI data center moves towards a disaggregated, composable architecture: separate pools of compute, GPUs, memory, and storage, all connected by high speed fabrics and assembled dynamically for each workload. This is only possible because the fabric is fast and lossless enough to make the distance between resources essentially invisible. The network, again, is what makes it work.
This is also where SmartNICs and DPUs (Data Processing Units) come in, offloading collectives, telemetry, security, and scheduling away from the CPU and closer to the fabric itself.
The architectural shifts described above have direct physical consequences, and the data center of the future will look very different from what we operate today.
The most visible change is cooling. Air cooling simply cannot handle 50–150kW per rack, which is where dense GPU clusters land. Direct-to-chip liquid cooling, immersion cooling tanks, and rear-door heat exchangers are becoming standard rather than exotic. The power demands are so significant that data center locations are shifting too, towards nuclear plants, hydro dams, and renewable farms.
On the networking side, copper does not scale forever, and the industry knows it. The network is becoming optical everywhere. Co-Packaged Optics (CPO) brings optical links closer to the GPUs themselves, reducing electrical hops, which means lower power consumption and lower latency at the same time. Power efficiency, in other words, is becoming as important as bandwidth. The two are now deeply linked.
So, what does this actually look like physically? Today it is rows of 1U/2U servers, Ethernet switches, and loud fans. The future is dense GPU blades, thick liquid pipes, optical-interconnect backplanes, and fewer but massively more powerful racks. Data centers may become energy plants that happen to compute.
Training will stay centralized, concentrated in massive data centers that can support the scale and power demands. Inference is a different story, it is already exploding outward, moving closer to users, devices, and data. That is what drives everything in the next section.
So far we have been talking about the data center. But AI is not staying confined to the data center. The edge is about to see an explosion of AI inference workloads, pushed out to factories, vehicles, retail, telecom towers, and smaller regional data centers everywhere. And the edge is not one thing, it is a spectrum, regional mini-data centers, telco central offices, campuses, factories, and eventually individual devices. Each point on that spectrum has different constraints, but the direction of travel is the same, intelligence moves closer to where it is needed.
At the edge, latency requirements get significantly tighter. The 5–50ms tolerance that was acceptable for many cloud workloads is no longer good enough for AR/VR, autonomous systems, and industrial control. The new requirements are deterministic latency, local traffic breakouts, and predictable jitter. Best-effort, once again, does not cut it.
Edge networks will need to become autonomous and self-optimizing in ways that centralized data center networks do not. Expect AI-driven traffic engineering, real-time congestion prediction, and dynamic model placement, where models themselves are moved to where they are needed, not just the traffic they generate. The network starts making decisions like “this workload should run here, not there.”
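A placement decision of that kind can be sketched very simply: pick the site that meets the latency budget and has enough free accelerator memory, preferring the lowest latency. The site data and field names below are invented for illustration:

```python
# Sketch of a placement decision an AI-driven edge controller might make.
# Site inventory, RTTs, and capacity figures are illustrative assumptions.

def place_model(sites: list[dict], latency_budget_ms: float, needed_gb: int):
    """Return the name of the best site, or None if nothing qualifies."""
    candidates = [s for s in sites
                  if s["rtt_ms"] <= latency_budget_ms
                  and s["free_gpu_gb"] >= needed_gb]
    return min(candidates, key=lambda s: s["rtt_ms"])["name"] if candidates else None

sites = [
    {"name": "core-dc",    "rtt_ms": 40.0, "free_gpu_gb": 640},
    {"name": "metro-edge", "rtt_ms": 8.0,  "free_gpu_gb": 48},
    {"name": "cell-site",  "rtt_ms": 2.0,  "free_gpu_gb": 8},
]
# A model needing 24 GB with a 10 ms budget lands at the metro edge:
# the cell site is closer but too small, the core DC is too far.
print(place_model(sites, latency_budget_ms=10.0, needed_gb=24))  # metro-edge
```

Real controllers would fold in far more signals (load, energy price, data locality), but the shape of the decision is the same.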
This is where things get genuinely interesting. The connection between data centers and edge nodes will be carrying a new kind of traffic: model updates, LoRA adapters, fine-tuned deltas, and telemetry feedback loops. Model movement becomes a first-class traffic type, alongside user traffic and storage replication. The network gets optimized for the model lifecycle, not just packets.
The WAN connecting all of this also needs to evolve. The AI-aware WAN of the future will understand inference deadlines, prioritize model shards, and route based on compute availability rather than just link cost. The mental model shifts from “shortest path” to “shortest path to available intelligence.”
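One way to make that mental model concrete is a small variant of Dijkstra where stepping toward a compute-starved site carries a penalty on top of the link cost, so traffic is steered toward sites that can actually serve the inference. The topology, GPU counts, and penalty value are all illustrative:

```python
import heapq

# "Shortest path to available intelligence": classic Dijkstra, but the
# cost of entering a node includes a penalty when that node has no free
# compute. Topology and penalty values are illustrative assumptions.

def compute_aware_path(graph, free_gpus, start, targets, penalty=10.0):
    """graph: {node: [(neighbor, link_cost), ...]}. Returns (cost, path)
    to the cheapest target that has free compute, or None."""
    heap = [(0.0, start, [start])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node in seen:
            continue
        seen.add(node)
        if node in targets and free_gpus.get(node, 0) > 0:
            return cost, path
        for nxt, link_cost in graph.get(node, []):
            crowd = penalty if free_gpus.get(nxt, 0) == 0 else 0.0
            heapq.heappush(heap, (cost + link_cost + crowd, nxt, path + [nxt]))
    return None

graph = {
    "user":   [("edge-a", 1.0), ("edge-b", 2.0)],
    "edge-a": [("core", 5.0)],
    "edge-b": [("core", 5.0)],
    "core":   [],
}
free_gpus = {"edge-a": 0, "edge-b": 4, "core": 32}
# edge-a is nearest but has no free GPUs, so the request lands on edge-b.
print(compute_aware_path(graph, free_gpus, "user", {"edge-a", "edge-b", "core"}))
```

With plain link-cost routing the request would go to edge-a; folding compute availability into the metric is what "route to intelligence" means in practice.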
Edge hardware converges too, with compact accelerators, NICs with embedded AI, and secure enclaves for model protection. Increasingly, a single edge box will blend compute, storage, networking, and security together.
It would be easy to look at all of the above and conclude that the industry just needs faster networking. That misses the point. What is fundamentally different about the AI era is not the speeds and feeds, it is the role of the network itself. There are three shifts worth naming.
The network is in the critical path. It is no longer just transport. In the AI data center, network performance directly impacts model performance, training time, and therefore cost. That is a completely different relationship between the network and the workload than anything the industry has designed for before.
Networking becomes adaptive and learned. Static configurations and best-effort delivery do not fulfill the requirements anymore. The network of the future will be closed-loop controlled, continuously self-optimizing based on what the workloads actually need in real time.
Compute and networking must be co-designed. You cannot design one without the other anymore. The tight coupling between GPUs, NICs, switches, and collective communications libraries means that the vendors who control both the compute and the network layers will move significantly faster than those who do not. This is already reshaping the competitive landscape.
Pull it all together and the picture becomes clear. The data center of the future is a GPU-scale, lossless, optical-first fabric built for collective communication. The edge is a latency-obsessed, AI-managed mesh where intelligence moves dynamically to where it is needed most. And the WAN connecting them understands inference deadlines, not just link costs.
Looking further out, the next 10–20 years will likely bring changes that make even today’s AI data centers look transitional. Photonic interconnects will replace electrical signaling deeper into the fabric. Memory-centric computing will shift how workloads are structured at a fundamental level. On-chip optical networking, AI-designed data center layouts, and autonomous power optimization are all on the horizon, moving at different speeds but all pointing in the same direction.
The industry has been here before, standing at the edge of a shift that seems large until the next one makes it look modest. The difference this time is the pace. The piano is not getting any lighter.