A while ago I provided some insight into DCI (data center interconnect) and its requirements in one of my blog posts. Within the network fabric specifically, a DCI solution needs to address the following challenges:
- VM mobility introduces challenges in optimizing east/west and north/south traffic at L2 and L3
- Larger Layer 2 domains introduce topology challenges (e.g. how to handle more than 2 DCs, degree of meshing with virtual switching), availability and resiliency challenges (unknown unicast and broadcast flooding) and scale challenges (number of switches)
– Therefore broadcast, multicast and unknown unicast flooding must be controlled throughout the whole fabric
- Capacity constraints while workloads and data are migrated between and within data centers
– Application traffic visibility and control (Quality of Service) are required inside and between data centers
At Enterasys we focus on simplicity; it is a key architectural design goal of our OneFabric architecture.
Let's take east/west traffic optimization as an example: we invented Fabric Routing to address it. Fabric Routing is a VRRP enhancement that is fully interoperable with any standard VRRP router, but it additionally provides optimized east/west traffic routing.
With traditional VRRP there is a single active Layer 3 default gateway per subnet, so all routed traffic has to pass through it. Such designs are inefficient for load distribution and also increase latency in the network.
Fabric Routing is a mechanism that provides distributed routing, integrated into SPB- and RSTP/MSTP/LACP-based switch/routers, to address the need for maximum throughput, lowest latency and optimized traffic flows inside the data center fabric. In the context of both the LAN and the WAN, north-south traffic is the client/server traffic between users (for example in a branch office) and the data center that hosts the applications they access. In the context of the data center, east-west traffic is the traffic between servers within a given data center fabric; it makes up almost 80% of the traffic in the DC. Fabric Routing is a unique innovation by Enterasys, built upon and interoperable with VRRP, so administrators can leverage their existing knowledge for the implementation. It can also be applied in the campus LAN, where it provides similar value; this is a typical value proposition of the OneFabric architecture from Enterasys.
Figure 1 – Traffic flows without Fabric Routing
In the figure above, the traffic between the servers in VLAN1 and VLAN2 is routed by the VRRP master of each VLAN/subnet, which sits at the edge of the fabric. This is a typical deployment: if the servers are virtualized and move throughout the fabric, there is no way to determine the optimum path or location for the routers, so they are attached somewhere at the edge. In this specific example, the result is a 3x increase in latency (6 vs. 2 hops), unnecessary bandwidth consumption on 5 additional links in the fabric, and an aggregated routing performance between the 2 VLANs that is limited to the capacity of a single link.
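To make the hop arithmetic concrete, here is a minimal Python sketch that counts links on a small fabric with a breadth-first search. The topology (`srvA`, `srvB`, `leaf1`, `spine`, `edge`) is a hypothetical one chosen so that it reproduces the 6 vs. 2 hops quoted above; it is not a reconstruction of Figure 1. Both servers attach to `leaf1`, and the VRRP master is assumed to sit on the `edge` switch.

```python
from collections import deque

# Hypothetical fabric: both servers hang off leaf1, the VRRP master is on "edge".
LINKS = [
    ("srvA", "leaf1"), ("srvB", "leaf1"),
    ("leaf1", "spine"), ("edge", "spine"),
]

def build_graph(links):
    """Undirected adjacency sets from a list of links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hops(graph, src, dst):
    """Shortest path length in links, found with breadth-first search."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{dst} is unreachable from {src}")

def routed_hops(graph, src, dst, gateway):
    """Routed traffic must traverse the switch that performs the routing."""
    return hops(graph, src, gateway) + hops(graph, gateway, dst)

g = build_graph(LINKS)
classic = routed_hops(g, "srvA", "srvB", gateway="edge")  # hairpin via edge VRRP master
fabric = routed_hops(g, "srvA", "srvB", gateway="leaf1")  # nearest-hop switch routes
print(classic, fabric, classic / fabric)  # 6 2 3.0
```

The hairpin path is srvA, leaf1, spine, edge and back the same way, while the nearest-hop path never leaves `leaf1`.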
The benefits of introducing Fabric Routing by Enterasys are:
- Optimized traffic flows
– No need for “hairpin mode” routers at the fabric edge
– Routed east-west traffic always takes the shortest path regardless of VRRP master/backup router states
- Minimized latency for routed traffic
– The nearest fabric-routing-enabled switch/router always routes the traffic, so the number of hops for routed traffic equals the number of hops for switched traffic in the fabric
- Maximized aggregated routing throughput
– As every router becomes active, the aggregated routing performance in the fabric equals the switching performance
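The throughput benefit above is essentially arithmetic, sketched here in Python under simplifying assumptions: each leaf's routed traffic stays local to that leaf (both VLANs attached there), and the per-leaf demands and capacities are made-up numbers. The function name and parameters are hypothetical.

```python
def routed_throughput_gbps(demands_gbps, gateway_link_gbps, leaf_capacity_gbps, fabric_routing):
    """Aggregate inter-VLAN throughput for a set of leaf switches (toy model).

    demands_gbps: routed traffic offered by the servers behind each leaf.
    """
    if fabric_routing:
        # Every leaf routes its own traffic, bounded only by its own capacity.
        return sum(min(d, leaf_capacity_gbps) for d in demands_gbps)
    # Classic VRRP: all routed traffic hairpins through the master's single link.
    return min(sum(demands_gbps), gateway_link_gbps)

demands = [8, 8, 8, 8]  # hypothetical Gbit/s of routed traffic per leaf
print(routed_throughput_gbps(demands, 10, 40, fabric_routing=False))  # 10
print(routed_throughput_gbps(demands, 10, 40, fabric_routing=True))   # 32
```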
Figure 2 – Traffic flows with Fabric Routing
In a Fabric Routing-enabled domain, traffic is routed by the first switch/router it reaches, directly toward the destination, regardless of whether that switch is VRRP master or backup. This creates distributed, nearest-hop routing within the fabric that optimizes throughput, latency and traffic flows (minimizing traffic load across the fabric).
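The per-switch decision described above can be sketched in Python. This is a simplified model, not Enterasys' implementation: it assumes a switch recognizes traffic to be routed by the standard IPv4 VRRP virtual MAC address (00-00-5E-00-01-{VRID}, defined in RFC 5798), that a Fabric Routing switch routes such frames locally in any VRRP state, and that a classic VRRP backup leaves them to be bridged toward the master. The `Switch` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field

def vrrp_virtual_mac(vrid: int) -> str:
    """IPv4 VRRP virtual router MAC (RFC 5798): 00-00-5E-00-01-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return f"00:00:5e:00:01:{vrid:02x}"

@dataclass
class Switch:
    name: str
    fabric_routing: bool
    vrids: set = field(default_factory=set)       # virtual routers configured here
    master_for: set = field(default_factory=set)  # VRIDs this switch is VRRP master for

    def routes_locally(self, dst_mac: str) -> bool:
        """True if this switch routes the frame itself instead of bridging it on."""
        for vrid in self.vrids:
            if dst_mac.lower() == vrrp_virtual_mac(vrid):
                # Classic VRRP: only the master routes. Fabric Routing: any state routes.
                return self.fabric_routing or vrid in self.master_for
        return False  # not addressed to a local gateway: plain L2 switching

backup_classic = Switch("leaf1", fabric_routing=False, vrids={1, 2})
backup_fabric = Switch("leaf1", fabric_routing=True, vrids={1, 2})
frame_dst = vrrp_virtual_mac(1)  # a VLAN1 server sends to its default gateway
print(backup_classic.routes_locally(frame_dst), backup_fabric.routes_locally(frame_dst))  # False True
```

Because the virtual MAC and IP stay the same, servers keep using their normal default gateway configuration; only the switches' forwarding decision changes.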
The latest addition to Fabric Routing, IP mobility, also addresses the challenge of optimizing north/south ingress traffic to the data center. Fabric Routing with IP mobility uses host routing techniques: host routes are dynamically injected and distributed from the fabric-routing-enabled data center switch that a VM is most closely connected to, and removed from the previously closest fabric routing switch. The IETF discusses these concepts in various RFCs and Internet-Drafts, for example “Virtual Subnet: A Host Route based Subnet Extension Solution” (draft-xu-virtual-subnet-07).
In general, workload mobility in a DCI scenario, or even within a data center, can result in non-optimized client/server (north/south) traffic patterns. (Workload mobility is only desirable once you solve all the associated problems, such as storage, as pointed out in my previous blog post.) When a VM moves, the external route to the subnet the VM belongs to does not change, which results in non-optimal inbound (ingress) traffic to the data center. The solution now provides efficient inbound routing to the data center regardless of where the VM is currently located: the route to the VM is always distributed as a host route from the closest fabric-routing-enabled switch. In the first release a Layer 2 interconnect between the data centers is still necessary; subsequently we plan to enhance the solution so that this is no longer required and host route injection no longer affects the topology.
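Why a host route fixes ingress traffic comes down to longest-prefix matching: an external router prefers a /32 host route over the /24 subnet route that never changes. The Python sketch below models one external router's table with the standard `ipaddress` module; the prefixes, switch names and the `RoutingTable`/`vm_attached` helpers are hypothetical, and a real deployment would carry these routes in a routing protocol rather than a dictionary.

```python
import ipaddress

class RoutingTable:
    """Toy routing table of an external router: prefix -> advertising switch."""

    def __init__(self):
        self.routes = {}

    def advertise(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, addr):
        """Longest-prefix match, as an IP router would do."""
        ip = ipaddress.ip_address(addr)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]

def vm_attached(table, vm_ip, closest_switch):
    """The host route follows the VM to its new closest switch."""
    host = f"{vm_ip}/32"
    table.withdraw(host)                   # previous closest switch removes its /32
    table.advertise(host, closest_switch)  # new closest switch injects the /32

rib = RoutingTable()
rib.advertise("10.1.1.0/24", "dc1-border")  # the subnet route itself never moves
vm_attached(rib, "10.1.1.10", "dc1-leaf1")  # VM first appears in DC1
vm_attached(rib, "10.1.1.10", "dc2-leaf3")  # VM migrates to DC2
print(rib.lookup("10.1.1.10"), rib.lookup("10.1.1.20"))  # dc2-leaf3 dc1-border
```

Other hosts in the subnet still follow the /24, so only moved VMs add state to the routing table.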
Overall, our simplified DCI solution includes the following components:
- DCI at Layer 2, leveraging existing standards for transport
– Use GRE/L2 as a pure transport (supported across any service provider offering, as it is just an IP tunnel)
– Use SPB (IEEE 802.1aq Shortest Path Bridging) or LACP (IEEE 802.3ad Link Aggregation) via VSB (Virtual Switch Bonding) through the GRE/L2 tunnel to eliminate the learning and topology constraints that GRE/L2 tunneling alone would introduce
- Optimized traffic routing, built on enhancements to standard VRRP
– Use fabric routing for east/west traffic optimization
– Use fabric routing with IP mobility for north/south traffic optimization, and in the future for a pure Layer 3 DCI
- Flood Control Policies, part of our policy framework
– Separate policies per port to control bcast, mcast, unknown unicast flooding at each layer of the fabric – inside the fabric and between data centers
- Policies for application QoS, part of our policy framework
– Deployment of dynamic bandwidth and QoS policies to each application and system in the fabric
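For the GRE/L2 transport, the Ethernet frame travels unchanged as the GRE payload. A minimal Python sketch of the GRE part, assuming the base 4-byte header from RFC 2784 (no checksum, key or sequence number) and protocol type 0x6558 (Transparent Ethernet Bridging); whether the product uses exactly this header is an assumption, the outer IP header is left to the tunnel endpoint's IP stack and not modeled, and the function names are hypothetical.

```python
import struct

GRE_PROTO_TEB = 0x6558  # Transparent Ethernet Bridging: payload is an Ethernet frame

def gre_encapsulate(ethernet_frame: bytes) -> bytes:
    """Prepend a base GRE header: C=0, reserved bits 0, version 0, then protocol type."""
    header = struct.pack("!HH", 0x0000, GRE_PROTO_TEB)
    return header + ethernet_frame

def gre_decapsulate(packet: bytes) -> bytes:
    """Strip the base GRE header and return the inner Ethernet frame."""
    flags_version, proto = struct.unpack("!HH", packet[:4])
    if flags_version != 0 or proto != GRE_PROTO_TEB:
        raise ValueError("unsupported GRE header for this sketch")
    return packet[4:]

# Inner frame: broadcast dst MAC, src MAC, EtherType 0x0806 (ARP), dummy payload.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0806") + b"payload"
packet = gre_encapsulate(frame)
assert gre_decapsulate(packet) == frame
```

Since the tunnel itself just copies frames, including broadcasts like the one above, loop prevention and learning have to come from SPB or VSB running across it.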
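Per-port flood control can be illustrated with a token bucket per traffic class. This Python sketch is an assumption-laden model, not the Enterasys policy framework: frames are classified by destination MAC (broadcast, multicast via the group bit, unknown unicast when the MAC table has no entry), and each configured class is limited to a rate in frames per second with a burst allowance. `PortFloodPolicy`, `TokenBucket` and the limits are hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def classify(dst_mac: str, mac_table: set) -> str:
    """Map a destination MAC to a flooding class."""
    mac = dst_mac.lower()
    if mac == BROADCAST:
        return "bcast"
    if int(mac.split(":")[0], 16) & 0x01:  # group bit set: multicast
        return "mcast"
    if mac not in mac_table:
        return "unknown_unicast"           # would be flooded
    return "known_unicast"                 # forwarded normally, never limited here

class TokenBucket:
    """Allows `rate_pps` frames/s on average, with bursts up to `burst` frames."""

    def __init__(self, rate_pps: float, burst: float):
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class PortFloodPolicy:
    """One independent bucket per flooding class on a single port."""

    def __init__(self, limits):
        # limits: {"bcast": (rate_pps, burst), "mcast": ..., "unknown_unicast": ...}
        self.buckets = {cls: TokenBucket(r, b) for cls, (r, b) in limits.items()}

    def admit(self, dst_mac: str, mac_table: set, now: float) -> bool:
        bucket = self.buckets.get(classify(dst_mac, mac_table))
        return True if bucket is None else bucket.allow(now)

port = PortFloodPolicy({"bcast": (10, 2), "unknown_unicast": (100, 10)})
table = {"00:11:22:33:44:55"}
print([port.admit(BROADCAST, table, now=0.0) for _ in range(3)])  # [True, True, False]
```

Applying such a policy on every port, at every layer and on the DCI links, is what keeps one misbehaving segment from flooding the whole fabric.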