Volume 3: Achieving Any-to-Any Connectivity with Time-tested Design Approaches and Proven Methodologies described 3-Stage and 5-Stage Clos architectures and how they support any-to-any connectivity. This document describes how the arrangement of the elements in that architecture relates to reliability, using the availability calculation method that would typically be applied and showing how various equipment configurations may be compared.
As mentioned previously, one of the top priorities of the Office of Management and Budget (OMB) is to maintain high availability of services, which requires a reliable network infrastructure as its foundation. It may seem obvious, but reliability in the data center is of greater importance today for a few reasons:
• The amount of data traffic being processed in the data center (i.e., server traffic) is doubling every 12 to 15 months.
• The traffic flowing to and from the data center compute, storage, security, networking, and application resources in federal architectures constitutes nearly all mission-critical network traffic.
The importance of reliability and availability becomes more critical with each passing monthly traffic report. Another reason the hardware being deployed must be assembled into a resilient design is that the protocols used to deploy the data center rely on policy rather than metrics, and on peering rather than link-state protocols, to preserve multiport connections. Therefore, the age-old consideration of 1RU/2RU form factors versus chassis-based equipment presents itself.
One of the benefits of an IP Fabric deployed in a Clos-based architecture is its efficiency in establishing any-to-any connections. Applied as a network connection system, rather than to the switching element within a product design (the original context of Clos's 1953 evaluation), the architecture provides higher network system availability; this is a direct result of the overlay and underlay working together to keep services running for the subscriber community of the various applications hosted in the Data Center PoD (Point of Delivery). By using a single form-factor infrastructure, deployed in a framework that reflects a crossbar switching platform (like modern chassis-based packet switching and routing systems), the IP Fabric underlay provides the basis for a system availability that, depending on the model, exceeds five nines (>5 nines).
One standard that assists planners in selecting high-availability elements for their design is the manufacturer-calculated MTBF (mean time between failures). The MTBF of the individual devices is used to determine the element availability and, from that, the network availability. A risk assumption of an 8-hour MTTR (mean time to restoral) was added. To determine system availability, we took the element availability of each device and applied the parallel availability method, using a model of four leaves (A) connected through a four-spine tier to four leaves (B) as the baseline. The calculation is based on the element MTBF and the organization's MTTR targets, and it arrives at the availability (A).
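As a minimal worked sketch of that element-level step, the snippet below derives availability from MTBF and the 8-hour MTTR; the MTBF value is an assumed placeholder, not a published figure for any particular SLX model.

```python
# Element availability derived from MTBF and MTTR, both in hours.
# The MTBF here is an assumed placeholder, not a vendor-published value.
MTBF_HOURS = 450_000          # assumed mean time between failures
MTTR_HOURS = 8                # mean time to restoral used throughout this document

element_availability = MTBF_HOURS / (MTBF_HOURS + MTTR_HOURS)
element_unavailability = 1 - element_availability

print(f"Element availability:   {element_availability:.7%}")
print(f"Element unavailability: {element_unavailability:.3e}")
```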
While Clos architecture principles were originally applied to the transistor and switching chipset technology of the time, the model works well for a system architecture if the individual element MTBF/availability figures are known. Expressed below is the availability calculation for a typical leaf switch element of a single form factor 1RU (rack unit) system, using the parallel availability calculation.
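A minimal sketch of that parallel calculation follows; the per-leaf availability is an assumed value, chosen here so that the arithmetic reproduces the leaf-tier figure quoted next, and it stands in for the element availability a planner would derive from the published MTBF.

```python
# Parallel availability of the leaf tier: the access tier is lost only if
# every leaf serving the server is down at the same time, so the individual
# unavailabilities are multiplied together.
per_leaf_availability = 0.99965        # assumed per-leaf availability (placeholder)
per_leaf_unavailability = 1 - per_leaf_availability

leaf_count = 4                         # L1 through L4
tier_unavailability = per_leaf_unavailability ** leaf_count
tier_availability = 1 - tier_unavailability

print(f"Leaf tier availability:   {tier_availability:.13%}")
print(f"Leaf tier unavailability: {tier_unavailability:.3e}")
```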
Thus, the Leaf (access to the fabric) availability to the server, that is, the network availability, is 99.9999999999985%, or 13 nines. The Spine calculation is performed in the same manner as the Leaf calculation. As shown below, the availability of the Leaf-Spine-Leaf fabric architecture uses a serial calculation in which the unavailability of L1-L4 (Leaf 1 through 4), S1-S4 (Spine elements 1 through 4), and L5-L8 (the third stage of the traffic flow in a server-to-server model, Leaf 5 through 8) are added to arrive at the system availability from server to server for East-West traffic. In a typical data center, East-West traffic accounts for up to 80% of the traffic across the fabric.
The resulting availability between the elements of the 3-stage Clos-based underlay of the IP fabric F1 (Fabric 1), or PoD, is 99.9999999999984%, or 13 nines. This amounts to an expected unavailability of 5.0767 x 10^-7 seconds per year, roughly 508 nanoseconds, or about half of one millionth of a second per year.
A three-stage Clos configuration yields very high availability as a direct result of the architecture used. In addition, the 3-stage Clos retains its any-to-any capability while delivering ultra-high availability (>5 nines, in this case 13 nines). Systems with lower MTBF values may yield lower system availability overall, but the question to ask may be: does this design create an unacceptable level of unavailability?
A chassis-based infrastructure yields similar results when we view network availability from a set of servers to a fully redundant SLX 9850 chassis-based system configured with the appropriate level of N+1 redundant componentry. When calculating the individual common equipment within the system, we find that elemental blocks such as the switch fabric modules (S1-S6), power supplies (P1-P4), and fan assemblies (F1-F6) contribute what is mathematically 100% availability to the system. Because of such high levels of redundancy, the unavailability figure for the chassis common equipment can be lower than 1/googol of a second (128 decimal places). This is achieved through extensive hardware redundancy, with subcomponents working in N:1 or N+1 redundant configurations. For example, each switching fabric module measures 8 nines of availability; to arrive at the load-sharing availability of all installed switch fabrics (Sa) supporting parallel paths from the line cards, we multiply the unavailability of switch modules S1 through S6. When implemented in a PoD, the single form factor and the chassis-based solution therefore yield similar results.
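A brief sketch of that load-sharing switch-fabric calculation follows, using the 8-nines per-module figure from the text; treating all six modules as a fully parallel group is a simplifying assumption.

```python
# Load-shared switch-fabric stage inside the chassis: the stage is lost only
# if all six fabric modules (S1-S6) fail at once, so their individual
# unavailabilities are multiplied together.
module_availability = 0.99999999          # "8 nines" per switch-fabric module
module_unavailability = 1 - module_availability

module_count = 6                          # S1 through S6
stage_unavailability = module_unavailability ** module_count
stage_availability = 1 - stage_unavailability

print(f"Switch-fabric stage unavailability: {stage_unavailability:.3e}")
print(f"Switch-fabric stage availability:   {stage_availability}")  # rounds to 1.0 at double precision
```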
The general conclusion can be drawn that the chassis-based infrastructure buys an additional half of a nanosecond of availability when compared to the single form factor implementation. The exercise also demonstrates that the cost of embedding more than 3 access links (LAG) or Leaf-to-Spine connections buys connectivity assurance, but does not yield significantly higher availability. In this model, the three stages run from the server to the line card (L1-L4, stage 1), across parallel connections to switch fabrics 1 through 6 (S1-S6, stage 2), and back through the egress line card to the server port (stage 3).
To conclude, when costs and ease of expansion are factored into the decision, a single form factor implementation may be less expensive to acquire, deploy, and operate, and may better facilitate a 'pay-as-you-go' cost model. While in the past the importance of the traffic, the amount of growth, and the proven higher availability of chassis systems dictated their usage, the overall system configuration (3/5-stage Clos-based PoDs with 1RU/2RU elements) negates the added benefit versus cost consideration typically incurred by selecting the chassis solution.
There is, however, one inescapable benefit of using a chassis-based system as an element within an IP fabric data center: it enables higher scale at the access layer while reducing the number of Spines. This should be weighed against the impact of a failure of one of the Spine switches versus a single form factor Spine switch (risk, mean time to recovery). In a configuration with a 1RU single form factor spine consisting of 8 units with 8 x 100G links from Leaf to Spine, if one unit fails, only 12.5% of the overall bandwidth is lost. When a chassis-based system fails as a Leaf, 100% of the access bandwidth is gone; if it is deployed as a Spine (with 8 x 100G links from the Leaf layer to 2 units of 4-slot SLX 9850s), 50% of the bandwidth is lost.
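The bandwidth exposure described above reduces to simple arithmetic, sketched here with the spine counts used in the example.

```python
# Fraction of leaf-to-spine bandwidth lost when a single spine element fails.
single_form_factor_spines = 8   # eight 1RU spine units, uplinks spread evenly
chassis_spines = 2              # two 4-slot SLX 9850 chassis acting as spines

print(f"1RU spine unit failure: {1 / single_form_factor_spines:.1%} of fabric bandwidth lost")
print(f"Chassis spine failure:  {1 / chassis_spines:.1%} of fabric bandwidth lost")
```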
Therefore, the general question to ultimately ask the technical director or program manager is:
• Does the Data Center planner want to risk 50% of the Data Center PoD bandwidth to a system that has less than half a nanosecond of unavailability risk per year?
• Or does the Data Center planner want to risk 12.5% of the Data Center PoD bandwidth to a system with slightly more than half a nanosecond of calculated unavailability per year?
Where in previous decades the risk involved in choosing or not choosing a chassis system may have appeared clear, today's choice involves splitting mere nanoseconds.
The resulting network availability calculations for each model of SLX switch in the Federal portfolio are provided in the chart below. The chart provides the availability and unavailability for 1-, 2-, and 4-port access port-channel deployments, and indicates the number of 'nines' that each availability level achieves with an 8-hour mean time to restoral (MTTR). The availability is calculated and expressed as a percentage. As you will note from the chart, the unavailability is expressed in seconds (1 port, 1 unit), milliseconds (2 ports, 2 units), microseconds (not shown; 3 ports across 3 separate units), and nanoseconds (the unit of measurement needed for 4 ports across 4 access/leaf switches). Note that the availability calculation for 4 ports connected to 4 separate leaf switches rounds to 100% (less than 1 nanosecond of unavailability, down to tens or even hundredths of a picosecond).
The chart shows the calculated availability based upon the MTBF (mean time between failures) and the MTTR (mean time to restoral). For example, a single Leaf access port for a server connected to an SLX-9150 would be provided 4 nines, or 99.99823%, availability; however, with a 4-port access configuration in which the access links connect to 4 separate Leaf switches, over 9 nines of availability are calculated. This represents the access tier.
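A sketch of how the chart's per-port figures scale is shown below, starting from the single-port SLX-9150 availability quoted above; exact chart values depend on each model's published MTBF and the 8-hour MTTR assumption.

```python
# Parallel availability for access links spread across 1, 2, and 4 separate
# leaf switches, starting from the single-port figure quoted above.
SECONDS_PER_YEAR = 365 * 24 * 3600

single_port_availability = 0.9999823     # single SLX-9150 access port (from the text)
single_port_unavailability = 1 - single_port_availability

for ports in (1, 2, 4):
    port_group_unavailability = single_port_unavailability ** ports
    downtime_seconds = port_group_unavailability * SECONDS_PER_YEAR
    print(f"{ports} port(s): availability {1 - port_group_unavailability:.15%}, "
          f"~{downtime_seconds:.3e} seconds of downtime per year")
```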
When the same SLX switch models are utilized for the Spine, with multiple connections, the same calculation is performed for that tier of the PoD, and the Spine availability is then known. After these calculations are done, a series calculation is performed to provide the availability of any-to-any connections within the data center.
Leaf-to-Spine series unavailability = unavailability of the 2-port access tier (L1/L2 parallel unavailability) + unavailability of the 2 Spine connections (S1/S2 parallel unavailability).
To arrive at the L1/L2 unavailability and the S1/S2 unavailability for a 2-port access/Leaf connection traversing a 2-Spine configuration in a 3-stage Clos architecture, the unavailability of each tier in the model is added, and the total is then subtracted from 100% to give the fabric availability.
Thus, the access tier (2 switches, L1 and L2) unavailability of an SLX-9150 is 0.000000000314724749117937 (per year), and the Spine tier (if also constructed with SLX 9150s, S1 and S2) would be 0.000000000314724749117937 (per year). Expressed as a formula: (Leaf layer unavailability) + (Spine layer unavailability) = Fabric unavailability (per year).
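A short sketch of that formula using the tier figures above (the simple addition holds because the unavailabilities are tiny):

```python
# Series combination of the 2-port access tier and 2-spine tier unavailability
# figures quoted above for an SLX-9150 based 3-stage Clos fabric.
leaf_tier_unavailability  = 0.000000000314724749117937   # L1/L2 in parallel
spine_tier_unavailability = 0.000000000314724749117937   # S1/S2 in parallel

fabric_unavailability = leaf_tier_unavailability + spine_tier_unavailability
fabric_availability = 1 - fabric_unavailability

print(f"Fabric unavailability (per year): {fabric_unavailability:.6e}")
print(f"Fabric availability:              {fabric_availability:.12%}")
```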
In this document, industry-standard definitions for element availability were used for comparison purposes. A uniform, standard means of defining network-level availability by tier in the data center was identified. The use of single form factor switching elements was compared with chassis-based systems. The various elements provided a means of comparison when deployed in series or parallel, along with the resulting industry-standard availability measurement when deployed in a Point of Delivery (PoD), or data center fabric system. With the total availability levels of the fabric reviewed, the underlay can now be programmed onto the hardware. From this point, the element software maintains reachability for the rock-solid foundation built from these elements, which are configured to deliver the any-to-any connectivity discussed in the previous document.
Continue to Volume 4: The Data Center IP Fabric Control Plane. That document discusses the protocols used to handle the IP control plane of the data center fabric. It also discusses the ability to provide deployment simplicity and uniformity with tools that create an automated underlay.