Compute Edge to Cloud for Smart Infrastructure

Ed Koehler, Distinguished Principal Engineer. Published 7 Sep 2021

This is the third in a series of blogs dedicated to discussing the future of smart communities and infrastructure in our modern society.

While smart communities and infrastructure require computing, there is often little consideration as to how to provide it. In my first blog in this series, I spoke to the layered concept of planes or spaces. First is the physical plane, the space where physical entities exist. The physical plane consists of the infrastructure itself as well as the sensors and actuators that monitor and control it. Overlaid onto the physical plane is the cyberspace plane. Cyberspace consists of the networks and computing resources that send and receive information to and from the physical space. Then we have the compute plane itself, which I refer to as the algorithmic space, where the actual computation occurs. As you will recall, however, neither the first nor the second blog in this series covered where these computational resources reside in physical space. This blog tackles that subject as it applies to smart infrastructure.

One of the first obvious considerations is the sheer scale, both in density and geography. Many smart infrastructure systems can cover hundreds of miles. Additionally, these systems, such as power grids or water distribution systems, are often critical. They will have very strict latency requirements for command and control of the environment. So, it follows that we should have a quick discussion of two concepts: bandwidth, often referred to as network speed, and latency, which is closely tied to signal propagation delay.

What does speed mean?

We hear it all the time. “What is the network speed?” and the response will typically be “We are running at 10 Gig” or something like that. This is a misnomer. Bandwidth is not speed; it is capacity. The technical definition of bandwidth is “a measure of how much data can be transferred from one point of the network to another within a specific amount of time,” with the time typically measured in seconds. Hence, we have the typical notation of XX/s, such as 1G/s, 10G/s, 25G/s. In other words, bandwidth, the amount of data transferred over a specific amount of time, is a measure of capacity and not a measure of speed. Folks in the know will shrug, but this fact is an eye-opener for many people. The misnomer does have a rationale, however, which I will cover later.

Speed is about how fast something travels, measured as the distance traveled per unit of time. A car obviously travels much faster than a bicycle, and a bicycle faster than a person walking. This is common sense. It is the nature of our physical universe. Data is no different, other than the fact that it obviously travels much faster than vehicles or humans. The time a signal takes to cover a distance is, as I pointed out earlier, the signal propagation delay. There is a universal speed limit within our physical world, and that limit is the speed of light in a vacuum, 299,792,458 meters per second (m/s). As defined in physics, m/s is the proper notation for speed. The notation for the speed of light in a vacuum is simply the letter “c.” You can find “c” in many formulas, including Albert Einstein’s equation, E = mc².

Nothing in networking travels this fast. The speed limit within networking depends on the medium that carries the signal. As an example, check out the following table of various media used in modern networking technology.

 

Medium Type                Signal Speed (% of c)
Thick Coaxial Cable        77%
Thin Coaxial Cable         65%
Unshielded Twisted Pair    59%
Optical Fiber              67%

 

Note that these averages will vary with cable or fiber type, but to a large degree they can be taken at face value for calculation purposes. Just like a car or an airplane, a signal takes time to get from point A to point B. However, we also need to consider network delay, or latency, defined as the time it takes to deliver data from a device to its final destination. Networking switches interconnect medium links to complete the end-to-end data path. Switches introduce latency, although this delay is negligible in modern switching, typically single-digit microseconds. Latency for wireless technologies such as Wi-Fi 6E and 5G is a bit too technical to detail in this blog, but it is typically on the order of milliseconds.
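To make the table concrete, here is a minimal Python sketch that estimates one-way propagation delay for each medium. It assumes the percentages in the table can be used directly as velocity factors. The 100 km link length and the function name propagation_delay_s are illustrative choices of mine, not figures from any real deployment.

```python
# Speed of light in a vacuum, in meters per second.
C = 299_792_458

# Velocity factors taken from the table above (fraction of c).
VELOCITY_FACTOR = {
    "Thick Coaxial Cable": 0.77,
    "Thin Coaxial Cable": 0.65,
    "Unshielded Twisted Pair": 0.59,
    "Optical Fiber": 0.67,
}


def propagation_delay_s(distance_m: float, medium: str) -> float:
    """One-way signal propagation delay, in seconds, over the given medium."""
    return distance_m / (VELOCITY_FACTOR[medium] * C)


# Hypothetical 100 km link; real cable runs are rarely a straight line.
DISTANCE_M = 100_000
for medium in VELOCITY_FACTOR:
    delay_us = propagation_delay_s(DISTANCE_M, medium) * 1e6
    print(f"{medium}: {delay_us:.0f} microseconds one way")
```

The delay scales linearly with distance, so doubling the length of the link doubles the propagation delay no matter how much bandwidth the link offers.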

As always, an analogy makes things much easier to understand. We can use the shipment of packages as an example. Let’s say that we are shipping by air. We have two aircraft traveling at the same speed; however, one is 25 meters in length and the other is 100 meters long. There is a maximum air speed limit of 600 miles per hour (mph), an excellent average airspeed for jet aircraft (by comparison, the speed of sound is 767 mph). Obviously, traveling from point A to point B will take the same amount of time for both aircraft, but the larger aircraft will carry four times the package capacity.

Applied to data, the larger link delivers four times the data in the same amount of time, which is where our use of the term ‘speed’ for a data link comes from. Computation is dependent on data. Therefore, larger bandwidths provide for faster computing. But please understand, importantly, that bandwidth does not move the data signal any faster; the same speed limit applies to a data link whether it runs at 25G/s or 100G/s, just as it applies to both aircraft.
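The aircraft analogy can be sketched in a few lines of Python. The example separates the time for a signal to cross the link (propagation delay) from the time needed to put the whole payload onto the link (often called serialization time). The fiber velocity factor comes from the table earlier in this blog; the link length, the payload size, and the function names are assumptions chosen for illustration only.

```python
C = 299_792_458                  # speed of light in a vacuum, m/s
FIBER_SPEED_M_PER_S = 0.67 * C   # assumed velocity factor for optical fiber


def propagation_delay_s(distance_m: float) -> float:
    """Time for a signal to cover the distance; independent of bandwidth."""
    return distance_m / FIBER_SPEED_M_PER_S


def serialization_time_s(payload_bits: float, bandwidth_bps: float) -> float:
    """Time to place the whole payload onto the link; depends on bandwidth."""
    return payload_bits / bandwidth_bps


DISTANCE_M = 100_000         # hypothetical 100 km fiber link
PAYLOAD_BITS = 8 * 10**9     # hypothetical 1 gigabyte payload

for bandwidth_bps in (25e9, 100e9):
    first_bit_s = propagation_delay_s(DISTANCE_M)
    last_bit_s = first_bit_s + serialization_time_s(PAYLOAD_BITS, bandwidth_bps)
    print(
        f"{bandwidth_bps / 1e9:.0f}G/s: first bit after {first_bit_s * 1e3:.3f} ms, "
        f"last bit after {last_bit_s * 1e3:.3f} ms"
    )
```

The first bit arrives at the same moment on both links, just as both aircraft land at the same time. Only the time to deliver the full payload shrinks as bandwidth grows.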

Enter Edge to Cloud Compute

So, this gets to the core issue of the article. Given the criticality of the systems involved, the latency of sensing, command, and control is paramount. For example, if a valve fails in a natural gas distribution system, it is critical to seal the valve as soon as possible to avoid explosive conditions or systems failure. The bandwidth is immaterial if the round-trip latency is too long. This brings us to the concept of a control loop and the importance of minimizing latency within it. To do this, let’s look at a simple formula, where:

L = End to End Latency

l = Link Latency

s = Switch Latency

so,

L = s1 + l1 + s2 + l2 + s3…
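As a rough illustration of the formula above, the following Python sketch sums the switch and link latencies along a path. Every number in it is a hypothetical placeholder rather than a measurement, and the doubling for the round trip assumes the return path mirrors the forward path.

```python
def end_to_end_latency_s(switch_latencies_s, link_latencies_s):
    """L = s1 + l1 + s2 + l2 + s3 ...: the sum of every switch and link delay."""
    return sum(switch_latencies_s) + sum(link_latencies_s)


# Hypothetical path: three switches joined by two links.
switches_s = [2e-6, 2e-6, 2e-6]   # assumed 2 microseconds per switch
links_s = [50e-6, 120e-6]         # assumed propagation delay of each link

one_way_s = end_to_end_latency_s(switches_s, links_s)
round_trip_s = 2 * one_way_s      # assumes a symmetric return path
print(f"one way: {one_way_s * 1e6:.0f} us, round trip: {round_trip_s * 1e6:.0f} us")
```

In this made-up path the link terms dominate the total, which is why shortening the distance between the sensor and the compute resource has such a direct effect on the control loop.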

It is common sense that minimizing the end-to-end latency path is highly desirable, or even a requirement, for critical systems control. Let’s say that we have our gas valve sensor, and it needs to feed its input to a computing center that is 25 miles away. After computations are made, control signals are sent back to the valve’s actuator or to the actuator of a valve upstream in the pipeline. Without going into technical details, the round-trip latency will simply be too long for the control loop. To reduce the control loop latency, we need to move the controlling compute resources closer to the edge. By doing this, we can significantly reduce the end-to-end latency of the control loop. This is what edge computing is all about, but how is this accomplished? It turns out that it is a lot more complex than simply placing a PC or a server out on a light pole. Figure 1 illustrates what is referred to as edge to cloud compute infrastructure.


Figure 1 – Cloud Compute Infrastructure

On the left-hand side of the diagram, we have the layered model that we laid out in the first blog and accompanying video. On the right-hand side, we see different levels of compute infrastructure from edge to cloud. Note how the compute control loop is confined to the path between the local edge compute and the Internet of Things (IoT) and Operational Technology (OT) environments. Sensors will stream data, and the local compute resource will receive it and may even perform data manipulation such as extract, transform, and load (ETL). This data massaging occurs before the data is transferred to a district or regional data center for aggregation and perhaps intermediate analytics. Alerts, however, are handled at the edge compute level. If an alert comes in from a sensor, the local compute level has the intelligence to actuate control back down into the system with minimal control loop latency. Alerts and control signals typically are not very large data sets, so bandwidth is secondary and control loop latency is the primary concern. Control loop latency is addressed by edge compute facilities that are located in very close proximity to, if not onsite with, the systems in question. At the edge compute layer, there is enough systemic logic to react to incidents that may arise.
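A minimal Python sketch of this split in responsibilities might look like the following. It is only an illustration under stated assumptions: the class names Reading and EdgeNode, the pressure threshold, and the close_valve command are all hypothetical, and a real edge controller would involve far more than a single comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    sensor_id: str
    pressure_kpa: float


@dataclass
class EdgeNode:
    """Hypothetical edge compute node sitting close to the sensors."""
    alert_threshold_kpa: float
    upload_buffer: list = field(default_factory=list)
    actuator_commands: list = field(default_factory=list)

    def handle(self, reading: Reading) -> None:
        # Local control loop: act immediately, without a round trip upstream.
        if reading.pressure_kpa > self.alert_threshold_kpa:
            self.actuator_commands.append(("close_valve", reading.sensor_id))
        # Light ETL: normalize the record and stage it for a later bulk upload.
        self.upload_buffer.append(
            {"sensor": reading.sensor_id, "pressure_kpa": round(reading.pressure_kpa, 1)}
        )

    def flush_to_regional(self) -> list:
        """Hand the staged batch to the regional data center and clear the buffer."""
        batch, self.upload_buffer = self.upload_buffer, []
        return batch


node = EdgeNode(alert_threshold_kpa=500.0)
for reading in (Reading("valve-1", 312.44), Reading("valve-2", 640.08)):
    node.handle(reading)

print(node.actuator_commands)
print(len(node.flush_to_regional()), "records staged for the regional data center")
```

The point of the design is that the alert path never leaves the edge node, while the bulk data takes the slower, bandwidth-oriented path upstream on its own schedule.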

At the same time, the edge compute layer uploads its accrued data to the regional or district data center, where it is aggregated with data from other edge compute zones and, in turn, fed to the Global Data Center (GDC), which is often cloud-hosted. There are many variations to the model. The GDC might be a large repository where large-scale analytics can be run and the results displayed down at the regional data centers, which would serve as operation centers for the IoT/OT geography zone. The edge compute functionality is usually provided by hardened computing resources located on light poles or perhaps at underground relay points within the geographic zone in question. As shown in Figure 1, the result is a cohesive ecosystem where data is used for different purposes at different levels within the system. The data may also extend outside of the ecosystem for other processes such as metering and billing. A great example of this use case would be electrical power or water and gas distribution and consumption.
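The roll-up from edge zones to regional data centers to the global level can be sketched as a simple chain of aggregations. The Python example below uses made-up consumption figures in kWh, and the region names, zone names, and function names are purely illustrative.

```python
def summarize_zone(readings_kwh):
    """Edge level: reduce raw meter readings to one total per zone."""
    return sum(readings_kwh)


def aggregate_region(zone_totals_kwh):
    """Regional level: combine the totals reported by each edge zone."""
    return sum(zone_totals_kwh.values())


def aggregate_global(region_totals_kwh):
    """Global level: combine regional totals for analytics or billing."""
    return sum(region_totals_kwh.values())


# Hypothetical consumption data, in kWh, keyed by region and then edge zone.
raw = {
    "north": {"zone-a": [1.5, 2.5], "zone-b": [4.0]},
    "south": {"zone-c": [3.0, 3.0]},
}

region_totals = {
    region: aggregate_region({zone: summarize_zone(r) for zone, r in zones.items()})
    for region, zones in raw.items()
}
print(region_totals)
print(aggregate_global(region_totals))
```

Each level only needs the summary produced by the level below it, which keeps the volume of data sent upstream far smaller than the raw sensor stream.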

This type of computing architecture will become more and more common as infrastructures are embedded with advanced artificial intelligence (AI) to support them. Note that the architecture allows for multiple benefits. It allows for the large-scale extraction, transformation, and loading (ETL) of data into the analytics environment where usage trending, prediction, and projection can occur. At the same time, the edge compute layer provides for concise command and control loops into the critical infrastructure from the compute space. The regional or district level data centers provide the management, monitoring, and physical maintenance of the IoT/OT environment with the required human crews for dispatch. The data centers also offer a level for the aggregation of potentially huge data trains and other ETL-type processes if needed. The whole system scales very well and provides the best of both worlds. The result is very low latency for systems control loops and vast data acquisition and analytics that can potentially leverage predictive AI. It’s an exciting world that is evolving right before our eyes. However, smart communities and infrastructure need to function within the limitations of the physical universe.

 

 
