How Kubernetes manages your cluster with systems programming concepts

Disclosure: Manifold, the developer marketplace, has previously sponsored Hacker Noon. Use code HACKERNOON2018 to get $10 off any service.

Kubernetes is the most popular container orchestrator by far. Much of its success comes from its reliability. All software has bugs. Kubernetes is somehow less buggy than alternatives when it comes to running your containers.

Kubernetes eventually arrives at your desired number of running containers, in time. It unrelentingly keeps that number running. The Kubernetes documentation refers to this as Kubernetes being self-healing. This behavior comes from a core philosophy in the design of Kubernetes.

“The goal seeking behavior of the control loop is very stable. This has been proven in Kubernetes where we have had bugs that have gone unnoticed because the control loop is fundamentally stable and will correct itself over time. If you are edge triggered you run risk of compromising your state and never being able to re-create the state. If you are level triggered the pattern is very forgiving, and allows room for components not behaving as they should to be rectified. This is what makes Kubernetes work so well.”
- Joe Beda, CTO of Heptio (As quoted in Cloud Native Infrastructure, by Justin Garrison and Kris Nova)

Interrupts: Edge and Level Triggering

[Figure: Edge and level triggering for the same signal.]

Edge and level triggering are concepts that come from electronics and systems programming. They refer to how a system should respond to the shape of an electrical signal (or digital logic) over time. Should the system care about when the signal changes from low to high and high to low, or should it care about if the signal is at high?

To explain it another way, given the following simple addition:

    > let a = 3;
    > a += 4;
    < 7

In an edge triggered view of the operation, we would see:

    add 4 to a

This would happen once, at the time of the addition.
In a level triggered view of the operation, we would see:

    a is 7

We’d see this continuously from the time of the addition, until the next event occurs.

Edge and Level Triggering in Distributed Systems

In the abstract, there’s no obvious difference between edge and level triggering. In the real world, even at the systems programming level, we have to deal with practical limitations. A common limitation is sample rate. If a system does not sample the signal frequently enough, it may miss a trigger, either for an edge transition, or for a short change in level.

On the larger scale of whole computers and large networks, there are more problems to contend with. The network is unreliable. People are clumsy. Squirrels are unrelenting. In a way, these problems are like a bad or inconsistent sample rate. They obscure our view of the signal.

Disruptions Change Perception

Let’s look at how a disruption of the signal affects how it is observed in edge and level triggered systems:

Ideal Conditions

[Figure: How edge and level triggered systems interpret a signal.]

Under ideal conditions, both edge triggered and level triggered systems observe a correct view of the signal. Immediately after the signal transitions from on to off, they both see the signal as being in an off state.

Two Disruptions

[Figure: Disrupting the rise and fall loses the high signal for the edge triggered system, but arrives at the correct end state.]

With two disruptions placed around the first two changes to signal state, the differences between edge and level triggered systems are clear. The edge triggered view of the signal misses the first rise. The level triggered system assumes the signal is in its last observed state until it sees otherwise. This leads to an observed signal that is mostly correct, but delayed until after the disruption.

One Disruption

[Figure: A single well-placed disruption can have a large impact on the edge triggered system. Fewer disruptions don’t always lead to a better outcome.]
With a single disruption obscuring the fall from high back to low, the level triggered system is mostly correct again. The edge triggered system only sees two rises, leading to a state that the original signal was never in.

To express this with addition again, the signal expressed:

    > let a = 1;
    > a += 1;
    > a -= 1;
    > a += 1;
    < 2

But the edge triggered system observed:

    > let a = 1;
    > a += 1;
    > a += 1;
    < 3

Reconciling Desired and Actual States

Kubernetes is not just observing one signal, but two: the desired state of the cluster, and the actual state. The desired state is the state that humans using the cluster wish for it to be in (“Run two instances of my application container”). The actual state ideally matches the desired state, but it is subject to any number of hardware failures and malicious rodents. These can move it away from the desired state.

Even time is a factor, as it isn’t possible to instantly have the actual state match the desired state. Container images have to download from the registry, applications need time for graceful shutdown, and so on.

Kubernetes has to take the actual state, and reconcile it with the desired state. It does so continuously, taking both states, determining the differences between them, and applying whatever changes have to be made to bring the actual state towards the desired state.

Scaling a Deployment in Kubernetes

[Figure: In an edge triggered system, we could diverge wildly from our desired outcome.]

Even without disruptions to the network, an edge triggered system trying to reconcile two states could end up with an incorrect outcome. If we start with a single container replica, and wish to scale to 5 replicas, then down to two replicas, an edge triggered system would see the following for the desired state:

    > let replicas = 1;
    > replicas += 4;
    > replicas -= 3;

The actual state of the system cannot react instantly to these commands. As in the diagram, it can end up terminating 3 replicas when there are only 3 running.
This leaves us with 0 replicas instead of the desired 2. In a level triggered system, we always compare the complete desired and actual states. This reduces the chances of state desynchronization (a bug).

Levelling Out

Edge triggering is not inherently bad; it does have advantages over level triggering. Edge triggering only transmits what has changed, when it has changed. Problems related to disruptions in edge triggered systems can be mitigated. This is often done through a periodic reconciliation with the full state, like how a level triggered system works. Disruptions may also be mitigated through an explicit ordering and versioning of events.

For Kubernetes, thinking about the problem as a level triggered system has led to an architecture that is clean, simple, and does what the user wants in spite of the inherent problems in distributed computing.

Special thanks to Meg Smith for the diagrams included in this article.
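As a rough sketch of the scaling example described earlier, the following JavaScript models the cluster as nothing more than a count of running replicas. The function names `applyEdge` and `reconcile` are hypothetical and are not part of any Kubernetes API. The starting point of 3 running replicas at the moment of the scale-down is an assumption taken from that scenario.

```javascript
// Hypothetical model: the cluster is just a count of running replicas.

// Edge triggered: react to a change event ("remove 3") without
// checking how many replicas are actually running.
function applyEdge(actual, delta) {
  return Math.max(0, actual + delta);
}

// Level triggered: compare the complete desired and actual states
// and return the correction needed to close the gap.
// Positive means "start this many", negative means "terminate this many".
function reconcile(desired, actual) {
  return desired - actual;
}

// Assumption: the scale-up from 1 to 5 is still in progress, so only
// 3 replicas are running when the scale-down to 2 is requested.
const actualAtScaleDown = 3;

// The edge triggered view blindly applies "replicas -= 3".
const edgeResult = applyEdge(actualAtScaleDown, -3);

// The level triggered view only knows "desired is 2, actual is 3".
let levelResult = actualAtScaleDown;
levelResult += reconcile(2, levelResult);

console.log(edgeResult);  // 0
console.log(levelResult); // 2
```

Calling `reconcile` again after the correction returns 0, so repeating it on a loop is safe: a level triggered pass with nothing to fix changes nothing.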

Level Triggering and Reconciliation in Kubernetes
