How Kubernetes manages your cluster with systems programming concepts
Kubernetes is the most popular container orchestrator by far. Much of its success comes from its reliability. All software has bugs, yet Kubernetes tends to be less affected by them than the alternatives when it comes to running your containers.
Given time, Kubernetes arrives at your desired number of running containers, and then it relentlessly keeps that number running. The Kubernetes documentation calls this self-healing. The behavior comes from a core philosophy in the design of Kubernetes.
“The goal seeking behavior of the control loop is very stable. This has been proven in Kubernetes where we have had bugs that have gone unnoticed because the control loop is fundamentally stable and will correct itself over time.
If you are edge triggered you run risk of compromising your state and never being able to re-create the state. If you are level triggered the pattern is very forgiving, and allows room for components not behaving as they should to be rectified. This is what makes Kubernetes work so well.”
― Joe Beda, CTO of Heptio (As quoted in Cloud Native Infrastructure, by Justin Garrison and Kris Nova)
Interrupts: Edge and Level Triggering
Edge and level triggering are concepts that come from electronics and systems programming. They refer to how a system should respond to the shape of an electrical signal (or digital logic) over time. Should the system care about when the signal changes from low to high and from high to low, or should it care about whether the signal is currently high?
To explain it another way, given the following simple addition:
> let a = 3;
> a += 4;
In an edge triggered view of the operation, we would see:
add 4 to a
This would happen once, at the time of the addition.
In a level triggered view of the operation, we would see:
a is 7
We’d see this continuously from the time of the addition, until the next event occurs.
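As a rough sketch of the two views, the snippet below records what each kind of observer would see for the same addition. The names `add`, `sample`, `edgeLog`, and `levelLog` are made up for this illustration and do not come from any library.

```javascript
// Hypothetical illustration: none of these names come from a real API.
let a = 3;
const edgeLog = [];  // what an edge triggered observer records
const levelLog = []; // what a level triggered observer records

function add(n) {
  a += n;
  edgeLog.push(`add ${n} to a`); // emitted once, at the moment of the change
}

function sample() {
  levelLog.push(`a is ${a}`); // can be read at any time, as often as we like
}

add(4);
sample();
sample();
sample();

console.log(edgeLog);  // a single entry describing the change
console.log(levelLog); // the same level, reported on every sample
```

The edge log only says something at the moment of the change. The level log can be asked again at any time and gives the same answer until `a` changes.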
Edge and Level Triggering in Distributed Systems
In the abstract, there’s no obvious difference between edge and level triggering. In the real world, even at the systems programming level, we have to deal with practical limitations. A common limitation is sample rate. If a system does not sample the signal frequently enough, it may miss a trigger, either for an edge transition, or for a short change in level.
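To make the sample rate problem concrete, here is a small sketch. It assumes a made-up signal stored as an array of levels, one per tick, and two hypothetical helpers, `sampleEvery` and `countRises`. Sampling every third tick skips over the short pulse entirely.

```javascript
// Hypothetical signal: one level (0 or 1) per tick. Note the short pulse at tick 2.
const signal = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0];

// Read the signal only every `period` ticks.
function sampleEvery(levels, period) {
  const samples = [];
  for (let t = 0; t < levels.length; t += period) {
    samples.push(levels[t]);
  }
  return samples;
}

// Count low-to-high transitions between consecutive samples.
function countRises(levels) {
  let rises = 0;
  for (let i = 1; i < levels.length; i++) {
    if (levels[i - 1] === 0 && levels[i] === 1) rises++;
  }
  return rises;
}

const fast = sampleEvery(signal, 1); // sees every tick
const slow = sampleEvery(signal, 3); // sees ticks 0, 3, 6, 9 only

console.log(countRises(fast)); // 2
console.log(countRises(slow)); // 1, the pulse at tick 2 was never sampled
```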
On the larger scale of whole computers and large networks, there are more problems to contend with. The network is unreliable. People are clumsy. Squirrels are unrelenting. In a way, these problems are like a bad or inconsistent sample rate. They obscure our view of the signal.
Disruptions Change Perception
Let’s look at how a disruption of the signal affects how it is observed in edge and level triggered systems:
Under ideal conditions, both edge triggered and level triggered systems observe a correct view of the signal. Immediately after the signal transitions from on to off, they both see the signal as being in an off state.
With two disruptions placed around the first two changes to signal state, the differences between edge and level triggered systems are clear. The edge triggered view of the signal misses the first rise. The level triggered system assumes the signal is in its last observed state until it sees otherwise. This leads to an observed signal that is mostly correct, but delayed until after the disruption.
Fewer disruptions don’t always lead to a better outcome. With a single disruption obscuring the fall from high back to low, the level triggered system is mostly correct again. The edge triggered system only sees two rises, leading to a state that the original signal was never in.
To express this with addition again, the signal expressed:
> let a = 1;
> a += 1;
> a -= 1;
> a += 1;
But the edge triggered system observed:
> let a = 1;
> a += 1;
> a += 1;
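The same scenario can be replayed in a short simulation. This is only a sketch: the `steps` array and its `disrupted` flag are invented here to stand in for the disruption, and both observers are reduced to a single variable each.

```javascript
// Hypothetical replay of the signal above; `disrupted` marks the change nobody could observe.
const steps = [
  { delta: +1, disrupted: false },
  { delta: -1, disrupted: true },
  { delta: +1, disrupted: false },
];

let actual = 1;    // the real value of the signal
let edgeView = 1;  // edge triggered observer: applies each change it hears about
let levelView = 1; // level triggered observer: overwrites with the value it reads
const levelHistory = [];

for (const step of steps) {
  actual += step.delta;
  if (!step.disrupted) {
    edgeView += step.delta; // a missed delta is gone for good
    levelView = actual;     // a missed reading is corrected by the next one
  }
  levelHistory.push(levelView);
}

console.log(actual);       // 2
console.log(edgeView);     // 3, a value the signal never had
console.log(levelHistory); // [2, 2, 2], stale during the disruption, correct afterwards
```

The level triggered view is wrong for a while (it still reads 2 while the signal is at 1), but it recovers on the next successful reading. The edge triggered view has no way to notice that it drifted.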
Reconciling Desired and Actual States
Kubernetes is not just observing one signal, but two: the desired state of the cluster, and the actual state. The desired state is the state that humans using the cluster wish for it to be in (“Run two instances of my application container”). The actual state ideally matches the desired state, but it is subject to any number of hardware failures and malicious rodents. These can move it away from the desired state. Even time is a factor, as it isn’t possible to instantly have the actual state match the desired state. Container images have to download from the registry, applications need time for graceful shutdown, and so on.
Kubernetes has to take the actual state, and reconcile it with the desired state. It does so continuously, taking both states, determining the differences between them, and applying whatever changes have to be made to bring the actual state towards the desired state.
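A toy version of that loop might look like the following. It is a sketch of the idea only, not how the Kubernetes controllers are actually written: the `desired` and `actual` objects, the `reconcileOnce` function, and the container names are all assumptions made for this example, and each pass is limited to one change to mimic the fact that the actual state cannot catch up instantly.

```javascript
// Hypothetical in-memory stand-ins for the two states.
const desired = { replicas: 2 };
const actual = { running: [] };
let nextId = 0;

// One pass of the loop: compare the full states and take one step towards the desired one.
function reconcileOnce(desiredState, actualState) {
  const diff = desiredState.replicas - actualState.running.length;
  if (diff > 0) {
    actualState.running.push(`container-${nextId++}`); // start one container
  } else if (diff < 0) {
    actualState.running.pop(); // stop one container
  }
  return diff; // 0 means the states already matched
}

reconcileOnce(desired, actual); // 0 running, so start container-0
reconcileOnce(desired, actual); // 1 running, so start container-1
reconcileOnce(desired, actual); // 2 running, nothing to do

actual.running.shift();         // simulate a failure: container-0 disappears
reconcileOnce(desired, actual); // 1 running again, so start container-2

console.log(actual.running); // [ 'container-1', 'container-2' ]
```

Nothing in the loop needs to know why a container went missing. It only looks at the difference between the two states, which is what lets it recover from failures it was never told about.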
Scaling a Deployment in Kubernetes
Even without disruptions to the network, an edge triggered system trying to reconcile two states could end up with an incorrect outcome.
If we start with a single container replica, and wish to scale up to 5 replicas, then down to 2 replicas, an edge triggered system would see the following for the desired state:
> let replicas = 1;
> replicas += 4;
> replicas -= 3;
The actual state of the system cannot react instantly to these commands. As in the diagram, it can end up terminating 3 replicas when only 3 are running, because the scale-up has not finished. This leaves us with 0 replicas instead of the desired 2.
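The arithmetic of this failure can be sketched in a few lines. The snippet assumes that only 3 replicas are running when the scale-down arrives and that no further replicas start afterwards. The functions `edgeStep` and `levelStep` are hypothetical names for the two ways of reacting.

```javascript
// Assumption for this sketch: the scale-up stalled at 3 running replicas.
const runningAtScaleDown = 3;

// Edge triggered: act on the requested change ("remove 3"), whatever is running.
function edgeStep(running, delta) {
  return Math.max(0, running + delta);
}

// Level triggered: act on the difference between the desired total and what is running.
function levelStep(running, desiredTotal) {
  const change = desiredTotal - running; // recomputed from the full state
  return { change, running: running + change };
}

console.log(edgeStep(runningAtScaleDown, -3)); // 0
console.log(levelStep(runningAtScaleDown, 2)); // { change: -1, running: 2 }
```

The edge triggered step removes 3 replicas because that is what the event said. The level triggered step removes 1, because that is the difference between the 3 that are running and the 2 that are wanted.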
In a level triggered system, we always compare the complete desired and actual states. This reduces the chances of state desynchronization (a bug).
Edge triggering is not inherently bad; it does have advantages over level triggering. Edge triggering only transmits what has changed, when it has changed.
Problems related to disruptions in edge triggered systems can be mitigated. This is often done through a periodic reconciliation with the full state, like how a level triggered system works. Disruptions may also be mitigated through an explicit ordering and versioning of events.
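One way to combine both mitigations is to number every event and fall back to the full state whenever a number is skipped. The sketch below assumes a single `source` whose full state can always be read, and the `change` and `observe` helpers are invented for this illustration.

```javascript
// Hypothetical source of truth: a value plus a sequence number that grows with every change.
const source = { seq: 0, value: 1 };

function change(delta) {
  source.seq += 1;
  source.value += delta;
  return { seq: source.seq, delta }; // the event an observer may or may not receive
}

// Edge triggered observer that checks event ordering.
const observer = { seq: source.seq, value: source.value, resyncs: 0 };

function observe(event) {
  if (event.seq === observer.seq + 1) {
    // The next event in order: safe to apply the delta.
    observer.value += event.delta;
    observer.seq = event.seq;
  } else if (event.seq > observer.seq) {
    // A gap means at least one event was lost: reconcile with the full state.
    observer.value = source.value;
    observer.seq = source.seq;
    observer.resyncs += 1;
  }
  // Events with seq <= observer.seq are duplicates or stale and are ignored.
}

observe(change(+1)); // delivered in order
change(-1);          // lost in a disruption
observe(change(+1)); // gap detected, observer resyncs

console.log(observer.value, source.value); // 2 2
console.log(observer.resyncs);             // 1
```

Without the sequence check, the observer would have applied both additions and ended at 3, as in the earlier example.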
For Kubernetes, thinking about the problem as a level triggered system has led to an architecture that is clean, simple, and does what the user wants in spite of the inherent problems in distributed computing.
Special thanks to Meg Smith for the diagrams included in this article.