When learning about blockchain consensus algorithms and distributed systems in general, you will inevitably come across terms like FLP impossibility and Byzantine fault tolerance. While there is plenty of literature on these subjects, it often suffers from a narrow focus, failing to explain the connections and relationships between them. Furthermore, much of the existing literature gives either too much or not enough technical detail — I found this to be especially true when learning about consensus algorithms like the proof of stake.
This miniseries is an attempt to strike that balance of detail for beginner to intermediate readers who know the What and are trying to understand the Why and How. We will try to “connect the dots” by explaining the theoretical background of consensus agreement in distributed systems (part 1) and how it applies and relates to different algorithms used in the blockchain (part 2).
We assume the readers know what a consensus algorithm is and why it is needed in a distributed system. If a consensus algorithm is to be reliable, it must satisfy the following properties:
To design an algorithm that is fault tolerant, we must first understand what exactly a node failure is and how it manifests in terms of the node’s behavior. Here is a non-exhaustive list of some of the main types of failures in distributed systems:
Lastly and equally imperative, we must have a clear understanding of the underlying infrastructure and the constraints it puts on the system (e.g. communication and message processing delays, clock synchronization, etc.). There are several models of distributed systems that pose different assumptions on the behavior of the infrastructure; to keep things simple, we will introduce only two fundamental ones:
It becomes apparent that an asynchronous model with byzantine failures presents the weakest set of assumptions from all the possible combinations we listed. In fact, it is the weakest set of assumptions, period — it subsumes any other model and failure type. Unfortunately, finding an algorithm for such conditions is also the most difficult as we have to account for non-deterministic behavior and are not able to reliably detect even crash failures. Indeed, as we will shortly see, finding such algorithm turns out to be completely impossible.
In 1985, Michael Fischer, Nancy Lynch and Michael Paterson (hence FLP) published a seminal paper, which became known as the FLP Impossibility theorem. The paper lays down a proof of the following statement:
In an asynchronous model with crash failures, it is impossible to find a consensus algorithm that would satisfy safety, liveness and fault tolerance.
In other words, such algorithm does not exist. Key things to highlight here:
So what can we do? Although the situation might seem hopeless at first, it is not the end of the world. Here are our options:
Extending the model means we pose some additional assumptions (like the upper bound for delay in the synchronous model). As long as we are guaranteed these assumptions will be met, we have no issue — a reliable consensus algorithm exists. The problem is that in the real world, most systems are intrinsically asynchronous, meaning the assumptions we make using some extended model will not always hold true. Let’s take the synchronous model as an example: in most cases, the delay for message delivery and processing will fit within the model’s bound (the higher the bound the lower the chance the delay will exceed it). But in an asynchronous system, the probability of the delay exceeding the bound will always be > 0, no matter the size of the bound. Thus, we cannot guarantee the correctness of the outcome in every case — the algorithm is not 100% reliable.
So what happens if the delay exceeds the bound? First of all, if the reason for that happening is a node failure, the algorithm is still correct — remember safety and fault tolerance (all healthy nodes will reach an agreement). Only if a healthy node does not respond within the set bound, the algorithm loses one of its properties — in this case safety, since liveness is guaranteed by using a timeout for failure detection.
To better illustrate, we will use the 2-Phase Commit algorithm as an example. 2-Phase Commit is an atomic commitment protocol used in distributed databases, serving two functions:
It is important to realize that reaching consensus is only one part of 2-Phase Commit. Even more important, however, is the fact that 2-Phase Commit is NOT a reliable protocol. As we will shortly demonstrate, it does not satisfy the 3 properties even in a synchronous model. With that said, here is the complete message flow (credit Wikipedia):
Coordinator Participants
1. QUERY TO COMMIT
-------------------------------->
2. VOTE YES/NO prepare*/abort*
<--------------------------------
commit*/abort* 3. COMMIT/ROLLBACK
-------------------------------->
4. ACKNOWLEDGMENT commit*/abort*
<--------------------------------
end
Notice in step 3 that Rollback is sent if some participant fails to respond. This is detected by using a timeout. Let’s look at all possible failures and how different models and different implementation decisions affect the outcome of the algorithm.
1. Coordinator fails to send Query to commit
Coordinator Participants
1. QUERY TO COMMIT
------> X
2. Participant fails to respond to Query to commit
Coordinator Participants
1. QUERY TO COMMIT
-------------------------------->
2. ?
X <-------
3. ROLLBACK
-------------------------------->
3. Coordinator fails to send Commit/Rollback
Coordinator Participants
1. QUERY TO COMMIT
-------------------------------->
2. YES
<--------------------------------
3. ?
------> X
4. Participant fails after it receives Commit/Rollback
By now you might be asking, why would anybody choose such an imperfect protocol? First of all, thanks to FLP we know that in the real world with intrinsically asynchronous systems, we will never get a perfect reliable consensus algorithm. There will always be some kind of risk. As with everything in tech, it comes down to tradeoffs — different solutions yield different characteristics in terms of performance, which properties the algorithm loses, to what degree, under what circumstances and probabilities. The 2-Phase Commit is a good example of such tradeoff: in exchange for low message complexity (under ideal circumstances 3 message delays to commit/rollback) we accept some risk of losing liveness.
With that, we’re ready to move on and complicate things a bit.
So far, we have been dealing solely with crash and omission failures with limited behavior — node is either correct or not responding. Unfortunately, in the real world, we often cannot rely on such predictable behavior. There might be intermittent network outages, hardware malfunctions and software bugs, causing all sorts of issues. What’s even worse, some nodes might be “dishonest” actors — providing the rest of the network with wrong and conflicting information to achieve some nefarious goal.
The natural question comes: can we protect against such non-deterministic “anything-possible” failures and if yes, how? The answer was presented in a 1982 paper by Lamport, Shostak and Pease on an abstract scenario known as the Byzantine Generals Problem (hence Byzantine failures and Byzantine Fault Tolerance). In this scenario, several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. The generals can communicate only by messenger and must decide on a common plan of action — either attack or retreat. The caveat is that some of the generals might be traitors, trying to prevent the loyal ones from reaching an agreement, resulting in an uncoordinated action and ultimately a defeat.
The generals must have an algorithm to guarantee that:
If we translate this into the domain of distributed systems: all healthy honest nodes must agree on the same value, proposed by one of the nodes (assuming the proposing node is honest). If the proposing node isn’t honest, the remaining nodes still have to agree on one value, however it might be different from the one proposed — usually, some default value is used (in the Byzantine generals scenario “retreat” is the default value; more on this later).
Before we move any further, here are the assumptions we’ll be working with:
Let’s start with a scenario with 3 nodes, having one traitor:
Once the lieutenants receive the order from the commander, they exchange the received values amongst each other to verify the validity and agree on common action. In the first figure, lieutenant 2 is a traitor and sends false message to lieutenant 1. Lieutenant 1 thus receives conflicting messages. Based on the stated algorithm requirements, when the commander is loyal, the lieutenant must obey his order — hence lieutenant 1 must choose to attack.
In figure 2, the commander is the traitor, sending different orders to the lieutenants. Lieutenant 1 will end up with the same conflicting messages as in figure 1, being unable to differentiate between the two cases. If he chooses to obey the commander, lieutenant 2, who will face the same dilemma, will carry out different order than lieutenant 1 — consensus will not be reached! If, instead, the lieutenants opt for a default value (retreat) when facing conflicting messages, then the requirement of following a loyal commander’s order will not be met in the case from figure 1!
The previous example shows that the Byzantine generals problem cannot be solved for just 3 nodes. In fact, the paper concludes that Byzantine fault tolerance can be achieved only if no more than 1/3 of the overall number of generals is traitorous (in other words, for m dishonest nodes, there must be at least 3m + 1 honest nodes to achieve BFT). So, for 1 dishonest node there must be a minimum of 4 nodes in total:
In the above picture, a loyal commander sends value v to the lieutenants. The lieutenants then exchange the received values. Lieutenant 3 is a traitor, sending false value x to lieutenants 1 and 2 (the picture only shows messages for lieutenant 2). Both lieutenants thus receive values (v, v, x). They then simply use a majority function to pick a value, in this case, value v.
Let’s consider a scenario where the commander is traitorous, sending different values to each of his lieutenants:
After the lieutenants exchange the received messages, they each end up with the following values: (x, y, z). No value has a majority. In this case, the lieutenants will use a default value (retreat). Note that if, for example, the commander sent the value x (instead of value y) to lieutenant 2, the lieutenants would agree on value x, as it would obtain the majority. The goal of the solution is for the nodes to agree on one value, not to determine whether a value is true or false, or whether the proposing node is honest or not.
It is important to understand that the solution is recursive. In each iteration, the commander sends the value to the lieutenants. Each lieutenant then becomes the commander in the next iteration, sending the value to the remaining lieutenants, and so on. The number of additional iterations necessary to achieve BFT equals the number of maximum dishonest nodes assumed. In the example above, we assume 1 dishonest node at max and thus there is just one iteration required.
This has serious implications on the practicality and performance of the solution:
Having an exponential message complexity clearly isn’t very efficient. One way of simplifying the solution and improving the performance is by restricting the nodes’ ability to lie using signed messages:
With signed messages, a solution for just 3 nodes exists:
The commanding general, who is a traitor, signs conflicting orders and sends them to the lieutenants. Each lieutenant then signs the received order and forwards it to the remaining lieutenants. The difference from the oral messages is that this time the lieutenants know that the commander is the traitor because the original orders were signed by him. They can thus safely choose to retreat.
Without going into the nitty-gritty details of the algorithm, here are its attributes:
Keep in mind however that even though the message complexity was reduced dramatically, some performance degradation will incur due to the fact that we must generate and verify the signatures for every message.
Given the acquired knowledge, can we make the 2-Phase commit algorithm tolerant to Byzantine failures? The answer is yes, albeit it technically won’t be a “2-Phase” commit anymore as it requires adding another phase after the third step:
Coordinator Participants
1. QUERY TO COMMIT
------------------------------>
2. VOTE YES/NO
<------------------------------
3. COMMIT/ROLLBACK
------------------------------>
4.BROADCAST
|<->|<->|<->|
5. ACKNOWLEDGMENT
<------------------------------
In step 4, each participant broadcasts the message received in step 3 to the remaining participants. Once the broadcast is done, the participants will use the majority function (assuming oral messages) to pick the action to execute. The rest of the algorithm continues as in standard 2-Phase commit.
Why do we need the broadcast only after the third step and not after the first two? Remember the goal of the algorithm is to achieve consensus, regardless of whether the value is true or whether it comes from an honest node:
That does it for part 1. Part 2 will cover main consensus algorithms in the blockchain, their approach to the FLP problem and Byzantine failures and how they compare to each other in terms of performance and other characteristics. I will place the link here once ready. Stay tuned.