A note from Anthony: If you haven’t already, please read the article Gaining clarity on key terminology: Bitcoin versus blockchain versus distributed ledger technology. If you’re new to this space, this article will provide some useful context.
From central to distributed
At the core of distributed ledger technologies (DLTs) is the distributed ledger, which contains a record of all transactions in a system.
As the name suggests, a distributed ledger stores its data across a network of computers called nodes.
Key to the operation of this distributed database is a mechanism to ensure the nodes on the network verify the transactions and agree on their order and existence on the ledger. This mechanism is called consensus. In the case of applications like a cryptocurrency, this process is critical to prevent double spending or other invalid data from being written to the underlying ledger, which, put simply, is a database of all transactions.
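To make the double-spending concern concrete, here is a minimal Python sketch. It assumes a toy ledger in which each coin has a unique ID and can be spent only once; the names `Transaction`, `Ledger`, `apply` and the coin IDs are hypothetical and do not come from any real DLT.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    coin_id: str    # hypothetical identifier of the coin being spent
    sender: str
    recipient: str


@dataclass
class Ledger:
    """Toy ledger: an ordered list of transactions plus the set of spent coins."""
    entries: list = field(default_factory=list)
    spent: set = field(default_factory=set)

    def apply(self, tx: Transaction) -> bool:
        """Append tx if its coin is unspent; otherwise reject it as a double spend."""
        if tx.coin_id in self.spent:
            return False
        self.spent.add(tx.coin_id)
        self.entries.append(tx)
        return True


ledger = Ledger()
first = ledger.apply(Transaction("coin-1", "alice", "bob"))
second = ledger.apply(Transaction("coin-1", "alice", "carol"))  # same coin again
print(first, second)
```

Which of the two conflicting transactions is accepted depends entirely on the order in which they are applied. That is why every node needs to agree on that order.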
Today, most of the world’s information systems are centralised, and we have been forced to trust the companies that control them. Because a large centralised system does not have to deal with consensus, it is efficient and scales well.
On the other hand, a decentralised system has appeal as it removes the necessity of a single organisation controlling the data. A properly executed DLT removes the need for users of a system to trust a third party or each other, which is why such systems are often described as ‘trustless’.
To make this trustless system work, many different consensus mechanisms have been devised, each with its own pros and cons. They generally serve the same core purpose as described above, but differ in methodology. The primary difference between consensus mechanisms is the way in which they delegate and reward the verification of transactions.
Hang on. What’s the point of consensus?
Before we go on, let’s back up a bit and consider traditional databases as they were before DLT. When an application can both write transactions to a database and read transactions from it, that database is called a master database. These have been around for decades.
The market then evolved, and, to address requirements for disaster recovery, failover, and redundancy, we saw the introduction of a second database: a copy of the master, known as a slave.
Every write goes to the master, but the transactions are written to the slave as well. The application can then read from the master and the slave, but not write to the slave. If something suddenly goes wrong with the master, the slave could be ‘elevated’ to be a master.
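The master and slave arrangement described above can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions: the `Database` class and its `write`, `read` and `promote` methods are hypothetical names, and replication here is an immediate copy rather than anything a real database product does.

```python
class Database:
    """Toy key-value database with a role of 'master' or 'slave'."""

    def __init__(self, role):
        self.role = role
        self.data = {}
        self.replicas = []  # slaves that receive a copy of every write

    def write(self, key, value):
        if self.role != "master":
            raise PermissionError("applications cannot write to a slave")
        self.data[key] = value
        for replica in self.replicas:
            replica.data[key] = value  # replicate the write to each slave

    def read(self, key):
        return self.data[key]  # reads are allowed on master and slave alike

    def promote(self):
        """Elevate a slave to master, for example after the old master fails."""
        self.role = "master"


master = Database("master")
slave = Database("slave")
master.replicas.append(slave)

master.write("order-42", "shipped")
print(slave.read("order-42"))  # the slave serves the replicated value

slave.promote()                # the master has failed; elevate the slave
slave.write("order-43", "pending")
```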
The market evolved again, and it became possible for applications to write to both databases in the network — a multi-master database scenario. This was a significant step forward, but introduced other problems.
The issue with this multi-master approach is that when multiple applications write to multiple databases, occasionally two applications will write different values to the same location in different databases at around the same time.
The database then somehow needs to resolve this conflict to ensure the databases stay in sync. As you add more masters and more users, the likelihood of these write conflicts goes up, and the efforts to remain in synchronisation and ensure consensus become harder.
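As an illustration of such a write conflict, the sketch below merges the state of two masters and resolves clashes with a "latest write wins" rule. That rule is only one possible choice and is an assumption made for this example, as are the `merge` function and the `(timestamp, node_id, payload)` layout of each value.

```python
def merge(local, remote):
    """Merge two masters' states; each value is a (timestamp, node_id, payload) tuple.

    A conflict (same key, different values) is resolved by keeping the tuple
    that compares highest: latest timestamp first, then highest node_id.
    """
    merged = dict(local)
    conflicts = []
    for key, remote_value in remote.items():
        local_value = merged.get(key)
        if local_value is not None and local_value != remote_value:
            conflicts.append(key)
            merged[key] = max(local_value, remote_value)
        else:
            merged[key] = remote_value
    return merged, conflicts


# Two masters accepted different writes for the same seat.
master_a = {"seat-12A": (100, "a", "alice")}
master_b = {"seat-12A": (101, "b", "bob"), "seat-14C": (99, "b", "carol")}

state_a, conflicts_a = merge(master_a, master_b)
state_b, conflicts_b = merge(master_b, master_a)
print(conflicts_a, state_a == state_b)
```

Because both masters apply the same deterministic rule, they end up with the same state. The cost is that one of the two writes is silently discarded, which is part of what makes synchronisation harder as masters and users are added.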
Until recently, it was assumed that one legal entity had control of the master databases. For example, Amazon would never think to give Google control of a master database supporting Amazon’s online shopping site.
With distributed consensus, different legal entities can have a master but remain in consensus. Three conditions are critical to support this scenario:
- All parties controlling an instance of a database need to come to an agreement on the order of transactions, and commit the transactions to the database in that order.
- No single party should be able to change or influence the order of transactions. This is a concept called immutability.
- No single party should be able to stop transactions from being processed across the community.
The diagram below illustrates the evolution of computing including the shift from centralised to distributed databases.
Consensus is a process
So, we’ve talked about why we need consensus and what it is. Let’s look at how it works.
As discussed, consensus is a process. In broad terms, it is a means to ensure the transactions written to nodes across a network remain in sync and are immutable, and to protect the network from many (but not all) types of attack.
To achieve these objectives, the process of consensus follows four steps:
- Each node creates the transactions it wants to record.
- The data is shared between the nodes (an obvious and critical step).
- Consensus is established on the order of valid transactions.
- Nodes update their transactions to reflect the consensus result.
In reality, these steps can be completed in seconds or minutes; Bitcoin, for example, takes about 10 minutes on average to add a new block. The goal is to get to step four as quickly as possible without breaking consensus.
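The four steps above can be sketched as a single round in Python. This is a minimal illustration under strong assumptions: all nodes are honest, the network delivers every message, and "consensus" is simply sorting the shared transactions by timestamp and node name. The `Node` class and `consensus_round` function are hypothetical names, and the sort is a stand-in for a real consensus algorithm.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.pending = []   # step 1: transactions this node wants to record
        self.ledger = []    # agreed, ordered transactions

    def create(self, timestamp, payload):
        self.pending.append((timestamp, self.name, payload))


def consensus_round(nodes):
    # Step 2: share data so every node sees the same pool of transactions.
    pool = [tx for node in nodes for tx in node.pending]
    # Step 3: agree on an order. Sorting is a placeholder for a real
    # consensus algorithm and assumes every node is honest.
    agreed = sorted(pool)
    # Step 4: every node updates its ledger to reflect the agreed result.
    for node in nodes:
        node.ledger.extend(agreed)
        node.pending.clear()
    return agreed


nodes = [Node("n1"), Node("n2"), Node("n3")]
nodes[0].create(3, "pay bob 5")
nodes[1].create(1, "pay carol 2")
nodes[2].create(2, "pay dave 7")

consensus_round(nodes)
print(all(node.ledger == nodes[0].ledger for node in nodes))
```

The interesting work in any real system happens in step three, where nodes may be slow, faulty or dishonest. That is where the different consensus mechanisms diverge.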
Consensus algorithms and protocols
Now that you have a general understanding of consensus, let me define two further concepts related to it: protocols and algorithms. These two concepts will help you understand how consensus is achieved, and the component parts of any DLT implementation.
A protocol is a set of rules that govern how a system operates. The rules establish the basic functioning of the different parts, how they interact with each other, and what conditions are necessary for a robust implementation.
The different parts of a protocol are not sensitive to order or chronology — it doesn’t matter which part goes first. A protocol also doesn’t tell the system how to produce a result. It doesn’t have an objective and doesn’t produce an output.
In terms of how it works, it’s like a car engine.
Consensus algorithms relate to the rules (mathematics) that each node follows to achieve consensus. These algorithms describe the steps that will need to take place. For example, a proof of stake algorithm might define the rules such that the creator of the next block is chosen via various combinations of random selection, stake or age.
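To illustrate the proof of stake example, the sketch below picks the creator of the next block at random, weighted by stake. It is a simplification resting on assumptions: the `choose_block_creator` function, the stake figures and the use of a shared `seed` are all hypothetical, and it ignores age and the other factors a real proof of stake scheme may combine.

```python
import random


def choose_block_creator(stakes, seed):
    """Pick one validator with probability proportional to its stake.

    `stakes` maps a validator name to its staked amount. `seed` stands in for
    randomness that every node can reproduce, so all nodes pick the same creator.
    """
    rng = random.Random(seed)
    validators = sorted(stakes)               # same order on every node
    weights = [stakes[v] for v in validators]
    return rng.choices(validators, weights=weights, k=1)[0]


stakes = {"alice": 60, "bob": 30, "carol": 10}   # hypothetical stakes

# Simulate many rounds, each with its own seed, and measure how often
# each validator is chosen.
picks = [choose_block_creator(stakes, seed) for seed in range(1000)]
share = {v: picks.count(v) / len(picks) for v in stakes}
print(share)
```

Over many rounds, each validator should be chosen roughly in proportion to its stake, while any single round remains unpredictable without the seed.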
Unlike a consensus protocol, which is a set of rules that determines how the system achieves consensus, an algorithm is a set of instructions that produces an output or a result. It can be a simple script or a complicated program.
The order of the instructions is essential, and the algorithm specifies what that order is. It tells the system what to do to achieve the desired result. It may not know what the result is beforehand, but it knows that it wants one.
If a consensus protocol can be likened to a car engine, then a consensus algorithm can be likened to the actions of the driver of the car.
As blockchain and cryptocurrency expert Noelle Acheson explained in a 2016 article:
- The protocol is a set of rules that determines how the system functions.
- The algorithm tells the system what to do.
- The protocol is. The algorithm does.
Why consensus could be crucial to your business
In the words of Ethereum founder Vitalik Buterin, the purpose of a consensus algorithm is to “allow for the secure updating of a state according to some specific state transition rules, where the right to perform the state transitions is distributed among some economic set”.
People often claim that one kind of consensus is better than another. But as you now know, there are different solutions which fit different situations.
Overall, consensus is a process that facilitates synchronisation across a distributed network of untrusted nodes. In the future, this will allow us to build decentralised applications, either privately (i.e. for enterprise use) or publicly.
In a future article, I will explain and compare the different types of consensus algorithm.