What is common between streaming movie on Netflix, searching air ticket on Google, buying clothes on Amazon? You rely on distributed computing to do it.
Distributed computing is a collection of machines to work as a coherent group. Machines run their own operations and communicate with each other over network.
There are many things that can go wrong in distributed computing
1) Machines may crash arbitrarily any time.
2) Network failures — Machines talk to each other over network. Network can be unreliable, messages between nodes can get dropped or delayed for no specific reason at any time.
How do we drive certainty when machines can crash, network can fail?
Answer lies in in data replication. The idea behind replication is very simple: keep multiple copies of data on physically isolated machines so that the failure in one does not impact the others. That being the case, don’t we need machines to agree when they store multiple copies?
Consensus based replication allow machines to work as a coherent group that can survive the failures of some of its members. Consensus based replication functions like a parliament. Multiple machines elect a leader, agree on a value through consensus. Once they agree on a value, the verdict is final. They stop functioning when majority of machines fail. This is similar to the way parliaments works, starts functioning when majority prevails, stops functioning when there is no majority.
Illustrative example - Group of 5 machines that form distributed computing, operate as a single group. They will continue to accept writes even if 2 machines fail. If more than 2 machines fail, they stop accepting write requests.
The Paxos algorithm was first described by Turing Award winner Leslie Lamport in 1990s inspired by the example of a parliament in the ancient Greek island of Paxos.
Paxos describes the actions of the processors by their roles : client, acceptor, proposer, learner, and leader.
Client : The Client issues a request to the distributed system, and waits for a response. For instance, a write request on a to make purchase on Amazon.
Acceptor (Voters): The Acceptors act as the fault-tolerant "memory" of the protocol. Acceptors are collected into groups called Quorums. Any message sent to an Acceptor must be sent to a Quorum of Acceptors. Any message received from an Acceptor is ignored unless a copy is received from each Acceptor in a Quorum.
Proposer: A Proposer advocates a client request, attempting to convince the Acceptors to agree on it, and acting as a coordinator to move the protocol forward when conflicts occur.
Learner: Learners act as the replication factor for the protocol. Once a Client request has been agreed upon by the Acceptors, the Learner may take action (i.e. execute the request and send a response to the client). To improve availability of processing, additional Learners can be added.
Leader: Paxos requires a distinguished Proposer (called the leader) to make progress. Many processes may believe they are leaders, but the protocol only guarantees progress if one of them is eventually chosen. If two processes believe they are leaders, they may stall the protocol by continuously proposing conflicting updates.
Phase 1 (Prepare) : A prosper sends a prepare request with number n to majority of acceptors. If the number n seen by acceptor is not highest, the request is ignored.
Phase 2 (Accept): If the prosper receives a response from majority of acceptors, it sends an accept request with value v.
Production uses of Paxos : Google Spanner, Chubby, Apache Cassandra uses Paxos algorithm.
Challenges : Paxos is quite difficult to understand, in spite of numerous attempts to make it more approachable. Furthermore, its architecture requires complex changes to support practical systems.
Raft algorithm, proposed by Stanford University researchers in 2014, designed to overcome challenges with Paxos, easy to understand, implements. It builds consensus where the task of data updates and replication is handled by a “leader”. It implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated data. The leader accepts entries from clients, replicates them on other machines, and tells machines when it is safe to commit. Having a leader simplifies things. For example, the leader can decide where to place data, flows in a simple fashion from the leader to other machines
1) Strong leader: Raft uses a stronger form of leadership than other consensus algorithms. For example, client data entries only flow from the leader to other servers. This simplifies the management of the replicated data and makes it easier to understand.
2) Leader election: Raft uses randomized timers to elect leaders. This adds only a small amount of mechanism to the heartbeats already required for any consensus algorithm, while resolving conflicts simply and rapidly.
3) Membership changes: Raft’s mechanism for changing
the set of machines in the cluster uses a new joint consensus approach where the majorities of two different configurations overlap during transitions. This allows the cluster to continue operating normally during configuration changes.
Data Replication is foundation for driving reliability in Distributed Systems (e.g. Google Spanner, Amazon Dynamo). Distributed System in turn power applications (e.g. Netflix, Amazon e-commerce, Google) that we use on every day. Next time you watch on movie on Netflix, make purchase on Amazon, you know what makes it reliable.