What is common between streaming movie on Netflix, searching air ticket on Google, buying clothes on Amazon? You rely on distributed computing to do it. What is distributed  computing ? Distributed computing is  a collection of machines to work as a coherent group. Machines run their own operations and communicate with each other over network. Problems with distributed computing There are many things that can go wrong in distributed computing 1) Machines may crash arbitrarily any time. 2) Network failures — Machines talk to each other over network. Network can be unreliable, messages between nodes can get dropped or delayed for no specific reason at any time. How do we solve the problem? How do we drive certainty when machines can crash, network can fail? Answer lies in in . The idea behind replication is very simple: keep multiple copies of data on physically isolated machines so that the failure in one does not impact the others. That being the case, don’t we need machines to agree when they store multiple copies? data replication Data replication through consensus allow machines to work as a coherent group that can survive the failures of some of its members. Consensus based replication functions like a parliament.  Multiple machines elect a leader, agree on a value through consensus. Once they agree on a value, the verdict is final. They stop functioning when majority of machines fail. This is similar to the way parliaments  works, starts functioning when majority prevails, stops functioning when there is no majority. Consensus based replication - Group of 5 machines that form distributed  computing, operate as a single group. They will continue to accept writes even if 2 machines fail. If more than 2 machines fail, they stop accepting write requests. Illustrative example Algorithms was first described by Turing Award winner in 1990s inspired by the example of a parliament in the ancient Greek island of Paxos. The Paxos algorithm Leslie Lamport Paxos Algorithm Paxos describes the actions of the processors by their roles : client, acceptor, proposer, learner, and leader. The Client issues a request to the distributed system, and waits for a response. For instance, a write request on a to make purchase on Amazon. Client : The Acceptors act as the fault-tolerant "memory" of the protocol. Acceptors are collected into groups called Quorums. Any message sent to an Acceptor must be sent to a Quorum of Acceptors. Any message received from an Acceptor is ignored unless a copy is received from each Acceptor in a Quorum. Acceptor (Voters): A Proposer advocates a client request, attempting to convince the Acceptors to agree on it, and acting as a coordinator to move the protocol forward when conflicts occur. Proposer: Learners act as the replication factor for the protocol. Once a Client request has been agreed upon by the Acceptors, the Learner may take action (i.e. execute the request and send a response to the client). To improve availability of processing, additional Learners can be added. Learner: Paxos requires a distinguished Proposer (called the leader) to make progress. Many processes may believe they are leaders, but the protocol only guarantees progress if one of them is eventually chosen. If two processes believe they are leaders, they may stall the protocol by continuously proposing conflicting updates. Leader: Paxos Phases (Prepare)  : A prosper sends a prepare request with number n to majority of acceptors. If the number n seen by acceptor is not highest, the request is ignored. Phase 1 (Accept): If the prosper receives a response from majority of acceptors, it sends an accept request with value v. Phase 2 : Google Spanner, Chubby, Apache Cassandra  uses Paxos algorithm. Production uses of Paxos Paxos is quite difficult to understand, in spite of numerous attempts to make it more approachable. Furthermore, its architecture requires complex changes to support practical systems. Challenges : Raft Algorithm proposed by Stanford University researchers in 2014, designed to overcome challenges with Paxos, easy to understand, implements. It builds consensus where the task of data updates and replication is handled by a “leader”. It implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated data. The leader accepts entries from clients, replicates them on other machines, and tells machines when it is safe to commit. Having a leader simplifies things. For example, the leader can decide where to place data, flows in a simple fashion from the leader to other machines Raft algorithm, Salient features: Raft uses a stronger form of leadership than other consensus algorithms. For example, client data entries only flow from the leader to other servers. This simplifies the management of the replicated data and makes it  easier to understand. 1) Strong leader: : Raft uses randomized timers to elect leaders. This adds only a small amount of mechanism to the heartbeats already required for any consensus algorithm, while resolving conflicts simply and rapidly. 2) Leader election Raft’s mechanism for changing the set of machines in the cluster uses a new joint consensus approach where the majorities of two different configurations overlap during transitions. This allows the cluster to continue operating normally during configuration changes. 3) Membership changes: Conclusion : Data Replication is foundation for driving reliability in Distributed Systems (e.g. Google Spanner, Amazon Dynamo). Distributed System in turn power applications (e.g. Netflix, Amazon e-commerce, Google) that we use on every day. Next time you watch on movie on Netflix, make purchase on Amazon, you know what makes it reliable.

Flow

Amazon

Apache

Google

Netflix

Distributed Computing: Illusion of Single System

The Promise and Perils of Serverless Computing

Read My Stories

Too Long; Didn't Read

Data Reliability in an Unreliable World

Data Reliability in an Unreliable World

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

Untitled Story

How to Scale Organization

10 Things Everyone Should Know About Machine Learning

10 Repositories that Will Transform the Way You Approach Technical Interviews

10 (Free) Data Structure and Algorithm Courses Junior Developers Should Explore

10 Data Structure & Algorithms Books Every Programmer Should Read

How to Scale Organization

10 Things Everyone Should Know About Machine Learning

10 Repositories that Will Transform the Way You Approach Technical Interviews

10 (Free) Data Structure and Algorithm Courses Junior Developers Should Explore

10 Data Structure & Algorithms Books Every Programmer Should Read

Light-Mode

Classic

Newspaper

Dark-Mode

Neon Noir

Minty

HN StartUps