You’re Reading Part 2 of a 3-Part Series on Paxos Consensus Algorithms in Distributed Systems.

In Part 1 of this series, we looked at why consensus is such a tricky problem in distributed systems and how Paxos provides a way out. Through Alice and Bob’s battle for a lock, we saw how Paxos uses majority agreement to make decisions that can’t be undone once chosen. Even when nodes fail, recover, or rejoin, the system still converges safely on one value. That’s the magic of Paxos: it keeps things consistent in an inconsistent world.

In Part 2, we’ll dive into the messier edge cases and see how Paxos still manages to hold things together.

How Paxos Handles Edge Cases

In Part 1, we saw Paxos work smoothly: Alice proposed a value, the nodes accepted it, and even when Bob joined later, the algorithm forced him to carry forward Alice’s decision. Real systems, however, aren’t always this tidy. Messages can get lost, nodes can crash, and multiple proposers might compete simultaneously.

Let’s walk through a few messy scenarios with our familiar friends, Alice and Bob.

Edge Case 1 – Lost Commit (Alice’s Proposal Stalls)

Alice once again proposes AliceLock with proposal number 1001. She contacts nodes 1, 4, and 5, and all three accept, giving her a majority.

But here’s the twist: node 4 never sends back its final commit message because of a network glitch, or node 4 goes down, or Alice disappears.

Let’s unpack each failure.

Node 4 never sends back its final commit message (network glitch):
This is totally possible.
Alice thinks she doesn’t have a majority, even though she actually does. The value is already chosen once a majority of acceptors have accepted it. The missing acknowledgment only prevents Alice from knowing that it succeeded. Safety holds, but Alice may stop making progress.

Node 4 goes down after accepting but before replying:
Same effect as the network glitch: Alice may not see the quorum even though one exists. Any other proposer that queries a majority of the remaining nodes will hear about AliceLock from at least one of them, because any two majorities overlap.

Alice (the proposer) disappears:
Also fine! Proposers in Paxos are effectively stateless initiators: the decision lives in the acceptors. Once Alice sends out her proposal and enough acceptors persist it, Alice herself can die without breaking consensus. Any other proposer can always step in with a higher-numbered proposal and will be forced to carry forward AliceLock.

In all of these scenarios, nodes 1, 4 (if still alive), and 5 still remember AliceLock.

Now Bob arrives with BobLock (n=2001).
By this time, let’s say nodes 2 and 3 are also online and, through consensus, will eventually learn that Alice has the lock. Bob queries the nodes and hears back from nodes 1, 2, and 5.
- Nodes 1 and 5 report that they previously accepted AliceLock, even though Alice never learned that she held the lock. Node 2 has not accepted anything yet.
- Bob is forced to adopt AliceLock and feels “starved”: his new value never has a chance.

Lesson: Even if a commit acknowledgment is lost, Paxos ensures safety: Bob cannot override AliceLock. But liveness suffers: Bob’s new value makes no progress.

Edge Case 2 – Alice and Bob Propose Simultaneously

Step 1 – First accepts
- Node 4 accepts Alice’s proposal (n=1001, AliceLock).
- Node 5 accepts Bob’s proposal (n=2001, BobLock).
Now both Alice and Bob have one vote each.

Step 2 – Other nodes respond differently
- Node 2 sees Bob’s proposal and accepts BobLock (n=2001).
- Node 1 sees Alice’s proposal and accepts AliceLock (n=1001).
So far:
- AliceLock has Node 1 + Node 4.
- BobLock has Node 2 + Node 5.
- Node 3 hasn’t decided yet.

Step 3 – Node 3 crashes
Before it can accept either proposal, Node 3 goes down.
This leaves:
- Alice with 2 votes (Nodes 1 & 4).
- Bob with 2 votes (Nodes 2 & 5).
- Majority quorum = 3 (since there are 5 nodes in total).
Neither proposer can form a majority with Node 3 offline.

Step 4 – Stalemate (temporary)
- Alice cannot reach quorum (only has 2/5).
- Bob cannot reach quorum (only has 2/5).
- With Node 3 offline, progress stalls.

Lesson: Safety is preserved: no conflicting value is committed yet, since quorum wasn’t reached.

Step 5 – Retry with a higher number
Suppose Bob retries with a new proposal number n=2002.
- He sends a prepare(n=2002) to all nodes.
- Node 1 replies: “I previously accepted AliceLock at n=1001.”
- Node 2 replies: “I previously accepted BobLock at n=2001.”
- Node 4 replies: “I previously accepted AliceLock at n=1001.”
- Node 5 replies: “I previously accepted BobLock at n=2001.”
So Bob learns that the highest prior accepted proposal is n=2001, BobLock. By Paxos rules, he must carry forward BobLock.

Step 6 – Consensus reached
Bob now sends accept(n=2002, BobLock).
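
The rule Bob applied in Step 5 to arrive at this accept message can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (single-decree Paxos, replies already collected in memory); the names Promise and choose_value are made up for this example and do not come from any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Promise:
    """One acceptor's reply to prepare(n). Field names are hypothetical."""
    node: int
    accepted_n: Optional[int]      # highest proposal number this node accepted, if any
    accepted_value: Optional[str]  # value that was accepted with that number


def choose_value(promises: List[Promise], own_value: str, cluster_size: int) -> str:
    """Return the value a proposer must put in its accept message."""
    quorum = cluster_size // 2 + 1
    if len(promises) < quorum:
        raise RuntimeError("not enough promises to proceed")
    prior = [p for p in promises if p.accepted_n is not None]
    if not prior:
        # No acceptor has accepted anything: the proposer may use its own value.
        return own_value
    # Otherwise it must adopt the value of the highest-numbered accepted proposal.
    return max(prior, key=lambda p: p.accepted_n).accepted_value


# Replies Bob receives to prepare(n=2002); Node 3 is down and sends nothing.
replies = [
    Promise(1, 1001, "AliceLock"),
    Promise(2, 2001, "BobLock"),
    Promise(4, 1001, "AliceLock"),
    Promise(5, 2001, "BobLock"),
]
chosen = choose_value(replies, own_value="BobLock", cluster_size=5)
print(chosen)  # BobLock
```

Note that Bob’s own preference plays no role once any reply carries an accepted value: the highest proposal number among the replies decides.
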
The four reachable nodes (all except Node 3, which is down) respond. Bob already has Node 2 and Node 5 (both hold BobLock), and he needs only one of Node 1 or Node 4 to switch. Both promised n=2002 during the prepare phase, so they accept the higher-numbered proposal and overwrite AliceLock. This gives Bob a quorum (at least 3 of 5).

Final decision: BobLock is chosen.

Edge Case 3 – Minority Partition (No Quorum)

Suppose a network partition occurs and only 2 out of 5 nodes are reachable (say nodes 4 and 5).
- Alice proposes AliceLock (n=4001).
- She contacts node 4, and it responds.
- But quorum requires 3 out of 5, so Alice falls short.

At the same time, Bob proposes BobLock (n=4002) to node 5.
- It responds, but again only 1 vote is possible and Bob falls short. Even both reachable nodes together would give only 2 of the 3 votes needed.
- Neither Alice nor Bob can commit.

Result: No value is chosen.

Lesson: Paxos prioritizes safety over availability. With fewer than a majority of nodes reachable, the system cannot make progress. This is why Paxos-based systems may stall under minority partitions: it’s a tradeoff for never committing conflicting values.

Edge Case 4 – Out-of-Order / Delayed Messages

Now consider message delays. Alice sends a prepare with n=5001, and nodes 1–3 promise Alice. Then Bob arrives with n=5002.
Nodes 4–5 promise Bob. Bob eventually gathers a majority (say nodes 2, 4, and 5) and commits BobLock.

But later, a delayed accept message from Alice (n=5001, AliceLock) arrives at Node 1. Assume Node 1 has by then also seen Bob’s prepare and promised n=5002.

Paxos rule:
- Node 1 checks its state and sees it already promised not to accept anything below 5002.
- The old message is rejected, even though it arrived late.

Lesson: Paxos tolerates asynchronous, delayed, and reordered messages. Outdated proposals are ignored once a higher-numbered promise exists, preserving safety.

Wrapping Up

So far, we’ve seen Paxos handle a range of real-world messiness:
- Lost commits & proposer crashes: safety holds, though liveness suffers.
- Racing proposers with node failures: progress may stall temporarily, but higher numbers resolve the tie.
- Minority partitions: no progress without quorum, but no conflicting decisions either.
- Out-of-order messages: stale proposals are safely rejected, ensuring consistency.

Paxos guarantees one thing above all else: safety is never compromised. But this comes at the cost of liveness in certain situations: proposers can starve, partitions can halt progress, and competition can cause livelock.
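
The acceptor-side rule behind several of these cases (the rejected late message in Edge Case 4, and the “switch” to a higher number in Edge Case 2) can be sketched as follows. Treat it as a minimal single-decree sketch rather than a real implementation: the Acceptor class and its on_prepare / on_accept methods are hypothetical names chosen for this example, and a real acceptor would write its state to durable storage before replying.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """In-memory acceptor state for one Paxos decision (illustrative only)."""
    promised_n: int = 0                   # highest proposal number promised so far
    accepted_n: Optional[int] = None      # number of the last accepted proposal
    accepted_value: Optional[str] = None  # value of the last accepted proposal

    def on_prepare(self, n: int) -> Tuple[bool, Optional[int], Optional[str]]:
        """Promise to ignore proposals below n and report any prior accepted value."""
        if n <= self.promised_n:
            return (False, None, None)
        self.promised_n = n
        return (True, self.accepted_n, self.accepted_value)

    def on_accept(self, n: int, value: str) -> bool:
        """Accept the proposal unless a higher-numbered promise was already made."""
        if n < self.promised_n:
            return False
        self.promised_n = n
        self.accepted_n = n
        self.accepted_value = value
        return True


node1 = Acceptor()
node1.on_prepare(5001)                      # promise to Alice
node1.on_prepare(5002)                      # later promise to Bob
late = node1.on_accept(5001, "AliceLock")   # Alice's delayed accept arrives
fresh = node1.on_accept(5002, "BobLock")    # Bob's accept
print(late, fresh, node1.accepted_value)    # False True BobLock
```

The only comparison that matters is between the incoming proposal number and the highest promise on record, which is why arrival order alone cannot resurrect a stale proposal.
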

In Part 3, we’ll explore how Raft (and Multi-Paxos) address these practical challenges, making leader-based consensus simpler and more efficient in real-world deployments.