Maintenance and upgrades in a MongoDB replica set are typically performed in a rolling fashion: you perform the maintenance on one Secondary at a time, and the Primary member goes through the maintenance last.
When you stepDown the Primary, all the eligible Secondaries will hold an election for a new Primary. Until a new Primary is elected the database is not available for writes. So, ‘How would you quickly elect a new Primary while performing the rolling maintenance/upgrade?’
This is one of many articles in the multi-part series Mastering MongoDB — One tip a day, created solely to help you master MongoDB by learning one tip a day. Over a few articles, I would like to give various tips to help you answer the above question.
This article discusses the rolling maintenance, implications of not having the Primary, the steps required to elect a new Primary quickly and finally Pros/Cons of the approach.
Mastering — Rolling maintenance
MongoDB offers redundancy and high availability of the database via replica sets. Replica sets not only help the database recover quickly from node failures and network partitions, but also give you the ability to perform maintenance tasks without affecting high availability.
The key to staying highly available while performing maintenance is rolling maintenance, where the maintenance is performed on one Secondary at a time:
- Stop the MongoDB process/service on a Secondary
- Perform the required maintenance/upgrade on the server
- Start the MongoDB process/service on the server
- Wait for MongoDB on the server to catch up on the Oplog
- Repeat the above on the other secondaries in the replica set
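The "wait for MongoDB to catch up on the Oplog" step can be checked from the replica set status. Below is a minimal sketch, assuming a status document shaped like the output of rs.status() (members carrying name, stateStr and optimeDate). The function names and the lag threshold are hypothetical, chosen for illustration only.

```javascript
// Sketch: decide whether a restarted Secondary has caught up on the Oplog.
// `status` is assumed to look like the document returned by rs.status().

// Seconds by which the member's last applied oplog entry trails the Primary's.
function replicationLagSecs(status, memberName) {
  var primary = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  var member = status.members.filter(function (m) {
    return m.name === memberName;
  })[0];
  if (!primary || !member) {
    throw new Error("primary or member not found in status");
  }
  return (primary.optimeDate.getTime() - member.optimeDate.getTime()) / 1000;
}

// True when the member is back in SECONDARY state and within the allowed lag.
function isCaughtUp(status, memberName, maxLagSecs) {
  var member = status.members.filter(function (m) {
    return m.name === memberName;
  })[0];
  return (
    member !== undefined &&
    member.stateStr === "SECONDARY" &&
    replicationLagSecs(status, memberName) <= maxLagSecs
  );
}
```

In the mongo shell you might call isCaughtUp(rs.status(), "mon02:27000", 5) before moving on to the next Secondary; the 5-second threshold is an arbitrary example value.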
Given a replica set with 3 MongoDB servers — mon01 (Primary), mon02 (Secondary) and mon03 (Secondary), the rolling maintenance process typically requires
- Perform the maintenance on the Secondary server,
- Perform the maintenance on the other Secondary server,
- stepDown the Primary server,
- Wait for new Primary to be elected, let’s say
- Perform the maintenance on the former primary server,
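The ordering above (Secondaries first, Primary last) can be derived from the replica set status. This is a small sketch under the assumption that the members array has the shape found in rs.status(); maintenanceOrder is a hypothetical helper name.

```javascript
// Sketch: list members in rolling-maintenance order, Secondaries first and
// the current Primary last. `members` is assumed to be rs.status().members.
function maintenanceOrder(members) {
  var secondaries = members.filter(function (m) {
    return m.stateStr === "SECONDARY";
  });
  var primaries = members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  });
  return secondaries.concat(primaries).map(function (m) {
    return m.name;
  });
}
```

For the three-server example, maintenanceOrder(rs.status().members) would list mon02 and mon03 before mon01, as long as mon01 is still the Primary when the status is read.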
For more detailed information on the rolling upgrades please read, Your Ultimate Guide to Rolling Upgrades by Bryan Reinero.
Implications of not having a Primary
By default, both read and write operations are executed on the Primary. You may use the Secondary read preference to read from one of the Secondaries. However, the Primary is the only member in the replica set that receives write operations. So it is crucial to always have a Primary in your replica set.
When the Primary is not available or not reachable by the majority of Secondaries, all the eligible Secondaries hold an election for a new Primary. Until a new Primary is elected, all writes, and all reads directed at the Primary, that originate from your client drivers will either wait for a Primary to become available or time out. So it is important to have a Primary elected quickly, so that few operations are left waiting for it.
How to quickly elect a new Primary
If the Primary is unexpectedly terminated or faces network connectivity issues from the majority of servers, the Secondaries call for an election only after missing its heartbeats for 10 seconds. So an unplanned failover takes at least that long to begin.
StepDown the Primary
Stepping down the Primary expedites the failover procedure. It is therefore recommended to stepDown the Primary and trigger the election deliberately, rather than shut down the Primary and let the Secondaries discover that it is unreachable. I bet most of you are using this approach already. So, let’s review some of the other tips you could leverage before you stepDown() the Primary.
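As a rough illustration of the step-down itself, the sketch below builds the replSetStepDown admin command that the rs.stepDown() shell helper wraps. The helper names stepDownCommand and stepDownPrimary are hypothetical, and the handle passed in is only assumed to expose adminCommand(), as the shell's db object does.

```javascript
// Sketch: build and send the replSetStepDown admin command.
// stepDownSecs: how long the old Primary stays ineligible for re-election.
// catchUpSecs:  how long it waits for an electable Secondary to catch up.
function stepDownCommand(stepDownSecs, catchUpSecs) {
  if (stepDownSecs <= catchUpSecs) {
    // The server requires the step-down period to exceed the catch-up period.
    throw new Error("stepDownSecs must be greater than catchUpSecs");
  }
  return { replSetStepDown: stepDownSecs, secondaryCatchUpPeriodSecs: catchUpSecs };
}

// `handle` is any object exposing adminCommand(), for example the shell's `db`.
function stepDownPrimary(handle, stepDownSecs, catchUpSecs) {
  return handle.adminCommand(stepDownCommand(stepDownSecs, catchUpSecs));
}
```

In the mongo shell this would be stepDownPrimary(db, 60, 10) while connected to the current Primary, which is roughly what rs.stepDown(60, 10) does. On older server versions the Primary closes client connections when it steps down, so the call may surface as a network error even though the step-down succeeded.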
Make only one Secondary to be electable
If the replication lag on one of your Secondaries is low, you can proactively make it the only Secondary that can be elected in the next election. Typically you choose a Secondary that has
- Low replication lag
- Low network latency
- Similar Priority as current Primary
- Or Member with next highest Priority
Assuming you want to pin the Secondary server mon02 as the next Primary, you can make the other Secondary server, mon03, ineligible to become Primary for 60 seconds by running rs.freeze(60) on it. This makes the election faster, because mon02 is the only electable member when you stepDown the Primary.
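To make the freeze step concrete, here is a minimal sketch that works out which members to freeze and builds the replSetFreeze command that rs.freeze() wraps. It assumes a status document shaped like rs.status() output; membersToFreeze and freezeCommand are hypothetical helper names.

```javascript
// Sketch: every Secondary except the chosen one should be frozen.
// `status` is assumed to look like the document returned by rs.status().
function membersToFreeze(status, chosenName) {
  return status.members
    .filter(function (m) {
      return m.stateStr === "SECONDARY" && m.name !== chosenName;
    })
    .map(function (m) {
      return m.name;
    });
}

// The admin command behind rs.freeze(secs); it must be run on each member
// to be frozen, not on the Primary.
function freezeCommand(secs) {
  return { replSetFreeze: secs };
}
```

With mon02 chosen, membersToFreeze(rs.status(), "mon02:27000") would return only mon03, and you would then connect to mon03 and run rs.freeze(60) there, or equivalently db.adminCommand(freezeCommand(60)).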
Reduce the settings.electionTimeoutMillis
The default time limit for detecting when a replica set’s Primary is unreachable is 10 seconds. By reducing settings.electionTimeoutMillis to, let’s say, 2 seconds, you make the detection, and hence the election, faster.
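A minimal sketch of the reconfiguration follows, assuming a configuration document shaped like the output of rs.conf(). The helper name setElectionTimeout is hypothetical; it only edits the document, and applying it is left to rs.reconfig().

```javascript
// Sketch: set settings.electionTimeoutMillis on a replica set config document.
// `cfg` is assumed to be the document returned by rs.conf(). It is modified
// in place and returned so the call can be chained into rs.reconfig().
function setElectionTimeout(cfg, millis) {
  if (typeof millis !== "number" || millis <= 0) {
    throw new Error("millis must be a positive number");
  }
  cfg.settings = cfg.settings || {};
  cfg.settings.electionTimeoutMillis = millis;
  return cfg;
}
```

On the current Primary this could be applied with rs.reconfig(setElectionTimeout(rs.conf(), 2000)), and reverted later with the same call and 10000.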
Summary of steps for faster election
Below is a summary of the steps for a faster election during the maintenance period. Please test them before running them in a production environment.
- Identify the server you want to be the next Primary
- rs.freeze(60) on all other Secondaries
- settings.electionTimeoutMillis=2000 on replica set configuration
- rs.stepDown() on current Primary
- Wait for the new Primary to be elected
- Reset settings.electionTimeoutMillis=10000 on the new Primary
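The "wait for the new Primary" step in the list above can be approximated by polling the replica set status. This is a sketch only: findPrimary and waitForPrimary are hypothetical names, and the status source and the pause are passed in as functions so that the loop does not depend on any particular shell.

```javascript
// Sketch: return the name of the member reporting PRIMARY state, or null.
// `status` is assumed to look like the document returned by rs.status().
function findPrimary(status) {
  var primaries = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  });
  return primaries.length > 0 ? primaries[0].name : null;
}

// Poll getStatus() up to maxAttempts times, calling pause() between attempts.
// Returns the new Primary's name, or null if none appeared in time.
function waitForPrimary(getStatus, maxAttempts, pause) {
  for (var i = 0; i < maxAttempts; i++) {
    var name = findPrimary(getStatus());
    if (name !== null) {
      return name;
    }
    pause();
  }
  return null;
}
```

In the mongo shell, one possible call is waitForPrimary(function () { return rs.status(); }, 50, function () { sleep(100); }), which checks roughly every 100 ms for about 5 seconds. Both numbers are example values.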
Pros & Cons of the approach
Assume all the above suggestions worked out well for you. You may be wondering:
“If having lower electionTimeoutMillis helps with quicker elections, then why can’t I keep it at lower number all the time?”
Great question! Your application might be facing reduced traffic during the rolling maintenance period. Most importantly, you are closely monitoring all the servers and manually pin a Secondary to be the next Primary. So it could be okay for you to have a lower electionTimeoutMillis value at that very moment.
However, setting electionTimeoutMillis to a low value not only results in faster failover but also makes the replica set more sensitive to slowness or spottiness of the Primary node or the network.
This may result in too many elections when there are transient network connectivity issues. Conversely, setting electionTimeoutMillis to a larger value makes your replica set more resilient to transient network interruptions but results in a slower average failover time.
The bottom line is YMMV: you need to test various electionTimeoutMillis values and choose the one that suits you best, or leave it at the default value of 10 seconds.
No matter what you do, “Never set the electionTimeoutMillis to a value less than the round-trip network latency time between two of your members.”
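One way to sanity-check a candidate value against this rule is to compare it with the pingMs values in the replica set status. The sketch below is only a partial check, since rs.status() reports latency from the member you are connected to, not between every pair of members. The function names are hypothetical.

```javascript
// Sketch: largest pingMs reported in a status document shaped like rs.status().
// The member the shell is connected to has no pingMs and is skipped.
// Assumption: Number() is used because the shell may wrap pingMs in a
// 64-bit integer type rather than a plain number.
function maxObservedPingMs(status) {
  var max = 0;
  status.members.forEach(function (m) {
    if (m.pingMs === undefined) {
      return;
    }
    var ping = Number(m.pingMs);
    if (ping > max) {
      max = ping;
    }
  });
  return max;
}

// True when the proposed timeout is above every latency seen from this member.
function isElectionTimeoutSafe(status, electionTimeoutMillis) {
  return electionTimeoutMillis > maxObservedPingMs(status);
}
```

Passing this check does not make a value a good choice; it only rules out values that break the rule above, so some headroom over the observed latency is still advisable.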
Hands-On lab exercises
This lab exercise helps you understand the steps needed to quickly elect a new Primary during a rolling maintenance.
First, you need an environment to play around in. I created 3 RHEL v7.5 instances in AWS; you may just as well run them all on your localhost with /etc/hosts entries for the servers. If you already have a MongoDB v3.6 replica set environment, you may skip this step.
Download and untar the MongoDB v3.6 binaries, then start a MongoDB server on each host, bound to all IPs and listening on port 27000.
Initiate replica set
Initiate a MongoDB replica set using the above hosts on server
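As an illustration of what the initiation could look like, the sketch below builds a replica set configuration document for the three lab hosts. The replica set name "rs0" is an assumption (it must match whatever --replSet value the servers were started with), and buildReplSetConfig is a hypothetical helper.

```javascript
// Sketch: build a replica set config document for rs.initiate().
// setName is assumed to match the replSet name the mongod processes use.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map(function (host, index) {
      return { _id: index, host: host };
    })
  };
}

// Hypothetical set name; the hosts and port come from the lab setup.
var labConfig = buildReplSetConfig("rs0", [
  "mon01:27000",
  "mon02:27000",
  "mon03:27000"
]);
```

In the mongo shell connected to one of the servers, rs.initiate(labConfig) would then initiate the replica set.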
Display the replica set config and status
Please note the outputs from rs.conf() and rs.status() respectively. They help you determine the current settings.electionTimeoutMillis (10000) and select a Secondary to be the next Primary.
Choose the potential next Primary
The db.printSlaveReplicationInfo() output shows that both Secondaries, mon02 and mon03, are caught up on the Oplog entries of the Primary mon01. However, pingMs shows that mon02 is a lot closer to mon01 than mon03 is. So you may choose mon02 as the next potential Primary while stepping down the current Primary.
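The same selection can be sketched in code: among healthy Secondaries that are within an allowed replication lag, pick the one with the lowest pingMs. This assumes a status document shaped like rs.status() output as seen from the Primary; chooseNextPrimary is a hypothetical name, and it ignores member priority for brevity.

```javascript
// Sketch: pick the caught-up Secondary closest (lowest pingMs) to the Primary.
// `status` is assumed to be rs.status() taken while connected to the Primary.
function chooseNextPrimary(status, maxLagSecs) {
  var primary = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  if (!primary) {
    return null;
  }
  var candidates = status.members.filter(function (m) {
    if (m.stateStr !== "SECONDARY" || m.health !== 1) {
      return false;
    }
    var lagSecs =
      (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    return lagSecs <= maxLagSecs;
  });
  // Number() coerces pingMs in case the shell wraps it in a 64-bit type.
  candidates.sort(function (a, b) {
    return Number(a.pingMs) - Number(b.pingMs);
  });
  return candidates.length > 0 ? candidates[0].name : null;
}
```

Calling chooseNextPrimary(rs.status(), 5) on the Primary returns the name of the preferred member, or null if no Secondary qualifies; the 5-second lag limit is an example value.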
Freeze the other Secondaries
Based on the above pingMs, we would not want the server mon03 to be elected as Primary. So, run rs.freeze(60) on mon03 to keep it from contending in the next election term.
Set electionTimeoutMillis and stepDown the Primary
Reconfigure the electionTimeoutMillis of the replica set settings on the current Primary, mon01. Finally, execute the command rs.stepDown() to forcibly trigger the election and elect mon02 as the next Primary.
You may notice that the new Primary is available within ~2 seconds, compared to the default of 10–12 seconds. The mongod.log files on the individual machines show that the transition of mon02 to Primary completes within ~2 seconds.
Reset the electionTimeoutMillis on the new primary
Once the new Primary is elected, please revert electionTimeoutMillis to the default value to avoid frequent elections during transient network connectivity issues.
I want to remind you of an important point:
Although the MongoDB database is highly available for reads from Secondaries during elections, it is not available for writes until a Primary is elected. So it is important to ensure the Primary is available sooner rather than later to meet your SLA for writes.
With the tips discussed here, you can have a new Primary elected within 3 seconds. If your application was serving about 10,000 operations / second, you have about 30,000 operations waiting on the new Primary. Now, you may wonder — “What measures can I take to ensure that the database server would not cripple when all those 30,000 operations hit the new Primary at the same time?”
Again, it’s a great question, but that’s a topic for another day. Hopefully, you learned something new today as you scale the path to “Mastering MongoDB — One tip a day”.