Shyam Arjarapu


Mastering MongoDB - Faster elections during rolling maintenance

Maintenance and upgrades in a MongoDB replica set are typically performed in a rolling fashion. The rolling process requires you to perform the maintenance on one Secondary at a time, with the Primary member going through the maintenance last.

When you stepDown the Primary, all the eligible Secondaries will hold an election for a new Primary. Until a new Primary is elected the database is not available for writes. So, ‘How would you quickly elect a new Primary while performing the rolling maintenance/upgrade?’

This is one of the many articles in the multi-part series, Mastering MongoDB - One tip a day, solely created for you to master MongoDB by learning ‘One tip a day’. Over a few articles in the series, I would like to give various tips to help you answer the above question.

This article discusses rolling maintenance, the implications of not having a Primary, the steps required to elect a new Primary quickly, and finally the pros and cons of the approach.

Mastering — Rolling maintenance

MongoDB offers redundancy and high availability of the database via replica sets. Replica sets not only help the database recover quickly from node failures and network partitions, but also give you the ability to perform maintenance tasks without affecting high availability.

The key to staying highly available while performing maintenance is ‘rolling maintenance’, where the maintenance is performed on one Secondary at a time:

  • Stop the MongoDB process/service on a Secondary
  • Perform the required maintenance/upgrade on the server
  • Start the MongoDB process/service on the server
  • Wait for MongoDB on the server to catch up on the Oplog
  • Repeat the above on the other secondaries in the replica set

Given a replica set with 3 MongoDB servers, mon01 (Primary), mon02 (Secondary) and mon03 (Secondary), the rolling maintenance process typically requires you to:

  • Perform the maintenance on the Secondary server, mon03
  • Perform the maintenance on the other Secondary server, mon02
  • stepDown the Primary server, mon01
  • Wait for new Primary to be elected, let’s say mon02
  • Perform the maintenance on the former primary server, mon01
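The ordering above can be sketched in a few lines of plain JavaScript. This is only an illustration: the function name maintenanceOrder and the sample data are my own, and the input merely mimics the name and stateStr fields you would see in rs.status().members. The order among the Secondaries does not matter, as long as the Primary comes last.

```javascript
// Order replica set members for rolling maintenance: Secondaries first, Primary last.
// `members` mimics rs.status().members (only `name` and `stateStr` are used).
function maintenanceOrder(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  return secondaries.concat(primaries).map((m) => m.name);
}

// Sample data for the three servers in this article (port 27000 as used in the lab).
const sampleMembers = [
  { name: "mon01:27000", stateStr: "PRIMARY" },
  { name: "mon02:27000", stateStr: "SECONDARY" },
  { name: "mon03:27000", stateStr: "SECONDARY" },
];

// The Primary, mon01, is listed last.
console.log(maintenanceOrder(sampleMembers));
```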

For more detailed information on rolling upgrades, please read Your Ultimate Guide to Rolling Upgrades by Bryan Reinero.

Implications of not having a Primary

By default, both read and write operations are executed on the Primary. You may use a Secondary read preference to read from one of the Secondaries. However, the Primary is the only member in the replica set that receives write operations. So it is crucial to always have a Primary in your replica set.

When a Primary is not reachable by the majority of members, the eligible Secondaries will hold an election for a new Primary. Until a new Primary is elected, all writes, and all reads that target the Primary, originating from your client drivers will either wait for a Primary to become available or time out. So it is important to have a Primary elected quickly, so that the number of operations waiting for the Primary stays low.

How to quickly elect a new Primary

If a Primary is unexpectedly terminated or loses network connectivity to the majority of servers, the Secondaries call for an election only after missing its heartbeats for 10 seconds (the default settings.electionTimeoutMillis). So an unplanned failover needs at least that long before the election even begins.

StepDown the Primary

Stepping down the Primary expedites the failover procedure. Therefore it is recommended to stepDown the Primary to forcefully trigger the election, rather than shut down the Primary and let the Secondaries discover that it is unreachable. I bet most of you are using this approach already. So, let’s review some of the other tips you could leverage before you stepDown() the Primary.

Make only one Secondary electable

If the replication lag on one of your Secondaries is low, you can proactively choose it to be the only Secondary that can be elected in the next election. Typically you choose a Secondary that has

  • Low replication lag
  • Low network latency
  • Similar Priority as current Primary
  • Or Member with next highest Priority

Assuming you want to pin the Secondary server mon02 as the next Primary, you can make the other Secondary server, mon03, ineligible to become Primary for 60 seconds by running rs.freeze(60) on it. This makes the election faster because mon02 is the only electable Secondary when you stepDown the server mon01.
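To make the selection criteria concrete, here is a minimal sketch in plain JavaScript. Everything in it is an assumption for illustration: the function name pickNextPrimary, the field names lagSecs, pingMs and priority, the made-up sample numbers, and the ranking order itself (lag first, then latency, then priority). It returns the Secondary to pin and the list of Secondaries on which you would run rs.freeze().

```javascript
// Pick the Secondary to pin as the next Primary and list the ones to freeze.
// Each candidate is a plain object with illustrative fields:
//   name, lagSecs (replication lag), pingMs (latency to the Primary), priority.
function pickNextPrimary(secondaries) {
  // Members with priority 0 can never become Primary, so ignore them.
  const electable = secondaries.filter((m) => m.priority > 0);
  if (electable.length === 0) {
    throw new Error("no electable Secondary (all have priority 0)");
  }
  const ranked = electable.slice().sort(
    (a, b) =>
      a.lagSecs - b.lagSecs || // lowest replication lag first
      a.pingMs - b.pingMs || // then lowest network latency
      b.priority - a.priority // then highest priority
  );
  const target = ranked[0].name;
  // Every other electable Secondary gets rs.freeze(<seconds>) before the stepDown.
  const toFreeze = electable.filter((m) => m.name !== target).map((m) => m.name);
  return { target, toFreeze };
}

// Made-up numbers: both Secondaries are caught up, mon02 is closer to the Primary.
const choice = pickNextPrimary([
  { name: "mon02:27000", lagSecs: 0, pingMs: 1, priority: 1 },
  { name: "mon03:27000", lagSecs: 0, pingMs: 40, priority: 1 },
]);
console.log(choice.target, choice.toFreeze);
```

If your priorities differ between members, you may prefer to rank by priority first, since a higher-priority member can later call an election to take over.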

Reduce the settings.electionTimeoutMillis

The default time limit for detecting when a replica set’s primary is unreachable is 10 seconds. By reducing the settings.electionTimeoutMillis to let’s say 2 seconds, you would be making the detection and hence the election faster.
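As a rough sketch, the change amounts to editing one field of the replica set configuration document and handing it back to rs.reconfig(). The helper name withElectionTimeout, the replica set name rs0 and the trimmed-down sample config are my own assumptions. The helper returns a copy, so the original document is still around when you want to restore the default.

```javascript
// Return a copy of a replica set config with settings.electionTimeoutMillis changed.
// The input is left untouched so the original value can be restored later.
function withElectionTimeout(cfg, millis) {
  if (!Number.isInteger(millis) || millis <= 0) {
    throw new RangeError("electionTimeoutMillis must be a positive integer");
  }
  const settings = Object.assign({}, cfg.settings, { electionTimeoutMillis: millis });
  return Object.assign({}, cfg, { settings });
}

// Trimmed-down stand-in for the document returned by rs.conf(); "rs0" is a made-up name.
const sampleCfg = { _id: "rs0", version: 1, settings: { electionTimeoutMillis: 10000 } };
const fastCfg = withElectionTimeout(sampleCfg, 2000);
console.log(fastCfg.settings.electionTimeoutMillis, sampleCfg.settings.electionTimeoutMillis);

// In the mongo shell, connected to the Primary, the equivalent call would be:
//   rs.reconfig(withElectionTimeout(rs.conf(), 2000));
```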

Summary of steps for faster election

The steps below summarize how to get a faster election during the maintenance period. Please test them before running them in a production environment.

  • Identify the server you want to be the next Primary
  • Execute rs.freeze(60) on all other Secondaries
  • Set settings.electionTimeoutMillis=2000 on replica set configuration
  • Execute rs.stepDown() on current Primary
  • Wait for the new Primary to be elected
  • Reset the settings.electionTimeoutMillis=10000 on the new Primary
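The steps above can be written down as an ordered plan of admin commands. The sketch below only builds that plan as data and sends nothing to a server. The function name buildFailoverPlan, its options and the replica set name rs0 are assumptions of mine; replSetFreeze, replSetReconfig and replSetStepDown are the database commands behind rs.freeze(), rs.reconfig() and rs.stepDown(). It also assumes every member other than the Primary and the target is a Secondary. Note that the raw replSetReconfig command needs the config version incremented by hand, which the rs.reconfig() shell helper otherwise does for you.

```javascript
// Build the ordered list of admin commands for a faster planned failover.
// Nothing is executed here; each step names the host the command should run on.
function buildFailoverPlan(cfg, primary, target, opts = {}) {
  const { freezeSecs = 60, stepDownSecs = 60, fastMillis = 2000, defaultMillis = 10000 } = opts;
  const hosts = cfg.members.map((m) => m.host);
  if (!hosts.includes(primary) || !hosts.includes(target) || primary === target) {
    throw new Error("primary and target must be two different members of the config");
  }
  // Copy of cfg with a new version and election timeout; cfg itself is not modified.
  const reconfig = (millis, version) => ({
    replSetReconfig: Object.assign({}, cfg, {
      version,
      settings: Object.assign({}, cfg.settings, { electionTimeoutMillis: millis }),
    }),
  });

  // 1. Freeze every member that is neither the Primary nor the chosen target.
  const steps = hosts
    .filter((h) => h !== primary && h !== target)
    .map((h) => ({ host: h, command: { replSetFreeze: freezeSecs } }));
  // 2. Lower the election timeout, then 3. step down the current Primary.
  steps.push({ host: primary, command: reconfig(fastMillis, cfg.version + 1) });
  steps.push({ host: primary, command: { replSetStepDown: stepDownSecs } });
  // 4. Send this last step only once the target reports itself as the new Primary.
  steps.push({ host: target, command: reconfig(defaultMillis, cfg.version + 2) });
  return steps;
}

const plan = buildFailoverPlan(
  {
    _id: "rs0", // made-up replica set name
    version: 1,
    members: [
      { _id: 0, host: "mon01:27000" },
      { _id: 1, host: "mon02:27000" },
      { _id: 2, host: "mon03:27000" },
    ],
    settings: { electionTimeoutMillis: 10000 },
  },
  "mon01:27000",
  "mon02:27000"
);
plan.forEach((s) => console.log(s.host, Object.keys(s.command)[0]));
```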

Pros & Cons of the approach

Assuming all the above suggestions worked out well for you, you may be wondering:

“If having a lower electionTimeoutMillis helps with quicker elections, then why can’t I keep it at a lower number all the time?”

Great question! Your application might be facing reduced traffic during the rolling maintenance period. Most importantly, you are closely monitoring all the servers and manually pinning a Secondary to be the next Primary. So it could be okay for you to have a lower electionTimeoutMillis value at that very moment.

However, setting the electionTimeoutMillis to a low value not only results in faster failover but also increases sensitivity to slowness or spottiness of the Primary node or the network.

This may result in too many elections when there are transient network connectivity issues. Conversely, setting the electionTimeoutMillis to a larger value makes your replica set more resilient to transient network interruptions but also results in a slower average failover time.

The bottom line is YMMV. You would need to test various electionTimeoutMillis values and choose the one that suits you best, or leave it at the default value of 10 seconds.

No matter what you do, “Never set the electionTimeoutMillis to a value less than the round-trip network latency time between two of your members.”
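This rule is easy to check before a reconfig. The sketch below is an assumption-laden illustration: the function name isElectionTimeoutSafe and the margin parameter are mine (margin is not a MongoDB setting), and the latencies are made-up values in the style of the pingMs field from rs.status(). Keep in mind that pingMs is measured only from the member you run rs.status() on, so you would need samples from each member to cover every pair.

```javascript
// Check a proposed electionTimeoutMillis against observed round-trip latencies.
// `pingsMs` holds pingMs-style values in milliseconds.
// `margin` is an illustrative safety multiplier, not a MongoDB setting.
function isElectionTimeoutSafe(timeoutMillis, pingsMs, margin = 1) {
  if (pingsMs.length === 0) {
    throw new Error("need at least one latency sample");
  }
  const worst = Math.max(...pingsMs);
  // The timeout must not be less than the worst round trip (times the margin).
  return timeoutMillis >= worst * margin;
}

console.log(isElectionTimeoutSafe(2000, [1, 40])); // true
console.log(isElectionTimeoutSafe(30, [1, 40])); // false
```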

Hands-On lab exercises

This lab exercise helps you understand the steps needed to quickly elect a new Primary during a rolling maintenance.

Setup environment

First, you would need an environment to play around in. I have created 3 RHEL v7.5 instances in AWS, but you may as well run them all on your localhost with /etc/hosts entries for the servers. If you already have a MongoDB v3.6 replica set environment, you may skip this step.

Download and untar the MongoDB v3.6 binaries, then start the MongoDB server bound to all IPs and listening on port 27000.

A bash script to download MongoDB v3.6.5 and start mongod on port 27000

Initiate replica set

Initiate a MongoDB replica set using the above hosts on server mon01

A bash script to initiate a replica set with 3 hosts we created earlier

Display the replica set config and status

Please note the outputs from rs.config() and rs.status() respectively. They help you determine current settings.electionTimeoutMillis: 10000 and select a Secondary to be the next Primary based on the values in priority, optime, lastHeartbeat and pingMs.

A JavaScript method to show the replica set configuration settings
A JavaScript method to show the replica set status information

Choose the potential next Primary

The rs.status() and db.printSlaveReplicationInfo() commands show that both Secondaries, mon02 and mon03, are caught up on the Oplog entries of the Primary, mon01. However, the pingMs shows that mon02 is a lot closer to mon01 than mon03 is. So you may choose mon02 as the next potential Primary when stepping down the current Primary.

A JavaScript function to show the database printSlaveReplicationInfo command output
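If you would rather compute the lag yourself, it can be derived from the optimeDate fields in rs.status().members. The following is a sketch of my own, not the script referenced above: the function name replicationLagSecs is hypothetical, and the timestamps and pingMs values in the sample are made up to mirror the situation described here, with both Secondaries caught up and mon02 closer to the Primary.

```javascript
// Compute each Secondary's replication lag in seconds from rs.status()-shaped data.
// Lag = the Primary's optimeDate minus the Secondary's optimeDate.
function replicationLagSecs(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no PRIMARY in the member list");
  }
  return members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSecs: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
      pingMs: m.pingMs,
    }));
}

// Made-up sample in the shape of rs.status().members.
const sampleStatusMembers = [
  { name: "mon01:27000", stateStr: "PRIMARY", optimeDate: new Date("2018-06-01T10:00:10Z") },
  { name: "mon02:27000", stateStr: "SECONDARY", optimeDate: new Date("2018-06-01T10:00:10Z"), pingMs: 1 },
  { name: "mon03:27000", stateStr: "SECONDARY", optimeDate: new Date("2018-06-01T10:00:10Z"), pingMs: 40 },
];
console.log(replicationLagSecs(sampleStatusMembers));
```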

Freeze the other Secondaries

Based on the above pingMs, we would not want the server mon03 to be elected as Primary. So, run the below command to freeze it from contending in the next election term.

A JavaScript method invoking rs.freeze to make the replica set member ineligible to become primary

Set electionTimeoutMillis and stepDown the Primary

Reconfigure the electionTimeoutMillis in the replica set settings on the current Primary, mon01. Finally, execute the command rs.stepDown() to forcibly trigger the election and elect mon02 as the next Primary.

JavaScript code to set the electionTimeoutMillis to 2 seconds and stepDown the primary

You may notice that the new primary is available within ~2 seconds compared to the default of 10–12 seconds. The mongod.log files on the individual machines show that mon02’s transition to primary completes within ~2 seconds.

A bash script to show the transition of mon02 from Secondary to Primary

Reset the electionTimeoutMillis on the new primary

Once the new Primary is elected, please revert the electionTimeoutMillis to its default value to avoid frequent elections during transient network connectivity issues.

JavaScript code to reset the electionTimeoutMillis back to its default value


I want to remind you of an important point:

Although the MongoDB database is highly available for reads from Secondaries during the elections, the database is not available for writes until a Primary is elected. So it is important to ensure the Primary is available sooner rather than later to meet your SLA for writes.

With the tips discussed here, you can have a new Primary elected within 3 seconds. If your application is serving about 10,000 operations per second, about 30,000 operations will be waiting on the new Primary. Now, you may wonder: “What measures can I take to ensure that the database server would not cripple when all those 30,000 operations hit the new Primary at the same time?”

Again, it’s a great question, but that’s a topic for another day. Hopefully, you learned something new today as you scale the path to “Mastering MongoDB - One tip a day”.

Previous Articles

  • Mastering MongoDB — One tip a day series
    Series of articles solely created for you to master MongoDB
  • Tip # 003: Transactions
    A long awaited and most requested feature for many, has finally arrived
  • Tip # 002: createRole
    How to prevent someone dropping your collections?
  • Tip # 001: currentOp
    Know the operations currently executing on MongoDB server inside out
