Replica Set Elections

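Replica sets use elections to determine which set member will become primary. Replica sets can trigger an election in response to a variety of events, for example when the secondary members lose connectivity to the primary for longer than the configured timeout.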

In the following diagram, the primary node is unavailable for longer than the configured timeout, which triggers the automatic failover process. One of the remaining secondaries calls for an election to select a new primary and automatically resume normal operations.

Diagram of an election of a new primary. In a three-member replica set with two secondaries, the primary becomes unreachable. The loss of the primary triggers an election in which one of the secondaries becomes the new primary.

The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries.

The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings. This includes the time required to mark the primary as unavailable and to call and complete an election. You can tune this time period by modifying the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary. These factors are dependent on your particular cluster architecture.
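As a sketch of how the timeout change might be prepared, the hypothetical helper below takes a replica set configuration document (assumed to be the document found under the "config" key of a replSetGetConfig response), sets settings.electionTimeoutMillis, and increments version, because a reconfiguration is expected to carry a higher version than the current configuration. Submitting the result (for example with replSetReconfig, or rs.reconfig() in the mongo shell) is left out, and the host names and values are illustrative assumptions.

```python
import copy


def with_election_timeout(config: dict, timeout_ms: int) -> dict:
    """Return a copy of a replica set config with a new election timeout.

    Hypothetical helper: `config` is assumed to be the document found under
    the "config" key of a replSetGetConfig response.
    """
    if timeout_ms <= 0:
        raise ValueError("electionTimeoutMillis must be positive")
    new_config = copy.deepcopy(config)  # never mutate the caller's document
    new_config.setdefault("settings", {})["electionTimeoutMillis"] = timeout_ms
    # A reconfiguration must carry a higher version than the current config.
    new_config["version"] = new_config.get("version", 0) + 1
    return new_config


# Illustrative configuration; host names are placeholders.
current = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "db0.example.net:27017"},
        {"_id": 1, "host": "db1.example.net:27017"},
        {"_id": 2, "host": "db2.example.net:27017"},
    ],
    "settings": {"electionTimeoutMillis": 10000},
}

updated = with_election_timeout(current, 5000)
print(updated["settings"]["electionTimeoutMillis"], updated["version"])  # 5000 4
```

A shorter timeout detects an unavailable primary sooner, at the cost of making the set more sensitive to transient slowness of the primary or the network, which can trigger unnecessary elections.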

Your application connection logic should include tolerance for automatic failovers and the subsequent elections. Starting in MongoDB 3.6, MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections:

  • MongoDB 4.2-compatible drivers enable retryable writes by default.
  • MongoDB 4.0 and 3.6-compatible drivers must explicitly enable retryable writes by including retryWrites=true in the connection string.
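For the older drivers, a connection string such as mongodb://db0.example.net,db1.example.net,db2.example.net/?replicaSet=rs0&retryWrites=true (host names and set name are placeholders) turns the behavior on. To illustrate the "retry a single time" semantics only, the sketch below wraps an operation so that one retryable failure, such as a primary stepping down during an election, leads to exactly one more attempt. RetryableError, retry_once, and insert_order are hypothetical names; real drivers decide which operations and error types are retryable and reselect a primary before retrying, which this sketch does not model.

```python
class RetryableError(Exception):
    """Hypothetical stand-in for a transient error such as a primary step-down."""


def retry_once(operation):
    """Run `operation`; on a retryable error, retry exactly one more time.

    This only mimics the described driver behavior; it is not driver code.
    A second failure propagates to the caller.
    """
    try:
        return operation()
    except RetryableError:
        # A driver would select the newly elected primary before this retry.
        return operation()


# Usage: an operation that fails once (as during an election) and then succeeds.
attempts = []


def insert_order():
    attempts.append(1)
    if len(attempts) == 1:
        raise RetryableError("primary stepped down")
    return "acknowledged"


result = retry_once(insert_order)
print(result, len(attempts))  # acknowledged 2
```

Retrying only once keeps the added latency bounded; an application that needs more tolerance still has to handle the error that surfaces after the single retry.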

Factors and Conditions that Affect Elections

Replication Election Protocol

Changed in version 4.0: MongoDB 4.0 removes the deprecated replication protocol version 0.

Replication protocolVersion: 1 reduces replica set failover time and accelerates the detection of multiple simultaneous primaries.

With protocolVersion 1, you can use catchUpTimeoutMillis to trade off faster failovers against the preservation of w:1 writes.

For more information on pv1, see Replica Set Protocol Version.

Heartbeats

Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
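The 10-second rule can be modeled as a simple check over the time each member last answered a heartbeat, as seen by one observing member. This is a simplified sketch: the function name and the observer's view are hypothetical, and it treats "does not return within 10 seconds" as strictly more than 10 seconds since the last response.

```python
# Members ping each other every two seconds; a member that has not answered
# within this many seconds is marked inaccessible.
HEARTBEAT_TIMEOUT_S = 10


def inaccessible_members(last_response: dict, now: float) -> list:
    """Return members whose last heartbeat response is older than the timeout.

    `last_response` maps a member name to the time (in seconds) of its most
    recent successful heartbeat response, as seen by one observing member.
    """
    return sorted(
        name
        for name, seen_at in last_response.items()
        if now - seen_at > HEARTBEAT_TIMEOUT_S
    )


# An observer's view at t = 30 s: db0 last answered at t = 18 s.
view = {"db0": 18.0, "db1": 28.5, "db2": 29.0}
print(inaccessible_members(view, now=30.0))  # ['db0']
```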

Member Priority

After a replica set has a stable primary, the election algorithm will make a “best-effort” attempt to have the secondary with the highest priority available call an election. Member priority affects both the timing and the outcome of elections; secondaries with higher priority call elections relatively sooner than secondaries with lower priority, and are also more likely to win. However, a lower priority instance can be elected as primary for brief periods, even if a higher priority secondary is available. Replica set members continue to call elections until the highest priority member available becomes primary.

Members with a priority value of 0 cannot become primary and do not seek election. For details, see Priority 0 Replica Set Members.
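The end state described above (elections continue until the highest-priority available member is primary, and priority 0 members never are) can be sketched as a function that returns the members the set should settle on. It deliberately ignores timing and assumes a majority of voters is reachable; acceptable_primaries is a hypothetical helper, and when several members share the top priority it returns all of them rather than guessing which one wins.

```python
def acceptable_primaries(members: list, available: set) -> set:
    """Return the hosts the set should eventually settle on as primary.

    Priority 0 members are never candidates. Among available candidates,
    only those with the highest priority are returned. Assumes a majority
    of voting members is reachable so that an election can succeed.
    """
    candidates = [
        m for m in members if m["host"] in available and m["priority"] > 0
    ]
    if not candidates:
        return set()
    top = max(m["priority"] for m in candidates)
    return {m["host"] for m in candidates if m["priority"] == top}


members = [
    {"host": "db0", "priority": 2},
    {"host": "db1", "priority": 1},
    {"host": "db2", "priority": 0},
]
print(acceptable_primaries(members, {"db0", "db1", "db2"}))  # {'db0'}
print(acceptable_primaries(members, {"db1", "db2"}))  # {'db1'}
```

db1 may briefly be primary even while db0 is available, but the set keeps holding elections until db0 takes over, which is the state this function describes.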

Loss of a Data Center

With a distributed replica set, the loss of a data center may affect the ability of the remaining members in the other data center or data centers to elect a primary.

If possible, distribute the replica set members across data centers to maximize the likelihood that even with a loss of a data center, one of the remaining replica set members can become the new primary.
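One way to check a proposed layout is to ask, for each data center, whether the voting members left after losing it still form a strict majority of all voting members, which an election requires. The function name and the data center labels below are hypothetical.

```python
def dc_loss_report(voters_per_dc: dict) -> dict:
    """For each data center, report whether the remaining voters form a majority.

    `voters_per_dc` maps a data center label to its number of voting members.
    """
    total = sum(voters_per_dc.values())
    majority = total // 2 + 1  # strict majority of all voting members
    return {dc: total - count >= majority for dc, count in voters_per_dc.items()}


# Two data centers: losing the larger one leaves only 2 of 5 voters.
print(dc_loss_report({"dc1": 3, "dc2": 2}))
# {'dc1': False, 'dc2': True}

# Three data centers: any single loss leaves at least 3 of 5 voters.
print(dc_loss_report({"dc1": 2, "dc2": 2, "dc3": 1}))
# {'dc1': True, 'dc2': True, 'dc3': True}
```

The comparison shows why a third location, even one holding a single voting member, lets the set elect a primary after losing any one data center.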

See also

Replica Sets Distributed Across Two or More Data Centers

Network Partition

A network partition may segregate a primary into a partition with a minority of nodes. When the primary detects that it can only see a minority of nodes in the replica set, the primary steps down as primary and becomes a secondary. Independently, a member in the partition that can communicate with a majority of the nodes (including itself) holds an election to become the new primary.
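The partition behavior can be sketched by classifying each side of a split by whether it holds a strict majority of the voting members. The sketch assumes every listed member is a voting member and that the sides are disjoint and together cover the whole set; partition_outcome and the host names are hypothetical.

```python
def partition_outcome(sides: list, primary: str) -> dict:
    """Describe what happens on each side of a network partition.

    `sides` is a list of disjoint sets of voting members covering the
    whole replica set; `primary` is the member that was primary before
    the partition.
    """
    total = sum(len(side) for side in sides)
    outcome = {}
    for side in sides:
        label = ",".join(sorted(side))
        has_majority = 2 * len(side) > total  # strict majority of all voters
        if primary in side:
            outcome[label] = "keeps primary" if has_majority else "primary steps down"
        else:
            outcome[label] = "elects new primary" if has_majority else "no primary"
    return outcome


# Three voting members; the primary db0 is cut off from db1 and db2.
print(partition_outcome([{"db0"}, {"db1", "db2"}], primary="db0"))
# {'db0': 'primary steps down', 'db1,db2': 'elects new primary'}
```

With an even split (for example two members on each side of a four-member set), neither side has a majority, so the old primary steps down and no side elects a new one.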

Voting Members

The replica set member configuration setting members[n].votes and member state determine whether a member votes in an election.

  • All replica set members that have their members[n].votes setting equal to 1 vote in elections. To exclude a member from voting in an election, change the value of the member’s members[n].votes configuration to 0.
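Since both the votes setting and the member state matter, the two checks can be combined as below. The set of vote-eligible states is an assumption taken from the MongoDB manual rather than from this section, so confirm it for your server version; casts_vote is a hypothetical helper, and a missing votes field is treated as 1, its default.

```python
# ASSUMPTION: states in which a member with votes: 1 is eligible to vote,
# as listed in the MongoDB manual; verify for your server version.
VOTING_STATES = {"PRIMARY", "SECONDARY", "STARTUP2", "RECOVERING", "ARBITER", "ROLLBACK"}


def casts_vote(member_config: dict, member_state: str) -> bool:
    """Return True if the member has votes == 1 and its state allows voting."""
    # members[n].votes defaults to 1 when the field is absent.
    return member_config.get("votes", 1) == 1 and member_state in VOTING_STATES


print(casts_vote({"_id": 1, "votes": 1}, "SECONDARY"))  # True
print(casts_vote({"_id": 3, "votes": 0, "priority": 0}, "SECONDARY"))  # False
print(casts_vote({"_id": 2, "votes": 1}, "DOWN"))  # False
```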

Changed in version 3.2:

Non-Voting Members

Although non-voting members do not vote in elections, these members hold copies of the replica set’s data and can accept read operations from client applications.

Because a replica set can have up to 50 members but only 7 voting members, non-voting members allow a replica set to have more than seven members.

Non-voting members must have priority of 0.

For instance, the following nine-member replica set has seven voting members and two non-voting members.

Diagram of a nine-member replica set with the maximum of seven voting members.
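The limits stated in this section (at most 50 members, at most 7 voting members, and priority 0 for every non-voting member) can be checked against a member list before reconfiguring. The helper below is a hypothetical sketch; it treats a missing votes or priority field as 1, the usual default for data-bearing members, and uses the nine-member layout above as its example.

```python
MAX_MEMBERS = 50
MAX_VOTING_MEMBERS = 7


def member_limit_problems(members: list) -> list:
    """Return human-readable problems with a member list (empty if none)."""
    problems = []
    if len(members) > MAX_MEMBERS:
        problems.append(f"{len(members)} members exceed the limit of {MAX_MEMBERS}")
    voting = [m for m in members if m.get("votes", 1) != 0]
    if len(voting) > MAX_VOTING_MEMBERS:
        problems.append(
            f"{len(voting)} voting members exceed the limit of {MAX_VOTING_MEMBERS}"
        )
    for m in members:
        priority = m.get("priority", 1)
        # Non-voting members must have priority 0.
        if m.get("votes", 1) == 0 and priority != 0:
            problems.append(f"non-voting member {m['_id']} has priority {priority}")
    return problems


# The nine-member example: seven voting members plus two non-voting members.
nine = [{"_id": i, "votes": 1, "priority": 1} for i in range(7)]
nine += [{"_id": i, "votes": 0, "priority": 0} for i in (7, 8)]
print(member_limit_problems(nine))  # []
```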

A non-voting member has both votes and priority equal to 0:

    {
      "_id" : <num>,
      "host" : <hostname:port>,
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0,        // non-voting members must have priority 0
      "tags" : { },
      "slaveDelay" : NumberLong(0),
      "votes" : 0            // 0 excludes the member from voting in elections
    }
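Turning an existing member into a non-voting member therefore means changing both fields in the full configuration and raising its version before submitting it. The sketch below shows that transformation on a plain dictionary; with_non_voting_member, the host names, and the version numbers are hypothetical, and submitting the result (for example with rs.reconfig() in the mongo shell) is not shown.

```python
import copy


def with_non_voting_member(config: dict, member_id: int) -> dict:
    """Return a copy of `config` in which member `member_id` is non-voting."""
    new_config = copy.deepcopy(config)  # leave the caller's document untouched
    for member in new_config["members"]:
        if member["_id"] == member_id:
            # Non-voting members must have priority 0, so both fields change.
            member["votes"] = 0
            member["priority"] = 0
            break
    else:
        raise KeyError(f"no member with _id {member_id}")
    # A reconfiguration must carry a higher version than the current config.
    new_config["version"] += 1
    return new_config


cfg = {
    "_id": "rs0",
    "version": 7,
    "members": [
        {"_id": 0, "host": "db0.example.net:27017", "votes": 1, "priority": 1},
        {"_id": 1, "host": "db1.example.net:27017", "votes": 1, "priority": 1},
    ],
}
new_cfg = with_non_voting_member(cfg, 1)
print(new_cfg["members"][1]["votes"], new_cfg["members"][1]["priority"], new_cfg["version"])  # 0 0 8
```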

Important

Do not alter the number of votes to control which members will become primary. Instead, modify the members[n].priority option. Only alter the number of votes in exceptional cases, for example, to permit more than seven members.

To configure a non-voting member, see Configure Non-Voting Replica Set Member.