Troubleshoot Replica Sets

This section describes common strategies for troubleshooting replica set deployments.

Check Replica Set Status

To display the current state of the replica set and the current state of each member, run the rs.status() method in a mongo shell connected to the replica set’s primary. For descriptions of the information displayed by rs.status(), see replSetGetStatus.
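For example, the following small sketch, run in a mongo shell connected to the primary, displays the full status document and then prints only each member's name and current state:

    // Full replica set status document.
    rs.status()

    // Print only each member's host and current replication state.
    rs.status().members.forEach(function (m) {
        print(m.name + " : " + m.stateStr);
    });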

Note

The rs.status() method is a wrapper that runs the replSetGetStatus database command.

Check the Replication Lag

Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes “lagged” members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.

To check the current length of replication lag:

  • In a mongo shell connected to the primary, call the rs.printSlaveReplicationInfo() method.

This returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary, as shown in the following example:

    source: m1.example.net:27017
        syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
        0 secs (0 hrs) behind the primary
    source: m2.example.net:27017
        syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
        0 secs (0 hrs) behind the primary

A delayed member may show as 0 seconds behind the primary when the inactivity period on the primary is greater than the members[n].slaveDelay value.

Note

The rs.status() method is a wrapper around the replSetGetStatus database command.

  • Monitor the rate of replication by checking for non-zero or increasing oplog time values in the Replication Lag graph available in Cloud Manager and in Ops Manager.

Replication Lag Causes

Possible causes of replication lag include:

  • Network Latency

Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.

Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.

  • Disk Throughput

If the file system and disk device on the secondary are unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon’s EBS system).

Use system-level tools to assess disk status, including iostat or vmstat.

  • Concurrency

In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries. This prevents write operations from returning if replication cannot keep up with the write load.

You can also use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.

  • Appropriate Write Concern

If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.

To prevent this, request write acknowledgement write concern after every 100, 1,000, or another interval to provide an opportunity for secondaries to catch up with the primary, as in the sketch that follows this list.
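For illustration, the following mongo shell sketch batches a hypothetical bulk load and requests acknowledgement from a majority of replica set members for each batch; the collection name (imports), batch size, and wtimeout value are assumptions for the example, not recommendations.

    // Hypothetical bulk load: insert in batches and wait for a majority of
    // members to acknowledge each batch so that secondaries can keep up.
    var batch = [];
    for (var i = 0; i < 100000; i++) {
        batch.push({ _id: i, payload: "example" });
        if (batch.length === 1000) {
            db.imports.insertMany(batch, { writeConcern: { w: "majority", wtimeout: 15000 } });
            batch = [];
        }
    }
    if (batch.length > 0) {
        db.imports.insertMany(batch, { writeConcern: { w: "majority", wtimeout: 15000 } });
    }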

For more information see:

Flow Control

Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value, flowControlTargetLagSeconds.

By default, flow control is enabled.

Note

For flow control to engage, the replica set/sharded cluster must have: featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.

With flow control enabled, as the lag grows close to the flowControlTargetLagSeconds, writes on the primary must obtain tickets before taking locks to apply writes. By limiting the number of tickets issued per second, the flow control mechanism attempts to keep the lag under the target.
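As a sketch of how you might inspect or tune this behavior from a mongo shell with administrative privileges (the 10-second value shown is illustrative only, not a recommendation):

    // Read the current target for majority-committed lag.
    db.adminCommand( { getParameter: 1, flowControlTargetLagSeconds: 1 } )

    // Adjust the target at runtime.
    db.adminCommand( { setParameter: 1, flowControlTargetLagSeconds: 10 } )

    // Flow control metrics are reported in the serverStatus output.
    db.serverStatus().flowControl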

For information on flow control statistics, see:

Slow Application of Oplog Entries

Starting in version 4.2 (also available starting in 4.0.6), secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages are logged for the secondaries in the diagnostic log under the REPL component with the text applied op: <oplog entry> took <num>ms. These slow oplog entries depend only on the slow operation threshold. They do not depend on the log levels (either at the system or component level), or the profiling level, or the slow operation sample rate. The profiler does not capture slow oplog entries.
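Because these messages are controlled only by the slow operation threshold, one way to adjust them (a sketch; the 50 millisecond value is illustrative) is to lower the threshold on the secondary while leaving the profiler itself disabled:

    // Keep profiling off (level 0) but lower the slow operation threshold to
    // 50 ms, so oplog applications slower than 50 ms are logged on this member.
    db.setProfilingLevel(0, 50)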

Test Connections Between all Members

All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both “directions.” Networking topologies and firewall configurations can prevent normal and required connectivity, which can block replication.

Starting in MongoDB 3.6, MongoDB binaries, mongod and mongos, bind to localhost by default. If the net.ipv6 configuration file setting or the --ipv6 command-line option is set for the binary, the binary additionally binds to the localhost IPv6 address.

Previously, starting from MongoDB 2.6, only the binaries from the official MongoDB RPM (Red Hat, CentOS, Fedora Linux, and derivatives) and DEB (Debian, Ubuntu, and derivatives) packages bind to localhost by default.

When bound only to the localhost, these MongoDB 3.6 binaries can only accept connections from clients (including the mongo shell and other members in your deployment for replica sets and sharded clusters) that are running on the same machine. Remote clients cannot connect to the binaries bound only to localhost.

To override and bind to other ip addresses, you can use the net.bindIp configuration file setting or the --bind_ip command-line option to specify a list of hostnames or ip addresses.

Warning

Before binding to a non-localhost (e.g. publicly accessible) IP address, ensure you have secured your cluster from unauthorized access. For a complete list of security recommendations, see Security Checklist. At minimum, consider enabling authentication and hardening network infrastructure.

For example, the following mongod instance binds to both the localhost and the hostname My-Example-Associated-Hostname, which is associated with the ip address 198.51.100.1:

    mongod --bind_ip localhost,My-Example-Associated-Hostname

In order to connect to this instance, remote clients must specify the hostname or its associated ip address 198.51.100.1:

    mongo --host My-Example-Associated-Hostname

    mongo --host 198.51.100.1

Consider the following example of a bidirectional test of networking:

Example

Given a replica set with three members running on three separate hosts:

  • m1.example.net
  • m2.example.net
  • m3.example.net

All three use the default port 27017.

  • Test the connection from m1.example.net to the other hosts with the following operation set from m1.example.net:

    mongo --host m2.example.net --port 27017

    mongo --host m3.example.net --port 27017
  • Test the connection from m2.example.net to the other two hosts with the following operation set from m2.example.net, as in:

    mongo --host m1.example.net --port 27017

    mongo --host m3.example.net --port 27017

You have now tested the connection between m2.example.net and m1.example.net in both directions.

  • Test the connection from m3.example.net to the other two hosts with the following operation set from the m3.example.net host, as in:

    mongo --host m1.example.net --port 27017

    mongo --host m2.example.net --port 27017

If any connection, in any direction, fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.

Socket Exceptions when Rebooting More than One Secondary

When you reboot members of a replica set, ensure that the set is able to elect a primary during the maintenance. This means ensuring that a majority of the set’s members[n].votes are available.
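For example, you can check how many votes will remain available before taking members down with a quick mongo shell sketch like the following:

    // Print each member's vote count and the total number of votes in the set.
    var totalVotes = 0;
    rs.conf().members.forEach(function (m) {
        print(m.host + " votes: " + m.votes);
        totalVotes += m.votes;
    });
    print("total votes: " + totalVotes);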

When a set’s active members can no longer form a majority, the set’s primary steps down and becomes a secondary. Starting in MongoDB 4.2, when the primary steps down, it no longer closes all client connections. In MongoDB 4.0 and earlier, when the primary steps down, it closes all client connections.

Clients cannot write to the replica set until the members elect a new primary.

Example

Given a three-member replica set where every member has one vote, the set can elect a primary if at least two members can connect to each other. If you reboot the two secondaries at once, the primary steps down and becomes a secondary. Until at least another secondary becomes available, i.e. at least one of the rebooted secondaries also becomes available, the set has no primary and cannot elect a new primary.

For more information on votes, see Replica Set Elections. For related information on connection errors, see Does TCP keepalive time affect MongoDB Deployments?.

Check the Size of the Oplog

A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.

To check the size of the oplog for a given replica set member, connect to the member in a mongo shell and run the rs.printReplicationInfo() method.
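For example:

    rs.printReplicationInfo()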

The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10 MB and is able to fit about 26 hours (94400 seconds) of operations:

    configured oplog size: 10.10546875MB
    log length start to end: 94400 (26.22hrs)
    oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
    oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
    now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)

The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. [1] At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week’s worth of operations.

For more information on how oplog size affects operations, see:

Note

You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.

To change oplog size, see the Change the Size of the Oplog tutorial.
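As a brief sketch of the resize step itself (assuming MongoDB 3.6 or later with the WiredTiger storage engine; the 16000 MB target is illustrative, not a recommendation), the resize uses the replSetResizeOplog administrative command against the member whose oplog you want to change:

    // Resize this member's oplog; the size argument is in megabytes.
    db.adminCommand( { replSetResizeOplog: 1, size: 16000 } )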

[1] Starting in MongoDB 4.0, the oplog can grow past its configured size limit to avoid deleting the majority commit point.

Oplog Entry Timestamp Error

Consider the following error in mongod output and logs:

    replSet error fatal couldn't query the local local.oplog.rs collection. Terminating mongod after 30 seconds.
    <timestamp> [rsStart] bad replSet oplog entry?

Often, an incorrectly typed value in the ts field in the last oplog entry causes this error. The correct data type is Timestamp.

Check the type of the ts value using the following two queries against the oplog collection:

    db = db.getSiblingDB("local")
    db.oplog.rs.find().sort({$natural:-1}).limit(1)
    db.oplog.rs.find({ts:{$type:17}}).sort({$natural:-1}).limit(1)

The first query returns the last document in the oplog, while the second returns the last document in the oplog where the ts value is a Timestamp. The $type operator allows you to select BSON type 17, which is the Timestamp data type.

If the queries don’t return the same document, then the last document in the oplog has the wrong data type in the ts field.

Example

If the first query returns this as the last oplog entry:

  1. { "ts" : {t: 1347982456000, i: 1},
  2. "h" : NumberLong("8191276672478122996"),
  3. "op" : "n",
  4. "ns" : "",
  5. "o" : { "msg" : "Reconfig set", "version" : 4 } }

And the second query returns this as the last entry where ts has the Timestamp type:

  1. { "ts" : Timestamp(1347982454000, 1),
  2. "h" : NumberLong("6188469075153256465"),
  3. "op" : "n",
  4. "ns" : "",
  5. "o" : { "msg" : "Reconfig set", "version" : 3 } }

Then the value for the ts field in the last oplog entry is of the wrong data type.

To set the proper type for this value and resolve this issue, use an update operation that resembles the following:

    db.oplog.rs.update( { ts: { t:1347982456000, i:1 } },
                        { $set: { ts: new Timestamp(1347982456000, 1)}})

Modify the timestamp values as needed based on your oplog entry. This operation may take some time to complete because the update must scan and pull the entire oplog into memory.

Duplicate Key Error on local.slaves

Changed in version 3.0.0.

MongoDB 3.0.0 removes the local.slaves collection. For local.slaves errors in earlier versions of MongoDB, refer to the appropriate version of the MongoDB Manual.