Replica Set Data Synchronization

To maintain up-to-date copies of the shared data set, secondary members of a replica set sync or replicate data from other members. MongoDB uses two forms of data synchronization: initial sync to populate new members with the full data set, and replication to apply ongoing changes to the entire data set.

Initial Sync

Initial sync copies all the data from one member of the replica set to another member.

Process

When you perform an initial sync, MongoDB:

  • Clones all databases except the local database. To clone, the mongod scans every collection in each source database and inserts all data into its own copies of these collections.

Changed in version 3.4: Initial sync builds all collection indexes as the documents are copied for each collection. In earlier versions of MongoDB, only the _id indexes are built during this stage.

Changed in version 3.4: Initial sync pulls newly added oplog records during the data copy. Ensure that the target member has enough disk space in the local database to temporarily store these oplog records for the duration of this data copy stage.

  • Applies all changes to the data set. Using the oplog from the source, the mongod updates its data set to reflect the current state of the replica set.

When the initial sync finishes, the member transitions from STARTUP2 to SECONDARY.
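The two stages above can be pictured with a small in-memory model. This is only an illustrative sketch, not MongoDB's implementation: the function name initial_sync, the nested-dictionary layout, and the simplified (op, db, collection, document) entry shape are all assumptions made for the example.

```python
from copy import deepcopy


def initial_sync(source_dbs, oplog_during_copy):
    """Toy model of the two initial sync stages.

    source_dbs: {db_name: {collection_name: {_id: document}}}
    oplog_during_copy: ordered list of (op, db, coll, doc) tuples that
        arrived while the copy was running (hypothetical, simplified shape).
    """
    state = "STARTUP2"

    # Stage 1: clone every database except "local".
    target = {
        db: deepcopy(colls)
        for db, colls in source_dbs.items()
        if db != "local"
    }

    # Stage 2: apply the buffered oplog entries in their original order.
    for op, db, coll, doc in oplog_during_copy:
        collection = target.setdefault(db, {}).setdefault(coll, {})
        if op in ("i", "u"):      # insert, or update modeled as a full replacement
            collection[doc["_id"]] = doc
        elif op == "d":           # delete
            collection.pop(doc["_id"], None)

    state = "SECONDARY"
    return target, state
```

In this model, a write that lands on the source during the copy is not lost: it is replayed in stage 2, which is why the target needs room to hold those oplog records until the copy finishes.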

To perform an initial sync, see Resync a Member of a Replica Set.

Fault Tolerance

To recover from transient network or operation failures, initial sync has built-in retry logic.

Changed in version 3.4: MongoDB 3.4 improves the initial sync retry logic to be more resilient to intermittent failures on the network.
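The general idea of retrying after a transient failure can be sketched as follows. This is a generic retry-with-backoff pattern, not the retry logic MongoDB actually uses; the function name run_with_retries, the attempt limit, and the delay values are assumptions chosen for illustration.

```python
import time


def run_with_retries(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Only ConnectionError is treated as transient here (an assumption);
    any other exception propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            # Wait longer after each failed attempt (exponential backoff).
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing the sleep function as a parameter keeps the sketch easy to exercise without real delays.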

Replication

Secondary members replicate data continuously after the initial sync. Secondary members copy the oplog from their sync from source and apply these operations in an asynchronous process. [1]

Secondaries may automatically change their sync from source as needed based on changes in the ping time and state of other members’ replication.

Changed in version 3.2: MongoDB 3.2 replica set members with 1 vote cannot sync from members with 0 votes.

Secondaries avoid syncing from delayed members and hidden members.

If a secondary member has members[n].buildIndexes set to true, it can only sync from other members where buildIndexes is true. Members where buildIndexes is false can sync from any other member, barring other sync restrictions. buildIndexes is true by default.
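The restrictions listed in this section can be combined into a single eligibility check. The sketch below is a simplified model and not MongoDB's selection algorithm: the Member class and its field names are hypothetical, ping time is ignored, and "avoid" is treated as a hard exclusion for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical, simplified view of a replica set member's settings.
    name: str
    votes: int = 1
    hidden: bool = False
    delayed: bool = False        # stands in for a configured replication delay
    build_indexes: bool = True   # mirrors members[n].buildIndexes (default true)


def eligible_sync_sources(syncing, candidates):
    """Return the names of candidates that `syncing` may sync from."""
    result = []
    for c in candidates:
        if c.name == syncing.name:
            continue  # a member does not sync from itself
        if syncing.votes == 1 and c.votes == 0:
            continue  # 1-vote members cannot sync from 0-vote members
        if c.hidden or c.delayed:
            continue  # hidden and delayed members are avoided
        if syncing.build_indexes and not c.build_indexes:
            continue  # buildIndexes: true requires a buildIndexes: true source
        result.append(c.name)
    return result
```

A member with buildIndexes set to false passes the last check against any candidate, which matches the statement that such members can sync from any other member, barring the other restrictions.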

[1] Starting in version 4.2 (also available starting in 4.0.6), secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages are logged for the secondaries in the diagnostic log under the REPL component with the text applied op: <oplog entry> took <num>ms. These slow oplog entries depend only on the slow operation threshold. They do not depend on the log levels (either at the system or component level), or the profiling level, or the slow operation sample rate. The profiler does not capture slow oplog entries.
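As a rough illustration of how the message text in the footnote could be picked out of a log, the sketch below extracts the reported durations. It relies only on the applied op: <oplog entry> took <num>ms fragment quoted above; the rest of the log line layout is not assumed, and the function name slow_oplog_durations is hypothetical.

```python
import re

# Matches the message fragment quoted in the footnote; everything else on
# the log line is ignored.
SLOW_OP = re.compile(r"applied op: (?P<entry>.*) took (?P<ms>\d+)ms")


def slow_oplog_durations(lines):
    """Return the durations, in milliseconds, of slow oplog messages."""
    durations = []
    for line in lines:
        match = SLOW_OP.search(line)
        if match:
            durations.append(int(match.group("ms")))
    return durations
```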

Multithreaded Replication

MongoDB applies write operations in batches using multiple threads to improve concurrency. MongoDB groups batches by document id (WiredTiger) and simultaneously applies each group of operations using a different thread. MongoDB always applies write operations to a given document in their original write order.
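The grouping idea can be sketched in a few lines. This is a toy model under stated assumptions, not MongoDB's applier: the batch is a list of hypothetical (doc_id, value) writes, the store is a plain dictionary, and each document's group is handled as one task on a thread pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def apply_batch(store, batch, workers=4):
    """Apply an ordered batch of (doc_id, value) writes to `store`.

    Writes are grouped by document id. Groups run concurrently, but the
    writes inside one group keep their original order.
    """
    groups = defaultdict(list)
    for doc_id, value in batch:
        groups[doc_id].append(value)  # preserves per-document order

    def apply_group(item):
        doc_id, values = item
        for value in values:
            store[doc_id] = value     # the last write for a document wins

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so any worker exception is raised here.
        list(pool.map(apply_group, groups.items()))
    return store
```

Because no two groups touch the same document, the threads never race on a single document, and the final value of each document is the one the original write order would produce.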

While applying a batch, MongoDB blocks all read operations. As a result, secondary read queries can never return data that reflect a state that never existed on the primary.

Flow Control

Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value flowControlTargetLagSeconds.

By default, flow control is enabled.

Note

For flow control to engage, the replica set/sharded cluster must have: featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.

For more information, see Flow Control.
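The conditions in the note can be written as a small predicate. This sketch only models when flow control is in a position to act; the function names and parameters are hypothetical, and the simple lag comparison is an assumption that stands in for the real throttling mechanism.

```python
def flow_control_engages(enabled=True, fcv="4.2", majority_read_concern=True):
    """True when the prerequisites from the note are all met."""
    return enabled and fcv == "4.2" and majority_read_concern


def should_limit_writes(lag_seconds, target_lag_seconds, **settings):
    """Simplified check: limit writes once the majority committed lag
    exceeds the target, provided flow control can engage at all.

    target_lag_seconds plays the role of flowControlTargetLagSeconds.
    """
    return flow_control_engages(**settings) and lag_seconds > target_lag_seconds
```

For example, should_limit_writes(15, 10, fcv="4.0") is False in this model even though the lag exceeds the target, because enabled flow control has no effect when FCV is not 4.2.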