Sharded Cluster Balancer

The MongoDB balancer is a background process that monitors the number of chunks on each shard. When the number of chunks on a given shard reaches specific migration thresholds, the balancer attempts to automatically migrate chunks between shards and reach an equal number of chunks per shard.

The balancing procedure for sharded clusters is entirely transparent to the user and application layer, though there may be some performance impact while the procedure takes place.

Diagram of a collection distributed across three shards. For this collection, the difference in the number of chunks between the shards reaches the *migration thresholds* (in this case, 2) and triggers migration.

Starting in MongoDB 3.4, the balancer runs on the primary of the config server replica set (CSRS):

  • In version 3.4, when a balancer process is active, the primary of the config server replica set acquires a “balancer lock” by modifying the _id: "balancer" document in the locks collection in the Config Database. This “balancer lock” is never released.
  • Starting in version 3.6, the balancer no longer takes a “lock”.
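For example, you can check the balancer from a mongos, and on a 3.4 deployment inspect the lock document directly. This is a minimal sketch; the output of the locks collection query is deployment specific.

   // Check whether the balancer is enabled (returns true or false).
   sh.getBalancerState()

   // Check whether a balancing round is currently in progress.
   sh.isBalancerRunning()

   // In version 3.4, inspect the "balancer lock" document.
   use config
   db.locks.find( { _id: "balancer" } )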

Cluster Balancer

The balancer process is responsible for redistributing the chunks of each sharded collection evenly among the shards. By default, the balancer process is always enabled.

To address uneven chunk distribution for a sharded collection, the balancer migrates chunks from shards with more chunks to shards with fewer chunks. The balancer migrates the chunks until there is an even distribution of chunks for the collection across the shards. For details about chunk migration, see Chunk Migration Procedure.

Changed in version 2.6: Chunk migrations can have an impact on disk space. Starting in MongoDB 2.6, the source shard automatically archives the migrated documents by default. For details, see moveChunk directory.

Chunk migrations carry some overhead in terms of bandwidth and workload, both of which can impact database performance. [1] The balancer attempts to minimize the impact by:

  • Restricting a shard to at most one migration at any given time; i.e. a shard cannot participate in multiple chunk migrations at the same time. To migrate multiple chunks from a shard, the balancer migrates the chunks one at a time.

Changed in version 3.4: Starting in MongoDB 3.4, MongoDB can perform parallel chunk migrations. Observing the restriction that a shard can participate in at most one migration at a time, for a sharded cluster with n shards, MongoDB can perform at most n/2 (rounded down) simultaneous chunk migrations.

See also Asynchronous Chunk Migration Cleanup.

  • Starting a balancing round only when the difference in the number of chunks between the shard with the greatest number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection reaches the migration threshold.

You may disable the balancer temporarily for maintenance. See Disable the Balancer for details.
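For instance, a minimal maintenance sequence run from a mongos might look like the following sketch:

   // Disable the balancer before maintenance.
   sh.stopBalancer()

   // Confirm that it is disabled.
   sh.getBalancerState()     // should return false

   // Re-enable the balancer after maintenance is complete.
   sh.startBalancer()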

You can also limit the window during which the balancer runs to prevent it from impacting production traffic. See Schedule the Balancing Window for details; a minimal example follows the note below.

Note

The specification of the balancing window is relative to the local time zone of the primary of the config server replica set.
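As a sketch, the following mongos commands limit balancing to a hypothetical 23:00-06:00 window by updating the balancer document in config.settings. The times shown are illustrative and, as noted above, interpreted in the config server primary's local time zone.

   use config
   db.settings.update(
      { _id: "balancer" },
      { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
      { upsert: true }
   )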

See also

Manage Sharded Cluster Balancer.

[1] Starting in MongoDB 4.0.3, the shard collection operation can perform an initial chunk creation and distribution for empty or non-existing collections if zones and zone ranges have been defined for the collection. Initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward as usual.
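A hedged sketch of that workflow: define a zone and a zone range for an empty collection before sharding it, so the initial chunks can be created and distributed along the zone boundaries. The shard name, database, collection, and shard key below are hypothetical.

   // Associate a shard with a zone (names are hypothetical).
   sh.addShardToZone("shard0000", "recent")

   // Define a zone range for the (not yet sharded) collection.
   sh.updateZoneKeyRange("records.events", { created: MinKey }, { created: 0 }, "recent")

   // Shard the empty collection; with zones defined, MongoDB 4.0.3+ can
   // create and distribute the initial chunks along the zone ranges.
   sh.enableSharding("records")
   sh.shardCollection("records.events", { created: 1 })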

Adding and Removing Shards from the Cluster

Adding a shard to a cluster creates an imbalance, since the new shard has no chunks. While MongoDB begins migrating data to the new shard immediately, it can take some time before the cluster balances. See the Add Shards to a Cluster tutorial for instructions on adding a shard to a cluster.
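For example, adding a shard from a mongos is a single command; the replica set name and host below are hypothetical.

   // Add the new shard.
   sh.addShard("rs4/shard4.example.net:27018")

   // Watch the per-shard chunk counts converge as the balancer runs.
   sh.status()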

Removing a shard from a cluster creates a similar imbalance, since chunks residing on that shard must be redistributed throughout the cluster. While MongoDB begins draining a removed shard immediately, it can take some time before the cluster balances. Do not shut down the servers associated with the removed shard during this process.

When you remove a shard in a cluster with an uneven chunk distribution, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.

See the Remove Shards from an Existing Sharded Cluster tutorial for instructions on safely removing a shard from a cluster.
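As an illustration, draining is started and monitored with the removeShard command; the shard name below is hypothetical, and the command is re-run to check progress.

   // Start draining the shard.
   db.adminCommand( { removeShard: "shard0003" } )

   // Re-run the same command to check progress; the returned document's
   // "state" field moves from "started" to "ongoing" to "completed".
   db.adminCommand( { removeShard: "shard0003" } )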

Chunk Migration Procedure

All chunk migrations use the following procedure:

  • The balancer process sends the moveChunk command to the source shard.

  • The source starts the move with an internal moveChunk command. During the migration process, operations to the chunk route to the source shard. The source shard is responsible for incoming write operations for the chunk.

  • The destination shard builds any indexes required by the source that do not exist on the destination.

  • The destination shard begins requesting documents in the chunk and starts receiving copies of the data. See also Chunk Migration and Replication.

  • After receiving the final document in the chunk, the destination shard starts a synchronization process to ensure that it has the changes to the migrated documents that occurred during the migration.

  • When fully synchronized, the source shard connects to the config database and updates the cluster metadata with the new location for the chunk.

  • After the source shard completes the update of the metadata, and once there are no open cursors on the chunk, the source shard deletes its copy of the documents.

Note

If the balancer needs to perform additional chunk migrations from the source shard, the balancer can start the next chunk migration without waiting for the current migration process to finish this deletion step. See Asynchronous Chunk Migration Cleanup.

See also

moveChunk directory.

The migration process ensures consistency and maximizes the availability of chunks during balancing.
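In most deployments the balancer issues these migrations itself, but the same procedure can be triggered manually from a mongos with sh.moveChunk(); the namespace, shard key value, and destination shard below are hypothetical.

   // Move the chunk containing { created: 1 } for records.events to shard0001.
   sh.moveChunk("records.events", { created: 1 }, "shard0001")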

See also

shardingStatistics.countDonorMoveChunkLockTimeout

Migration Thresholds

To minimize the impact of balancing on the cluster, the balancer only begins balancing after the distribution of chunks for a sharded collection has reached certain thresholds. The thresholds apply to the difference in number of chunks between the shard with the most chunks for the collection and the shard with the fewest chunks for that collection. The balancer has the following thresholds:

   Number of Chunks     Migration Threshold
   Fewer than 20        2
   20-79                4
   80 and greater       8

The balancer stops running on the target collection when the difference between the number of chunks on any two shards for that collection is less than two, or a chunk migration fails.
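The following is an illustrative sketch of the threshold logic described above, written as plain shell-style JavaScript. It is not the server's actual implementation.

   // Pick the migration threshold from the table above, based on the
   // number of chunks in the collection (illustrative only).
   function migrationThreshold(numChunks) {
      if (numChunks < 20) { return 2; }
      if (numChunks < 80) { return 4; }
      return 8;
   }

   // A balancing round is warranted when the gap between the most- and
   // least-loaded shards for the collection reaches the threshold.
   function needsBalancing(maxChunksOnShard, minChunksOnShard, totalChunks) {
      return (maxChunksOnShard - minChunksOnShard) >= migrationThreshold(totalChunks);
   }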

Asynchronous Chunk Migration Cleanup

To migrate multiple chunks from a shard, the balancer migrates the chunks one at a time. However, the balancer does not wait for the current migration’s delete phase to complete before starting the next chunk migration. See Chunk Migration for the chunk migration process and the delete phase.

This queuing behavior allows shards to unload chunks more quickly in cases of a heavily imbalanced cluster, such as when performing initial data loads without pre-splitting and when adding new shards.

This behavior also affects the moveChunk command, and migration scripts that use the moveChunk command may proceed more quickly.

In some cases, the delete phases may persist longer. If multiple delete phases are queued but not yet complete, a crash of the replica set’s primary can orphan data from multiple migrations.

The _waitForDelete, available as a setting for the balancer as well as the moveChunk command, can alter the behavior so that the delete phase of the current migration blocks the start of the next chunk migration. The _waitForDelete is generally for internal testing purposes. For more information, see Wait for Delete.
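For reference, a hedged sketch of both places _waitForDelete can be applied; the namespace, shard key value, and destination shard are hypothetical.

   // Per-migration: make this moveChunk block until the delete phase finishes.
   db.adminCommand( {
      moveChunk: "records.events",
      find: { created: 1 },
      to: "shard0001",
      _waitForDelete: true
   } )

   // Balancer-wide: set _waitForDelete on the balancer settings document.
   use config
   db.settings.update(
      { _id: "balancer" },
      { $set: { _waitForDelete: true } },
      { upsert: true }
   )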

Chunk Migration and Replication

Changed in version 3.4.

During chunk migration, the _secondaryThrottle value determines when the migration proceeds with the next document in the chunk.

In the config.settings collection:

  • If the _secondaryThrottle setting for the balancer is set to a write concern, each document move during chunk migration must receive the requested acknowledgement before proceeding with the next document.
  • If the _secondaryThrottle setting for the balancer is set to true, each document move during chunk migration must receive acknowledgement from at least one secondary before the migration proceeds with the next document in the chunk. This is equivalent to a write concern of { w: 2 }.
  • If the _secondaryThrottle setting is unset, the migration process does not wait for replication to a secondary and instead continues with the next document.

To update the _secondaryThrottle parameter for the balancer, see Secondary Throttle for an example.
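As a sketch, setting the balancer's _secondaryThrottle to a write concern of { w: "majority" } from a mongos could look like this; the chosen write concern is illustrative.

   use config
   db.settings.update(
      { _id: "balancer" },
      { $set: { _secondaryThrottle: { w: "majority" } } },
      { upsert: true }
   )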

Independent of any _secondaryThrottle setting, certain phases of the chunk migration have the following replication policy:

  • MongoDB briefly pauses all application reads and writes to the collection being migrated, on the source shard, before updating the config servers with the new location for the chunk, and resumes the application reads and writes after the update. The chunk move requires all writes to be acknowledged by a majority of the members of the replica set both before and after committing the chunk move to the config servers.
  • When an outgoing chunk migration finishes and cleanup occurs, all writes must be replicated to a majority of servers before further cleanup (from other outgoing migrations) or new incoming migrations can proceed.

To update the _secondaryThrottle setting in the config.settings collection, see Secondary Throttle for an example.

Maximum Number of Documents Per Chunk to Migrate

Changed in version 3.4.11.

MongoDB cannot move a chunk if the number of documents in the chunk is greater than 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection.
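As a worked sketch, the limit can be estimated from a collection's statistics; the collection name is hypothetical and the 64 megabyte chunk size is an assumption (the default).

   // Estimate the largest chunk (by document count) that can still be moved.
   var stats = db.events.stats()                 // avgObjSize is in bytes
   var chunkSizeBytes = 64 * 1024 * 1024         // assumed configured chunk size (default 64 MB)
   var maxDocs = 1.3 * ( chunkSizeBytes / stats.avgObjSize )
   print("Chunks with more than ~" + Math.floor(maxDocs) + " documents cannot be moved")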

Shard Size

By default, MongoDB attempts to fill all available disk space with data on every shard as the data set grows. To ensure that the cluster always has the capacity to handle data growth, monitor disk usage as well as other performance metrics.

See the Change the Maximum Storage Size for a Given Shard tutorial for instructions on setting the maximum size for a shard.
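For example, a maximum storage size (in megabytes) can be supplied when a shard is added, or set later on the shard's document in config.shards; the shard names and values here are hypothetical.

   // Set a maximum size when adding the shard (value in MB).
   db.adminCommand( { addShard: "rs5/shard5.example.net:27018", maxSize: 125000 } )

   // Or change it later for an existing shard.
   use config
   db.shards.update( { _id: "shard0000" }, { $set: { maxSize: 250000 } } )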