Data Partitioning with Chunks

MongoDB uses the shard key associated to the collection to partitionthe data into chunks. A chunk consists of a subset ofsharded data. Each chunk has a inclusive lower and exclusive upper range basedon the shard key.

Diagram of the shard key value space segmented into smaller ranges or chunks.

The mongos routes writes to the appropriate chunk based on theshard key value. MongoDB splits chunks when they grow beyond theconfigured chunk size. Both inserts and updatescan trigger a chunk split.

The smallest range a chunk can represent is a single unique shard keyvalue. A chunk that only contains documents with a single shard key valuecannot be split.

Initial Chunks

Populated Collection

  • The sharding operation creates the initial chunk(s) to cover theentire range of the shard key values. The number of chunks createddepends on the configured chunk size.
  • After the initial chunk creation, the balancer migrates these initialchunks across the shards as appropriate as well as manages the chunkdistribution going forward.

Empty Collection

  • If you define zones and zone ranges definedfor an empty or non-existing collection (Available starting inMongoDB 4.0.3):
    • The sharding operation creates empty chunks for the defined zoneranges as well as any additional chunks to cover the entire rangeof the shard key values and performs an initial chunk distributionbased on the zone ranges. This initial creation and distribution ofchunks allows for faster setup of zoned sharding.
    • After the initial distribution, the balancer manages the chunkdistribution going forward.
  • If you do not have zones and zone ranges definedfor an empty or non-existing collection:
    • For hashed sharding:
      • The sharding operation creates empty chunks to cover theentire range of the shard key values and performs an initialchunk distribution. By default, theoperation creates 2 chunks per shard and migrates across thecluster. You can use numInitialChunks option to specify adifferent number of initial chunks. This initial creation anddistribution of chunks allows for faster setup ofsharding.
      • After the initial distribution, the balancer manages the chunkdistribution going forward.
    • For ranged sharding:
      • The sharding operation creates a single empty chunk to cover theentire range of the shard key values.
      • After the initial chunk creation, the balancer migrates theinitial chunk across the shards as appropriate as well as managesthe chunk distribution going forward.

Chunk Size

The default chunk size in MongoDB is 64 megabytes. You canincrease or reduce the chunk size. Consider theimplications of changing the default chunk size:

  • Small chunks lead to a more even distribution of data at theexpense of more frequent migrations. This creates expense at thequery routing (mongos) layer.
  • Large chunks lead to fewer migrations. This is more efficient bothfrom the networking perspective and in terms of internal overhead atthe query routing layer. But, these efficiencies come atthe expense of a potentially uneven distribution of data.
  • Chunk size affects theMaximum Number of Documents Per Chunk to Migrate.
  • Chunk size affects the maximum collection size when sharding anexisting collection.Post-sharding, chunk size does not constrain collection size.For many deployments, it makes sense to avoid frequent and potentiallyspurious migrations at the expense of a slightly less evenlydistributed data set.

Limitations

Changing the chunk size affects when chunks split but there are somelimitations to its effects.

  • Automatic splitting only occurs during inserts or updates. If youlower the chunk size, it may take time for all chunks to split to thenew size.
  • Splits cannot be “undone”. If you increase the chunk size, existingchunks must grow through inserts or updates until they reach the newsize.

Chunk Splits

Splitting is a process that keeps chunks from growing too large. When a chunkgrows beyond a specified chunk size, or if thenumber of documents in the chunk exceeds Maximum Number of DocumentsPer Chunk to Migrate, MongoDB splits the chunk based on the shard key valuesthe chunk represent. A chunk may be split into multiple chunks where necessary.Inserts and updates may trigger splits. Splits are an efficient meta-datachange. To create splits, MongoDB does not migrate any data or affect theshards.

Diagram of a shard with a chunk that exceeds the default chunk size of 64 MB and triggers a split of the chunk into two chunks.

Splits may lead to an uneven distribution of the chunks for acollection across the shards. In such cases, the balancer redistributeschunks across shards. See Cluster Balancer for moredetails on balancing chunks across shards.

Chunk Migration

MongoDB migrates chunks in a sharded cluster to distribute thechunks of a sharded collection evenly among shards. Migrations may beeither:

  • Manual. Only use manual migration in limited cases, such asto distribute data during bulk inserts. See Migrating ChunksManually for more details.
  • Automatic. The balancer processautomatically migrates chunks when there is an uneven distribution ofa sharded collection’s chunks across the shards. See MigrationThresholds for more details.

For more information on the sharded cluster balancer, seeSharded Cluster Balancer.

See also

shardingStatistics.countDonorMoveChunkLockTimeout

Balancing

The balancer is a backgroundprocess that manages chunk migrations. If the difference innumber of chunks between the largest and smallest shard exceed themigration thresholds, the balancerbegins migrating chunks across the cluster to ensure an evendistribution of data.

Diagram of a collection distributed across three shards. For this collection, the difference in the number of chunks between the shards reaches the *migration thresholds* (in this case, 2) and triggers migration.

You can manage certainaspects of the balancer. The balancer also respects any zonescreated as a part of configuring zones in a shardedcluster.

See Sharded Cluster Balancer for more information on thebalancer.

Indivisible Chunks

In some cases, chunks can grow beyond the specified chunk size but cannot undergo a split.The most common scenario is when a chunk represents a single shard key value.Since the chunk cannot split, it continues to grow beyond the chunk size,becoming a jumbo chunk. These jumbo chunks can become a performance bottleneckas they continue to grow, especially if the shard key value occurs with highfrequency.

The addition of new data or new shards can result in data distributionimbalances within the cluster. A particular shard may acquiremore chunks than another shard, or the size of a chunk may grow beyond theconfigured maximum chunk size.

MongoDB ensures a balanced cluster using two processes:chunk splitting and the balancer.

moveChunk directory

In MongoDB 2.6 and MongoDB 3.0, sharding.archiveMovedChunks isenabled by default. All other MongoDB versions have this disabled by default. With sharding.archiveMovedChunksenabled, the source shard archives the documents in the migrated chunksin a directory named after the collection namespace under themoveChunk directory in the storage.dbPath.

If some error occurs during a migration, these files may be helpfulin recovering documents affected during the migration.

Once the migration has completed successfully and there is no need torecover documents from these files, you may safely delete these files.Or, if you have an existing backup of the database that you can usefor recovery, you may also delete these files after migration.

To determine if all migrations are complete, runsh.isBalancerRunning() while connected to a mongosinstance.