Index Builds on Populated Collections

Changed in version 4.2

MongoDB index builds against a populated collection require an exclusive read-write lock against the collection. Operations that require a read or write lock on the collection must wait until the mongod releases the lock. MongoDB 4.2 uses an optimized build process that only holds the exclusive lock at the beginning and end of the index build. The rest of the build process yields to interleaving read and write operations.

The build process is summarized as follows:

  • Initialization

The mongod takes an exclusive lock against the collection being indexed. This blocks all read and write operations to the collection until the mongod releases the lock. Applications cannot access the collection during this time.

  • Data Ingestion and Processing

The mongod releases all locks taken by the index build process before taking a series of intent locks against the collection being indexed. Applications can issue read and write operations against the collection during this time.

  • Cleanup

The mongod releases all locks taken by the index build process before taking an exclusive lock against the collection being indexed. This blocks all read and write operations to the collection until the mongod releases the lock. Applications cannot access the collection during this time.

  • Completion

The mongod marks the index as ready to use and releases all locks taken by the index build process.

For a detailed description of index build locking behavior, see Index Build Process. For more information on MongoDB locking behavior, see FAQ: Concurrency.

Behavior

MongoDB 4.2 index builds fully replace the index build processes supported in previous MongoDB versions. MongoDB ignores the background index build option if specified to createIndexes or its shell helpers createIndex() and createIndexes().

Requires featureCompatibilityVersion 4.2

For MongoDB clusters upgraded from 4.0 to 4.2, you must set the feature compatibility version (fCV) to 4.2 to enable the optimized build process. For more information on setting the fCV, see setFeatureCompatibilityVersion.

MongoDB 4.2 clusters running with fCV 4.0 only support 4.0 index builds.
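As a rough illustration of the fCV requirement, the following mongo shell sketch builds the admin commands that read and set the feature compatibility version. The helper name setFcvCommand is hypothetical and exists only for this example; setFeatureCompatibilityVersion and getParameter are the server commands. The guard on db lets the snippet load outside the shell without contacting a server.

```javascript
// Hypothetical helper: builds the admin command that sets the fCV.
function setFcvCommand(version) {
  return { setFeatureCompatibilityVersion: version };
}

// Admin command that reads the current fCV.
const getFcvCommand = { getParameter: 1, featureCompatibilityVersion: 1 };

// `db` exists only inside the mongo shell, so nothing is sent to a
// server when this file is loaded elsewhere.
if (typeof db !== "undefined") {
  printjson(db.adminCommand(getFcvCommand));        // inspect the current fCV
  printjson(db.adminCommand(setFcvCommand("4.2"))); // enable 4.2 index builds
}
```

Run the set command only after every member of the deployment is on the 4.2 binaries.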

Comparison to Foreground and Background Builds

Previous versions of MongoDB supported building indexes either in the foreground or background. Foreground index builds were fast and produced more efficient index data structures, but required blocking all read-write access to the parent database of the collection being indexed for the duration of the build. Background index builds were slower and had less efficient results, but allowed read-write access to the database and its collections during the build process.

Changed in version 4.2

MongoDB 4.2 index builds obtain an exclusive lock on only the collection being indexed during the start and end of the build process to protect metadata changes. The rest of the build process uses the yielding behavior of background index builds to maximize read-write access to the collection during the build. 4.2 index builds still produce efficient index data structures despite the more permissive locking behavior.

MongoDB 4.2 index build performance is at least on par with background index builds. For workloads with few or no updates received during the build process, 4.2 index builds can be as fast as a foreground index build on that same data.

Use db.currentOp() to monitor the progress of ongoing index builds.

Constraint Violations During Index Build

For indexes that enforce constraints on the collection, such as unique indexes, the mongod checks all pre-existing and concurrently-written documents for violations of those constraints after the index build completes. Documents that violate the index constraints can exist during the index build. If any documents violate the index constraints at the end of the build, the mongod terminates the build and throws an error.

For example, consider a populated collection inventory. An administrator wants to create a unique index on the product_sku field. If any documents in the collection have duplicate values for product_sku, the index build can still start successfully. If any violations still exist at the end of the build, the mongod terminates the build and throws an error.

Similarly, an application can successfully write documents to the inventory collection with duplicate values of product_sku while the index build is in progress. If any violations still exist at the end of the build, the mongod terminates the build and throws an error.

To mitigate the risk of index build failure due to constraint violations:

  • Validate that no documents in the collection violate the index constraints.
  • Stop all writes to the collection from applications that cannot guarantee violation-free write operations.
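One way to carry out the validation step is an aggregation that groups documents by the field to be indexed and reports any value that occurs more than once. The sketch below uses the inventory collection and product_sku field from the example above; the helper name duplicateValuesPipeline is hypothetical. The check is not atomic with the index build, so writes arriving after it runs can still introduce duplicates.

```javascript
// Hypothetical helper: builds a pipeline that lists values of `field`
// appearing in more than one document. Documents missing the field are
// grouped under null, which a unique index also treats as a duplicate.
function duplicateValuesPipeline(field) {
  return [
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
  ];
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  const duplicates = db.inventory
    .aggregate(duplicateValuesPipeline("product_sku"))
    .toArray();
  if (duplicates.length === 0) {
    // No existing violations, so start the unique index build.
    db.inventory.createIndex({ product_sku: 1 }, { unique: true });
  } else {
    // Report the offending values instead of starting a build that would fail.
    printjson(duplicates);
  }
}
```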

Index Build Impact on Database Performance

  • Index Builds During Write-Heavy Workloads

Building indexes during time periods where the target collection is under heavy write load can result in reduced write performance and longer index builds.

Consider designating a maintenance window during which applications stop or reduce write operations against the collection. Start the index build during this maintenance window to mitigate the potential negative impact of the build process.

  • Insufficient Available System Memory (RAM)

createIndexes supports building one or more indexes on a collection. createIndexes uses a combination of memory and temporary files on disk to complete index builds. The default limit on memory usage for createIndexes is 500 megabytes, shared between all indexes built using a single createIndexes command. Once the memory limit is reached, createIndexes uses temporary disk files in a subdirectory named _tmp within the --dbpath directory to complete the build.

You can override the memory limit by setting the maxIndexBuildMemoryUsageMegabytes server parameter. Setting a higher memory limit may result in faster completion of index builds larger than 500 megabytes. However, setting this limit too high relative to the unused RAM on your system can result in memory errors.

If the host machine has limited available free RAM, you may need to schedule a maintenance period to increase the total system RAM before you can modify the mongod RAM usage.
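The memory limit described above can be raised with the setParameter admin command. The following is a minimal sketch, assuming the parameter can be changed at runtime on your version; the helper name indexBuildMemoryCommand and the value 1000 are illustrative only, not recommendations.

```javascript
// Hypothetical helper: builds the setParameter command for the index
// build memory limit and rejects values that are not positive integers.
function indexBuildMemoryCommand(megabytes) {
  if (!Number.isInteger(megabytes) || megabytes <= 0) {
    throw new Error("memory limit must be a positive whole number of megabytes");
  }
  return { setParameter: 1, maxIndexBuildMemoryUsageMegabytes: megabytes };
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // 1000 is an example value; size it against the unused RAM on the host.
  printjson(db.adminCommand(indexBuildMemoryCommand(1000)));
}
```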

Index Builds in Replicated Environments

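To minimize the impact of building an index on replica sets and sharded clusters with replica set shards, use a rolling index build procedure that builds the index on one member at a time.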

You can alternatively start the index build on the primary. Once the index build completes, the secondaries replicate and start the index build. Consider the following risks before starting a replicated index build:

  • Secondaries May Fall Out of Sync

Secondary index builds block the application of replicated transactions on a sharded cluster if those transactions include writes to the collection being indexed. Similarly, replicated metadata operations against the collection being indexed also stall behind the index build. The mongod cannot apply any further oplog entries until the index build completes.

Replicated write operations to the collection being indexed can also stall behind the index build if the index build is holding an exclusive lock at the time of the operation or command. The mongod cannot apply any further oplog entries until the index build releases the exclusive lock. If replication stalls for longer than the oplog window on that secondary, the secondary falls out of sync and requires resynchronization to recover.

Use rs.printReplicationInfo() on each replica set member to validate the time covered by the oplog size configured for that member prior to starting the index build. You can increase the oplog size to mitigate the likelihood of a secondary falling out of sync. For example, setting an oplog window size that can cover 72 hours of operations ensures that secondaries can tolerate at least that much replication lag.

Alternatively, build indexes during a maintenance window in which applications cease issuing distributed transactions, write operations, or metadata commands that affect the collection being indexed.

  • Secondary Index Builds May Stall Read and Write Operations

MongoDB 4.2 index builds obtain an exclusive lock on the collection being indexed at the start and end of the build process. While a secondary index build holds the exclusive lock, any read or write operations that depend on the secondary stall until the build releases that lock.

  • Secondaries Process Index Drops After Index Build Completes

Dropping the index on the primary before secondaries complete the replicated index build does not kill the secondary index builds. When the secondary replicates the index drop, it must wait until after the index build completes to apply the drop. Furthermore, since index drops are a metadata operation on the collection, the index drop stalls replication on that secondary.
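The oplog window check discussed under "Secondaries May Fall Out of Sync" can be approximated with simple arithmetic: scale the oplog space currently in use by the ratio of the target window to the window it covers now. The sketch below assumes the write rate stays roughly constant, which may not hold during a busy period. The helper name requiredOplogMB is hypothetical; usedMB and timeDiffHours are fields returned by db.getReplicationInfo().

```javascript
// Hypothetical helper: estimates the oplog size (in MB) needed to cover
// `targetHours`, given that `usedMB` currently covers `windowHours`.
function requiredOplogMB(usedMB, windowHours, targetHours) {
  if (windowHours <= 0) {
    throw new Error("current oplog window must be positive");
  }
  return Math.ceil((usedMB * targetHours) / windowHours);
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  const info = db.getReplicationInfo();
  const sizeMB = requiredOplogMB(info.usedMB, info.timeDiffHours, 72);
  print("Estimated oplog size for a 72 hour window: " + sizeMB + " MB");
  // To apply the estimate, run on each member after reviewing the value:
  // db.adminCommand({ replSetResizeOplog: 1, size: sizeMB });
}
```

For instance, if 1000 MB of oplog currently covers 24 hours, the estimate for a 72 hour window is 3000 MB.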

Build Failure and Recovery

Interrupted Index Builds on Standalone mongod

If the mongod shuts down during the index build, the index build job and all progress are lost. Restarting the mongod does not restart the index build. You must re-issue the createIndex() operation to restart the index build.

Interrupted Index Builds on a Primary mongod

If the primary shuts down or steps down during the index build, the index build job and all progress are lost. Restarting the mongod does not restart the index build. You must re-issue the createIndex() operation to restart the index build.

Interrupted Index Builds on a Secondary mongod

If a secondary shuts down during the index build, the index build job is persisted. Restarting the mongod recovers the index build and restarts it from scratch.

The startup process stalls behind any recovered index builds. All other operations, including replication, wait until the index builds complete. If the secondary’s oplog does not cover the time required to complete the index build, the secondary may fall out of sync with the rest of the replica set and require resynchronization.

If you restart the mongod as a standalone (i.e. removing or commenting out replication.replSetName or omitting --replSetName), the mongod still recovers the index build from scratch. You can use the storage.indexBuildRetry configuration file setting or --noIndexBuildRetry command line option to skip the index build on start up.

MongoDB 4.0+

You cannot specify storage.indexBuildRetry or --noIndexBuildRetry for a mongod that is part of a replica set.

Rollbacks during Build Process

Starting in version 4.0, MongoDB waits for any in-progress index builds to finish before starting a rollback.

Monitor In Progress Index Builds

To see the status of an index build operation, you can use the db.currentOp() method in the mongo shell. To filter the current operations for index creation operations, see Active Indexing Operations for an example.

The msg field includes a percentage-complete measurement of the current stage in the index build process.
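As an illustration, the following sketch passes a filter document to db.currentOp() so that only index creation operations are returned, then prints each operation's opid and msg. The filter is modeled on the kind of query used for active indexing operations; the variable name indexBuildFilter is hypothetical, and the exact msg text may differ between versions.

```javascript
// Matches createIndexes commands, plus internal operations whose
// progress message starts with "Index Build".
const indexBuildFilter = {
  $or: [
    { op: "command", "command.createIndexes": { $exists: true } },
    { op: "none", msg: /^Index Build/ }
  ]
};

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.currentOp(indexBuildFilter).inprog.forEach(function (op) {
    // msg carries the percentage-complete text for the current stage.
    print(op.opid + ": " + (op.msg || "(no progress message yet)"));
  });
}
```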

Terminate In Progress Index Builds

To terminate an ongoing index build on a primary or standalone mongod, use the db.killOp() method in the mongo shell. When terminating an index build, the effects of db.killOp() may not be immediate and may occur well after much of the index build operation has completed.

You cannot terminate a replicated index build on secondary members of a replica set. You must first drop the index on the primary. The secondaries will replicate the drop operation and drop the indexes after the index build completes. All further replication blocks behind the index build and drop.
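A minimal sketch of terminating a build on a primary or standalone: collect the opid of each createIndexes command that targets a given collection from the db.currentOp() output, then pass each one to db.killOp(). The helper name indexBuildOpIds and the test.inventory namespace are hypothetical. The sketch assumes the command document reported by currentOp carries the collection name in createIndexes and the database name in $db.

```javascript
// Hypothetical helper: returns the opids of createIndexes commands
// running against `collection` in `database`.
function indexBuildOpIds(inprog, database, collection) {
  return inprog
    .filter(function (op) {
      return Boolean(op.command) &&
        op.command.createIndexes === collection &&
        op.command.$db === database;
    })
    .map(function (op) { return op.opid; });
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  indexBuildOpIds(db.currentOp().inprog, "test", "inventory")
    .forEach(function (opid) {
      // The kill may take effect well after it is requested.
      db.killOp(opid);
    });
}
```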

To minimize the impact of building an index on replica sets and sharded clusters with replica set shards, use a rolling index build procedure.

Index Build Process

The following list describes each stage of the index build process, in order:

  • Lock

The mongod obtains an exclusive X lock on the collection being indexed. This blocks all read and write operations on the collection, including the application of any replicated write operations or metadata commands that target the collection. The mongod does not yield this lock.

  • Initialization

The mongod creates three data structures at this initial state:

    - The initial index metadata entry.
    - A temporary table (“side writes table”) that stores keys generated from writes to the collection being indexed during the build process.
    - A temporary table (“constraint violation table”) for all documents that may cause a duplicate-key constraint violation.

  • Lock

The mongod downgrades the exclusive X collection lock to an intent exclusive IX lock. The mongod periodically yields this lock to interleaving read and write operations.

  • Scan Collection

For each document in the collection, the mongod generates a key for that document and dumps the key into an external sorter. If the mongod encounters a duplicate key error while generating a key during the collection scan, it stores that key in the constraint violation table for later processing. If the mongod encounters any other error while generating a key, the build fails with an error. Once the mongod completes the collection scan, it dumps the sorted keys into the index.

  • Process Side Writes Table

The mongod drains the side writes table using first-in-first-out priority. If the mongod encounters a duplicate key error while processing a key in the side writes table, it stores that key in the constraint violation table for later processing. If the mongod encounters any other error while processing a key, the build fails with an error.

For each document written to the collection during the build process, the mongod generates a key for that document and stores it in the side writes table for later processing. The mongod uses a snapshot system to set a limit to the number of keys to process.

  • Lock

The mongod upgrades the intent exclusive IX lock on the collection to a shared S lock. This blocks all write operations to the collection, including the application of any replicated write operations or metadata commands that target the collection.

  • Finish Processing Temporary Side Writes Table

The mongod continues draining remaining records in the side writes table. The mongod may pause replication during this stage. If the mongod encounters a duplicate key error while processing a key in the side writes table, it stores that key in the constraint violation table for later processing. If the mongod encounters any other error while processing a key, the build fails with an error.

  • Lock

The mongod upgrades the shared S lock on the collection to an exclusive X lock on the collection. This blocks all read and write operations on the collection, including the application of any replicated write operations or metadata commands that target the collection. The mongod does not yield this lock.

  • Drop Side Writes Table

The mongod applies any remaining operations in the side writes table before dropping it. If the mongod encounters a duplicate key error while processing a key in the side writes table, it stores that key in the constraint violation table for later processing. If the mongod encounters any other error while processing a key, the build fails with an error. At this point, the index includes all data written to the collection.

  • Process Constraint Violation Table

The mongod drains the constraint violation table using first-in-first-out priority. If any key in the constraint violation table still produces a duplicate key error, the mongod aborts the build and throws an error. The mongod drops the constraint violation table once it is drained or if it encounters a duplicate key violation during processing.

  • Mark the Index as Ready

The mongod updates the index metadata to mark the index as ready for use.

  • Lock

The mongod releases the X lock on the collection.
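To make the interplay between the collection scan, the side writes table, and the constraint violation table more concrete, here is a toy in-memory model of a unique index build. It is a simplified sketch for illustration, not MongoDB's implementation: it ignores locking, yielding, and snapshots, and the function name buildUniqueIndex and the shape of the side write records are assumptions made up for this example.

```javascript
// Toy model: `collection` is an array of documents, `sideWrites` is an
// ordered array of { op: "insert" | "delete", key, id } records that
// arrived while the build was running.
function buildUniqueIndex(collection, field, sideWrites) {
  const index = new Map();      // key -> Set of document ids holding that key
  const violations = new Set(); // keys that produced a duplicate at some point

  function insertKey(key, id) {
    if (!index.has(key)) index.set(key, new Set());
    const ids = index.get(key);
    ids.add(id);
    // Record the duplicate for later instead of failing immediately.
    if (ids.size > 1) violations.add(key);
  }

  function removeKey(key, id) {
    const ids = index.get(key);
    if (!ids) return;
    ids.delete(id);
    if (ids.size === 0) index.delete(key);
  }

  // Scan Collection: generate keys, sort them, then load them into the index.
  const scanned = collection.map(function (doc) {
    return { key: doc[field], id: doc._id };
  });
  scanned.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  scanned.forEach(function (entry) { insertKey(entry.key, entry.id); });

  // Drain the side writes table in first-in-first-out order.
  sideWrites.forEach(function (write) {
    if (write.op === "insert") insertKey(write.key, write.id);
    else if (write.op === "delete") removeKey(write.key, write.id);
  });

  // Process the constraint violation table: fail only if a duplicate remains.
  violations.forEach(function (key) {
    const ids = index.get(key);
    if (ids && ids.size > 1) throw new Error("duplicate key: " + key);
  });
  return index;
}

// Two documents share product_sku "A", but one is deleted during the build,
// so the violation is resolved before the final check.
const sampleDocs = [
  { _id: 1, product_sku: "A" },
  { _id: 2, product_sku: "A" },
  { _id: 3, product_sku: "B" }
];
const builtIndex = buildUniqueIndex(
  sampleDocs, "product_sku", [{ op: "delete", key: "A", id: 2 }]
);
```

The deferred check in this model mirrors the behavior described in the stages: a duplicate seen mid-build is only recorded, and the build fails only if the duplicate still exists when the constraint violation table is processed.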