FAQ: MongoDB Storage

This document addresses common questions regarding MongoDB’s storage system.

Storage Engine Fundamentals

What is a storage engine?

A storage engine is the part of a database that is responsible for managing how data is stored, both in memory and on disk. Many databases support multiple storage engines, where different engines perform better for specific workloads. For example, one storage engine might offer better performance for read-heavy workloads, and another might support a higher throughput for write operations.

See also

Storage Engines

Can you mix storage engines in a replica set?

Yes. You can have replica set members that use different storage engines (WiredTiger and in-memory).

Note

Starting in version 4.2, MongoDB removes the deprecated MMAPv1 storage engine.

WiredTiger Storage Engine

Can I upgrade an existing deployment to WiredTiger?

Yes. See:

How much compression does WiredTiger provide?

The ratio of compressed data to uncompressed data depends on your data and the compression library used. By default, collection data in WiredTiger uses Snappy block compression; zlib and zstd compression are also available. Index data uses prefix compression by default.
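One rough way to gauge the ratio for your own data is to compare the size field (uncompressed data size) with the storageSize field (space allocated on disk) in the output of db.collection.stats(). The helper below is a hypothetical sketch, not a MongoDB API. Because storageSize can include free space that WiredTiger holds for reuse, treat the result as an approximation.

```javascript
// Hypothetical helper: approximate compression ratio from a stats document.
// Expects an object shaped like db.collection.stats() output and reads only
// its `size` (uncompressed bytes) and `storageSize` (on-disk bytes) fields.
function approximateCompressionRatio(stats) {
  if (!stats || typeof stats.size !== "number" || typeof stats.storageSize !== "number") {
    throw new TypeError("stats must include numeric size and storageSize fields");
  }
  if (stats.storageSize === 0) {
    return null; // nothing allocated on disk yet, so no meaningful ratio
  }
  // storageSize may include reusable free blocks, so this can understate compression.
  return stats.size / stats.storageSize;
}

// Example with made-up numbers: 300 MB of data stored in 100 MB on disk.
const ratio = approximateCompressionRatio({ size: 300 * 1024 * 1024, storageSize: 100 * 1024 * 1024 });
console.log(ratio.toFixed(2)); // prints "3.00"
```

In the mongo shell you would pass the real document instead, for example approximateCompressionRatio(db.orders.stats()).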

To what size should I set the WiredTiger internal cache?

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.

Starting in MongoDB 3.4, the default WiredTiger internal cache size is the larger of either:

  • 50% of (RAM - 1 GB), or
  • 256 MB.

For example, on a system with a total of 4 GB of RAM the WiredTiger cache will use 1.5 GB of RAM (0.5 * (4 GB - 1 GB) = 1.5 GB). Conversely, a system with a total of 1.25 GB of RAM will allocate 256 MB to the WiredTiger cache because that is more than half of the total RAM minus one gigabyte (0.5 * (1.25 GB - 1 GB) = 128 MB < 256 MB).
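To make the arithmetic concrete, here is a minimal JavaScript sketch of the default sizing rule. The function name is hypothetical and it assumes 1 GB = 1024 MB; it reproduces the calculation from a RAM figure and does not query a running mongod.

```javascript
// Sketch of the default WiredTiger internal cache size rule (MongoDB 3.4+):
// the larger of 50% of (RAM - 1 GB) or 256 MB. Sizes are in MB, 1 GB = 1024 MB.
const MB_PER_GB = 1024;

function defaultWiredTigerCacheMB(availableRamMB) {
  const halfOfRamMinusOneGB = 0.5 * (availableRamMB - MB_PER_GB);
  return Math.max(halfOfRamMinusOneGB, 256);
}

console.log(defaultWiredTigerCacheMB(4 * MB_PER_GB));    // 1536 (1.5 GB)
console.log(defaultWiredTigerCacheMB(1.25 * MB_PER_GB)); // 256 (128 MB is below the 256 MB floor)
```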

Note

In some instances, such as when running in a container, the database can have memory constraints that are lower than the total system memory. In such instances, this memory limit, rather than the total system memory, is used as the maximum RAM available.

To see the memory limit, check hostInfo.system.memLimitMB.
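The sketch below shows the lower memory limit taking precedence before the default sizing rule is applied. It assumes you pass in a document shaped like the output of db.hostInfo() and reads only system.memSizeMB and system.memLimitMB; both function names are hypothetical.

```javascript
// Hypothetical helper: choose the RAM figure the default cache rule should use.
// `hostInfo` is assumed to look like db.hostInfo() output:
//   { system: { memSizeMB: <number>, memLimitMB: <number> } }
function effectiveRamMB(hostInfo) {
  const { memSizeMB, memLimitMB } = hostInfo.system;
  // A reported memory limit that is lower than total memory takes precedence.
  if (typeof memLimitMB === "number" && memLimitMB < memSizeMB) {
    return memLimitMB;
  }
  return memSizeMB;
}

function defaultCacheMBForHost(hostInfo) {
  // Default rule: max(50% of (RAM - 1 GB), 256 MB), with 1 GB = 1024 MB.
  return Math.max(0.5 * (effectiveRamMB(hostInfo) - 1024), 256);
}

// Example: a container limited to 2 GB on a 16 GB host.
const containerHost = { system: { memSizeMB: 16384, memLimitMB: 2048 } };
console.log(effectiveRamMB(containerHost));        // 2048
console.log(defaultCacheMBForHost(containerHost)); // 512
```

In the mongo shell, defaultCacheMBForHost(db.hostInfo()) would apply the same calculation to the current host.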

By default, WiredTiger uses Snappy block compression for all collections and prefix compression for all indexes. Compression defaults are configurable at a global level and can also be set on a per-collection and per-index basis during collection and index creation.
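As an illustration of the per-collection setting, the sketch below builds an options document for db.createCollection() that overrides the block compressor through storageEngine.wiredTiger.configString. That option is part of createCollection; the wrapper function and its allow-list are hypothetical conveniences.

```javascript
// Hypothetical wrapper: build db.createCollection() options that set the
// WiredTiger block compressor for a single collection.
const SUPPORTED_BLOCK_COMPRESSORS = ["none", "snappy", "zlib", "zstd"];

function collectionOptionsWithCompressor(compressor) {
  if (!SUPPORTED_BLOCK_COMPRESSORS.includes(compressor)) {
    throw new RangeError(`unsupported block compressor: ${compressor}`);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: `block_compressor=${compressor}` },
    },
  };
}

const opts = collectionOptionsWithCompressor("zstd");
console.log(JSON.stringify(opts));
// {"storageEngine":{"wiredTiger":{"configString":"block_compressor=zstd"}}}
```

In the mongo shell this would be used as db.createCollection("orders", collectionOptionsWithCompressor("zstd")). zstd requires MongoDB 4.2 or later; the global default is controlled by storage.wiredTiger.collectionConfig.blockCompressor.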

Different representations are used for data in the WiredTiger internal cache versus the on-disk format:

  • Data in the filesystem cache is the same as the on-disk format, including benefits of any compression for data files. The filesystem cache is used by the operating system to reduce disk I/O.
  • Indexes loaded in the WiredTiger internal cache have a different data representation from the on-disk format, but can still take advantage of index prefix compression to reduce RAM usage. Index prefix compression deduplicates common prefixes from indexed fields.
  • Collection data in the WiredTiger internal cache is uncompressed and uses a different representation from the on-disk format. Block compression can provide significant on-disk storage savings, but data must be uncompressed to be manipulated by the server.

Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.

To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.
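If you do need to set the size explicitly, the sketch below builds the mongod command-line arguments for --wiredTigerCacheSizeGB. The helper is hypothetical, and the 0.25 to 10000 GB bounds are an assumption taken from the configuration reference for recent releases; verify them for your version.

```javascript
// Hypothetical helper: build mongod arguments that set the WiredTiger
// internal cache size. Bounds are an assumption; check your release's docs.
const MIN_CACHE_GB = 0.25;
const MAX_CACHE_GB = 10000;

function wiredTigerCacheArgs(cacheSizeGB) {
  if (!Number.isFinite(cacheSizeGB) || cacheSizeGB < MIN_CACHE_GB || cacheSizeGB > MAX_CACHE_GB) {
    throw new RangeError(`cacheSizeGB must be between ${MIN_CACHE_GB} and ${MAX_CACHE_GB}`);
  }
  // Equivalent to storage.wiredTiger.engineConfig.cacheSizeGB in a config file.
  return ["--wiredTigerCacheSizeGB", String(cacheSizeGB)];
}

console.log(wiredTigerCacheArgs(1.5).join(" ")); // --wiredTigerCacheSizeGB 1.5
```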

Note

The storage.wiredTiger.engineConfig.cacheSizeGB setting limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.

To accommodate the additional consumers of RAM, you may have to decrease the WiredTiger internal cache size.

The default WiredTiger internal cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.
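One possible starting point, offered only as a hypothetical heuristic and not an official MongoDB formula, is to give each instance an equal share of RAM and apply the default rule to that share. It does not account for per-process overhead, so treat the result as an upper bound to tune down.

```javascript
// Hypothetical heuristic (not an official formula): split RAM evenly across
// N mongod instances, then apply max(50% of (share - 1 GB), 256 MB) per share.
function perInstanceCacheGB(totalRamGB, instanceCount) {
  if (!Number.isInteger(instanceCount) || instanceCount < 1) {
    throw new RangeError("instanceCount must be a positive integer");
  }
  const shareGB = totalRamGB / instanceCount;
  return Math.max(0.5 * (shareGB - 1), 0.25); // 0.25 GB = 256 MB floor
}

// Example: three instances on a 32 GB machine.
const cacheGB = perInstanceCacheGB(32, 3);
console.log(cacheGB.toFixed(2)); // 4.83
```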

If you run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the amount of RAM available in the container. The exact amount depends on the other processes running in the container. See memLimitMB.
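A minimal sanity check for a proposed value might look like the sketch below. The function is hypothetical, and headroomMB is an assumed stand-in for "the other processes running in the container" that you would estimate yourself; memLimitMB is the container limit in MB.

```javascript
// Hypothetical check: does a proposed cacheSizeGB, plus an estimated headroom
// for other processes (an assumption), stay below the container's memory limit?
function fitsInContainer(cacheSizeGB, memLimitMB, headroomMB) {
  const cacheMB = cacheSizeGB * 1024;
  return cacheMB + headroomMB < memLimitMB;
}

// Example: 4 GB container with 1 GB reserved for everything else.
console.log(fitsInContainer(2, 4096, 1024));   // true  (2048 + 1024 < 4096)
console.log(fitsInContainer(3.5, 4096, 1024)); // false (3584 + 1024 >= 4096)
```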

To view statistics on the cache and eviction rate, see the wiredTiger.cache field returned from the serverStatus command.
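As a sketch of how those statistics can be read, the hypothetical helper below computes fill and dirty ratios from three fields of the wiredTiger.cache section ("maximum bytes configured", "bytes currently in the cache", "tracked dirty bytes in the cache"). The summary shape is our own, and the sample numbers are made up.

```javascript
// Hypothetical helper: summarize cache fill and dirty ratios from the
// wiredTiger.cache section of serverStatus output.
function summarizeCache(cache) {
  const max = cache["maximum bytes configured"];
  const used = cache["bytes currently in the cache"];
  const dirty = cache["tracked dirty bytes in the cache"];
  return {
    fillRatio: used / max,   // share of the configured cache currently in use
    dirtyRatio: dirty / max, // share holding modified data not yet written out
  };
}

// Example with made-up numbers for a 1 GB cache.
const GB = 1024 ** 3;
const sample = {
  "maximum bytes configured": 1 * GB,
  "bytes currently in the cache": 0.8 * GB,
  "tracked dirty bytes in the cache": 0.05 * GB,
};
console.log(summarizeCache(sample));
```

In the mongo shell you would call summarizeCache(db.serverStatus().wiredTiger.cache).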

How frequently does WiredTiger write to disk?

  • Checkpoints

    Starting in version 3.6, MongoDB configures WiredTiger to create checkpoints (i.e. write the snapshot data to disk) at intervals of 60 seconds. In earlier versions, MongoDB sets checkpoints to occur in WiredTiger on user data at an interval of 60 seconds or when 2 GB of journal data has been written, whichever occurs first.

  • Journal Data

    WiredTiger syncs the buffered journal records to disk upon any of the following conditions:

      • For replica set members (primary and secondary members):

          • If there are operations waiting for oplog entries. Operations that can wait for oplog entries include:
          • Additionally for secondary members, after every batch application of the oplog entries.

      • If a write operation includes or implies a write concern of j: true.

        Note

        Write concern "majority" implies j: true if writeConcernMajorityJournalDefault is true.

      • At every 100 milliseconds (see storage.journal.commitIntervalMs).

      • When WiredTiger creates a new journal file. Because MongoDB uses a journal file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100 MB of data.
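The write-concern conditions in that list can be summarized as a small predicate. This is a simplified model written for illustration, not server code: it treats j: true, or w: "majority" with no explicit j while writeConcernMajorityJournalDefault is true (its default), as forcing a journal sync, and it does not model an explicit j: false combined with "majority".

```javascript
// Simplified model (an assumption, not server code) of when a write concern
// forces a journal sync: j: true, or w: "majority" without an explicit j
// while writeConcernMajorityJournalDefault is true.
function writeConcernForcesJournalSync(writeConcern, majorityJournalDefault = true) {
  if (writeConcern.j === true) {
    return true;
  }
  return writeConcern.w === "majority" && writeConcern.j === undefined && majorityJournalDefault;
}

console.log(writeConcernForcesJournalSync({ w: 1, j: true }));        // true
console.log(writeConcernForcesJournalSync({ w: "majority" }));        // true
console.log(writeConcernForcesJournalSync({ w: "majority" }, false)); // false
console.log(writeConcernForcesJournalSync({ w: 1 }));                 // false
```

In the mongo shell, a write that requests journaling explicitly looks like db.orders.insertOne({ item: "abc" }, { writeConcern: { w: 1, j: true } }).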

How do I reclaim disk space in WiredTiger?

The WiredTiger storage engine maintains lists of empty records in data files as it deletes documents. This space can be reused by WiredTiger, but will not be returned to the operating system unless under very specific circumstances.

The amount of empty space available for reuse by WiredTiger is reflected in the output of db.collection.stats() under the heading wiredTiger.block-manager.file bytes available for reuse.
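Because that field name contains spaces and a hyphenated parent key, it has to be read with bracket notation. The hypothetical helper below does so on a document shaped like db.collection.stats() output and reports the value in megabytes.

```javascript
// Hypothetical helper: read "file bytes available for reuse" from a
// db.collection.stats()-shaped document and return it in megabytes.
function reusableMB(stats) {
  const blockManager = stats.wiredTiger && stats.wiredTiger["block-manager"];
  if (!blockManager) {
    return null; // not a WiredTiger collection, or the section is missing
  }
  return blockManager["file bytes available for reuse"] / (1024 * 1024);
}

const example = { wiredTiger: { "block-manager": { "file bytes available for reuse": 52428800 } } };
console.log(reusableMB(example)); // 50
```

In the mongo shell: reusableMB(db.orders.stats()).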

To allow the WiredTiger storage engine to release this empty space to the operating system, you can defragment your data file. This can be achieved using the compact command. For more information on its behavior and other considerations, see compact.
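A sketch of how you might decide where compact is worth running: the hypothetical helper below picks collections whose reusable space exceeds a threshold you choose (an assumption, not a MongoDB recommendation) and builds { compact: <collection> } command documents for them.

```javascript
// Hypothetical helper: build compact command documents for collections whose
// reusable space exceeds a chosen threshold (the threshold is an assumption).
function compactCommands(statsByCollection, thresholdBytes) {
  return Object.entries(statsByCollection)
    .filter(([, collStats]) => {
      const bm = collStats.wiredTiger && collStats.wiredTiger["block-manager"];
      return Boolean(bm) && bm["file bytes available for reuse"] > thresholdBytes;
    })
    .map(([name]) => ({ compact: name }));
}

const sampleStats = {
  orders: { wiredTiger: { "block-manager": { "file bytes available for reuse": 900e6 } } },
  users: { wiredTiger: { "block-manager": { "file bytes available for reuse": 10e6 } } },
};
console.log(compactCommands(sampleStats, 500e6)); // [ { compact: 'orders' } ]
```

Each resulting document would then be passed to db.runCommand() in the mongo shell; review the compact documentation for your version first, since its locking and performance behavior differ between releases.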

Data Storage Diagnostics

How can I check the size of a collection?

To view the statistics for a collection, including the data size, use the db.collection.stats() method from the mongo shell. The following example issues db.collection.stats() for the orders collection:

  db.orders.stats();

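MongoDB also provides the following methods to return specific sizes for the collection:

  • db.collection.dataSize()
  • db.collection.storageSize()
  • db.collection.totalSize()
  • db.collection.totalIndexSize()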

The following script prints the statistics for each database:

  // Iterate over every database and print its dbStats document.
  db.adminCommand("listDatabases").databases.forEach(function (d) {
    const mdb = db.getSiblingDB(d.name);
    printjson(mdb.stats());
  });

The following script prints the statistics for each collection in each database:

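  // Iterate over every database, then over each of its collections,
  // and print each collection's stats document.
  db.adminCommand("listDatabases").databases.forEach(function (d) {
    const mdb = db.getSiblingDB(d.name);
    mdb.getCollectionNames().forEach(function (c) {
      const s = mdb.getCollection(c).stats();
      printjson(s);
    });
  });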
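If you collect those stats documents into an array instead of printing them, a small hypothetical helper can rank collections by on-disk footprint. It reads only the ns, size and storageSize fields of each document; the sample data is made up.

```javascript
// Hypothetical helper: rank collection stats documents by storageSize
// (largest first) and report sizes in megabytes.
function largestCollections(statsDocs, limit) {
  const toMB = (bytes) => bytes / (1024 * 1024);
  return statsDocs
    .map((s) => ({ ns: s.ns, dataMB: toMB(s.size), storageMB: toMB(s.storageSize) }))
    .sort((a, b) => b.storageMB - a.storageMB)
    .slice(0, limit);
}

const MB = 1024 * 1024;
const docs = [
  { ns: "shop.orders", size: 200 * MB, storageSize: 80 * MB },
  { ns: "shop.users", size: 10 * MB, storageSize: 4 * MB },
  { ns: "logs.events", size: 500 * MB, storageSize: 150 * MB },
];
console.log(largestCollections(docs, 2).map((r) => r.ns)); // [ 'logs.events', 'shop.orders' ]
```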

How can I check the size of the individual indexes for a collection?

To view the size of the data allocated for each index, use the db.collection.stats() method and check the indexSizes field in the returned document.
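Since indexSizes maps each index name to its size in bytes, sorting its entries is a quick way to spot the largest indexes. The helper below is a hypothetical sketch, and the index names and sizes in the sample are made up.

```javascript
// Hypothetical helper: list the entries of a stats document's indexSizes
// field (index name -> bytes) from largest to smallest.
function indexSizesDescending(stats) {
  return Object.entries(stats.indexSizes || {})
    .map(([name, bytes]) => ({ name, bytes }))
    .sort((a, b) => b.bytes - a.bytes);
}

const sample = { indexSizes: { _id_: 36864, customer_1: 20480, "status_1_date_-1": 57344 } };
console.log(indexSizesDescending(sample).map((i) => i.name));
// [ 'status_1_date_-1', '_id_', 'customer_1' ]
```

In the mongo shell: indexSizesDescending(db.orders.stats()).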

If an index uses prefix compression (which is the default for WiredTiger), the returned size for that index reflects the compressed size.

How can I get information on the storage use of a database?

The db.stats() method in the mongo shell returns the current state of the “active” database. For the description of the returned fields, see dbStats Output.
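For a quick overview, the hypothetical helper below condenses a dbStats document into a few headline numbers in megabytes. It reads the db, collections, dataSize, storageSize and indexSize fields and assumes byte values, so call db.stats() without a scale argument when feeding it; the summary shape and sample values are our own.

```javascript
// Hypothetical helper: condense a dbStats document (byte values) into a few
// headline numbers in megabytes, rounded to two decimal places.
function summarizeDbStats(stats) {
  const toMB = (bytes) => Math.round((bytes / (1024 * 1024)) * 100) / 100;
  return {
    db: stats.db,
    collections: stats.collections,
    dataMB: toMB(stats.dataSize),
    storageMB: toMB(stats.storageSize),
    indexMB: toMB(stats.indexSize),
    totalOnDiskMB: toMB(stats.storageSize + stats.indexSize),
  };
}

const sampleDbStats = { db: "shop", collections: 3, dataSize: 104857600, storageSize: 41943040, indexSize: 10485760 };
console.log(summarizeDbStats(sampleDbStats));
```

Alternatively, db.stats(1024 * 1024) asks the server to report sizes in megabytes directly.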