Operations Checklist

The following checklist, along with the Development Checklist, provides recommendations to help you avoid issues in your production MongoDB deployment.

Filesystem

Align your disk partitions with your RAID configuration.
Avoid using NFS drives for your dbPath. Using NFS drives can result in degraded and unstable performance. See: Remote Filesystems for more information.
- VMware users should use VMware virtual drives over NFS.
Linux/Unix: format your drives into XFS or EXT4. If possible, use XFS as it generally performs better with MongoDB.
- With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues found when using EXT4 with WiredTiger.
- If using RAID, you may need to configure XFS with your RAID geometry.
Windows: use the NTFS file system. Do not use any FAT file system (that is, FAT16, FAT32, or exFAT).
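The filesystem items above, together with the noatime recommendation in the Linux section, can be spot-checked on Linux. The Python sketch below works on text in the /proc/mounts layout. It finds the mount that holds a given dbPath and flags three things: NFS, a filesystem other than XFS, and a missing noatime option. The function names are hypothetical. The sketch is also stricter than the checklist: it flags any non-XFS filesystem for review, even though EXT4 is acceptable and XFS is only preferred.

```python
from typing import List, Optional, Tuple

Mount = Tuple[str, str, List[str]]  # (mount point, filesystem type, mount options)


def find_mount(mounts_text: str, path: str) -> Optional[Mount]:
    """Return the entry whose mount point is the longest prefix of `path`.

    `mounts_text` is assumed to use the /proc/mounts layout: device, mount point,
    fstype, comma-separated options, dump, pass. Octal escapes in mount points
    (used for spaces) are not decoded in this sketch.
    """
    best: Optional[Mount] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"  # "/" stays "/", "/data" becomes "/data/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype, options)
    return best


def check_dbpath_filesystem(mounts_text: str, dbpath: str) -> List[str]:
    """List checklist concerns about the filesystem that holds `dbpath`."""
    found = find_mount(mounts_text, dbpath)
    if found is None:
        return [f"no mount entry found for {dbpath}"]
    mount_point, fstype, options = found
    problems: List[str] = []
    if fstype.startswith("nfs"):
        problems.append(f"{mount_point} is NFS ({fstype}); avoid NFS for dbPath")
    elif fstype != "xfs":
        problems.append(f"{mount_point} is {fstype}; XFS is recommended with WiredTiger")
    if "noatime" not in options:
        problems.append(f"{mount_point} is not mounted with noatime")
    return problems
```

On a Linux host you might call `check_dbpath_filesystem(open("/proc/mounts").read(), dbpath)`, passing your configured storage.dbPath.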

Replication

Verify that all non-hidden replica set members are identically provisioned in terms of their RAM, CPU, disk, network setup, etc.
Configure the oplog size to suit your use case:
- The replication oplog window should cover normal maintenance and downtime windows to avoid the need for a full resync.
- The replication oplog window should cover the time needed to restore a replica set member from the last backup.

Changed in version 3.4: The replication oplog window no longer needs to cover the time needed to restore a replica set member via initial sync, as the oplog records are pulled during the data copy. However, the member being restored must have enough disk space in the local database to temporarily store these oplog records for the duration of this data copy stage.

With earlier versions of MongoDB, the replication oplog window should cover the time needed to restore a replica set member by initial sync.

Ensure that your replica set includes at least three data-bearing nodes that run with journaling and that you issue writes with w:"majority" write concern for availability and durability.
Use hostnames when configuring replica set members, rather than IP addresses.
Ensure full bidirectional network connectivity between all mongod instances.
Ensure that each host can resolve itself.
Ensure that your replica set contains an odd number of voting members.
Ensure that mongod instances have 0 or 1 votes.
For high availability, deploy your replica set into a minimum of three data centers.
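Several replication items above can be checked mechanically against a replica set configuration: hostnames instead of IP addresses, at least three data-bearing members, an odd number of voting members, and 0 or 1 votes per member. The Python sketch below assumes the configuration has already been exported as a dict shaped like the document rs.conf() returns. The function names are hypothetical. The sketch does not cover journaling, write concern, connectivity, or data center placement.

```python
import ipaddress
from typing import Any, Dict, List


def _is_ip_literal(host: str) -> bool:
    """True if `host` is an IPv4/IPv6 literal rather than a hostname."""
    try:
        ipaddress.ip_address(host.strip("[]"))  # "[::1]" -> "::1"
    except ValueError:
        return False
    return True


def check_replica_set_config(config: Dict[str, Any]) -> List[str]:
    """Flag replica set settings that conflict with the checklist.

    `config` is assumed to look like rs.conf() output, with hosts in host:port form:
    {"members": [{"host": "db1.example.net:27017", "votes": 1,
                  "arbiterOnly": False}, ...]}
    """
    problems: List[str] = []
    members = config.get("members", [])

    for m in members:
        votes = m.get("votes", 1)  # a member has 1 vote unless configured otherwise
        if votes not in (0, 1):
            problems.append(f"{m['host']}: votes={votes}, expected 0 or 1")
        host = m["host"].rsplit(":", 1)[0]  # drop the port
        if _is_ip_literal(host):
            problems.append(f"{m['host']}: use a hostname, not an IP address")

    # An odd number of voting members avoids tied elections.
    voting = [m for m in members if m.get("votes", 1) > 0]
    if len(voting) % 2 == 0:
        problems.append(f"{len(voting)} voting members; use an odd number")

    # Arbiters hold no data, so they do not count toward the three data-bearing nodes.
    data_bearing = [m for m in members if not m.get("arbiterOnly", False)]
    if len(data_bearing) < 3:
        problems.append(f"{len(data_bearing)} data-bearing members; use at least 3")

    return problems
```

For example, a primary, a secondary, and an arbiter pass the odd-vote check but are still flagged, because only two members carry data.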

Sharding

Place your config servers on dedicated hardware for optimal performance in large clusters. Ensure that the hardware has enough RAM to hold the data files entirely in memory and that it has dedicated storage.
Deploy mongos routers in accordance with the Production Configuration guidelines.
Use NTP to synchronize the clocks on all components of your sharded cluster.
Ensure full bidirectional network connectivity between mongod, mongos, and config servers.
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.

Journaling: WiredTiger Storage Engine

Ensure that all instances use journaling.
Place the journal on its own low-latency disk for write-intensive workloads. Note that this will affect snapshot-style backups, as the files constituting the state of the database will reside on separate volumes.

Hardware

Use RAID10 and SSD drives for optimal performance.
SAN and Virtualization:
- Ensure that each mongod has provisioned IOPS for its dbPath, or has its own physical drive or LUN.
- Avoid dynamic memory features, such as memory ballooning, when running in virtual environments.
- Avoid placing all replica set members on the same SAN, as the SAN can be a single point of failure.

Deployments to Cloud Hardware

Windows Azure: Adjust the TCP keepalive (tcp_keepalive_time) to 100-120 seconds. The TCP idle timeout on the Azure load balancer is too slow for MongoDB’s connection pooling behavior. See: Azure Production Notes for more information.
Use MongoDB version 2.6.4 or later on systems with high-latency storage, such as Windows Azure, as these versions include performance improvements for those systems.

Operating System Configuration

Linux

Turn off transparent hugepages. See Transparent Huge Pages Settings for more information.
Adjust the readahead settings on the devices storing your database files.
- For the WiredTiger storage engine, set readahead between 8 and 32 (in 512-byte sectors, the unit used by blockdev --getra and --setra) regardless of storage media type (spinning disk, SSD, etc.), unless testing shows a measurable, repeatable, and reliable benefit in a higher readahead value.

MongoDB commercial support can provide advice and guidance on alternate readahead configurations.

Disable the tuned tool if you are running RHEL 7 / CentOS 7 in a virtual environment.

When RHEL 7 / CentOS 7 run in a virtual environment, the tuned tool automatically invokes a performance profile derived from the throughput-performance profile, which sets the readahead to 4 MB. This can negatively impact performance.

Use the noop or deadline disk schedulers for SSD drives.
Use the noop disk scheduler for virtualized drives in guest VMs.
Disable NUMA or set vm.zone_reclaim_mode to 0 and run mongod instances with node interleaving. See: MongoDB and NUMA Hardware for more information.
Adjust the ulimit values on your hardware to suit your use case. If multiple mongod or mongos instances are running under the same user, scale the ulimit values accordingly. See: UNIX ulimit Settings for more information.
Use noatime for the dbPath mount point.
Configure sufficient file handles (fs.file-max), kernel pid limit (kernel.pid_max), system-wide maximum number of threads (kernel.threads-max), and maximum number of memory map areas per process (vm.max_map_count) for your deployment. For large systems, the following values provide a good starting point:
- fs.file-max value of 98000,
- kernel.pid_max value of 64000,
- kernel.threads-max value of 64000, and
- vm.max_map_count value of 128000
Ensure that your system has swap space configured. Refer to your operating system’s documentation for details on appropriate sizing.
Ensure that the system default TCP keepalive is set correctly. A value of 300 seconds often provides better performance for replica sets and sharded clusters. See: Does TCP keepalive time affect MongoDB Deployments? in the Frequently Asked Questions for more information.
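Several Linux kernel settings above can be compared against the checklist's starting points. The Python sketch below reads sysctl values from /proc/sys and checks them against those figures. It makes two assumptions: the large-system values are treated as minimums, and the suggested keepalive of 300 seconds is treated as an upper bound. The Azure range of 100-120 seconds also passes under that bound. A small helper converts readahead values from 512-byte sectors to KiB. Assuming the 8-32 range refers to those sectors, it corresponds to 4-16 KiB, and the 4 MB set by the tuned profile is 8192 sectors. All names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Starting-point values from the checklist, treated here as minimums (an assumption).
SYSCTL_MINIMUMS: Dict[str, int] = {
    "fs.file-max": 98000,
    "kernel.pid_max": 64000,
    "kernel.threads-max": 64000,
    "vm.max_map_count": 128000,
}
# Assumption: the suggested keepalive of 300 seconds is treated as an upper bound.
KEEPALIVE_KEY = "net.ipv4.tcp_keepalive_time"
KEEPALIVE_MAX_SECONDS = 300


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> int:
    """Read an integer sysctl, e.g. fs.file-max -> /proc/sys/fs/file-max."""
    return int((root / name.replace(".", "/")).read_text().split()[0])


def check_sysctls(values: Dict[str, int]) -> List[str]:
    """Compare collected sysctl values against the checklist starting points."""
    problems: List[str] = []
    for key, minimum in SYSCTL_MINIMUMS.items():
        value = values.get(key)
        if value is None or value < minimum:
            problems.append(f"{key}={value}; starting point is at least {minimum}")
    keepalive = values.get(KEEPALIVE_KEY)
    if keepalive is None or keepalive > KEEPALIVE_MAX_SECONDS:
        problems.append(f"{KEEPALIVE_KEY}={keepalive}; consider {KEEPALIVE_MAX_SECONDS} or lower")
    return problems


def readahead_kib(sectors: int) -> float:
    """Convert a readahead value in 512-byte sectors (blockdev --getra) to KiB."""
    return sectors * 512 / 1024
```

On a Linux host, `check_sysctls({k: read_sysctl(k) for k in [*SYSCTL_MINIMUMS, KEEPALIVE_KEY]})` returns an empty list when every checked value meets these starting points.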

Windows

Consider disabling NTFS “last access time” updates. This is analogous to disabling atime on Unix-like systems.
Format NTFS disks using the default allocation unit size of 4096 bytes.

Backups

Schedule periodic tests of your backup and restore process to have time estimates on hand and to verify that it works.
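A periodic restore test only yields a time estimate if each run is measured. The Python sketch below runs an arbitrary restore command once and records whether it exited successfully and how long it took. The `RestoreTrial` and `time_restore` names are hypothetical, and the restore command itself is whatever your backup method uses.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import List


@dataclass
class RestoreTrial:
    succeeded: bool  # True if the restore command exited with status 0
    seconds: float   # wall-clock duration of the command


def time_restore(cmd: List[str]) -> RestoreTrial:
    """Run a restore command once and record whether it worked and how long it took."""
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True)
    elapsed = time.monotonic() - start
    return RestoreTrial(succeeded=result.returncode == 0, seconds=elapsed)
```

For a mongodump-based backup, the command might look like `["mongorestore", "--uri=<test deployment URI>", "<backup directory>"]`, where the placeholders stand for a non-production deployment and your backup location. Keeping a history of these durations also helps when sizing the oplog window to cover a restore from backup.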

Monitoring

Use MongoDB Cloud Manager, Ops Manager (an on-premise solution available in MongoDB Enterprise Advanced), or another monitoring system to monitor key database metrics and set up alerts for them. Include alerts for the following metrics:
- replication lag
- replication oplog window
- assertions
- queues
- page faults
Monitor hardware statistics for your servers. In particular, pay attention to the disk use, CPU, and available disk space.
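Two of the alerts above, replication oplog window and replication lag, reduce to simple time arithmetic once your monitoring system has collected the timestamps. For example, rs.printReplicationInfo() in the mongo shell reports the first and last oplog event times. In the Python sketch below, the required window is the larger of your maintenance window and your restore-from-backup time, following the oplog sizing guidance in the Replication section. The function names and the lag threshold are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def oplog_window(first_event: datetime, last_event: datetime) -> timedelta:
    """Time span covered by the oplog, from its oldest to its newest entry."""
    return last_event - first_event


def required_window(maintenance: timedelta, restore_from_backup: timedelta) -> timedelta:
    """The window should cover both maintenance downtime and a restore from backup."""
    return max(maintenance, restore_from_backup)


def replication_lag(primary_optime: datetime, secondary_optime: datetime) -> timedelta:
    """How far a secondary's last applied operation trails the primary's (never negative)."""
    return max(primary_optime - secondary_optime, timedelta(0))


def replication_alerts(window: timedelta, required: timedelta,
                       lag: timedelta, max_lag: timedelta) -> List[str]:
    """Return alert messages for a too-short oplog window or excessive lag."""
    alerts: List[str] = []
    if window < required:
        alerts.append(f"oplog window {window} is shorter than required {required}")
    if lag > max_lag:
        alerts.append(f"replication lag {lag} exceeds {max_lag}")
    return alerts
```

For instance, an oplog spanning 20 hours fails a requirement derived from an 8-hour maintenance window and a 24-hour restore, because the restore time dominates.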

In the absence of disk space monitoring, or as a precaution:

Create a dummy 4 GB file on the storage.dbPath drive so that you can delete it to free space if the disk becomes full.
If no other monitoring tool is available, a combination of cron and df can alert when disk space hits a high-water mark.
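The cron-and-df approach can also be written as a small script for cron to run. The Python sketch below uses `shutil.disk_usage` and computes used / (used + available), which is the ratio df reports as Use% before rounding. It returns a message when that ratio reaches a high-water mark. The 85% default and the function names are hypothetical; choose a threshold that leaves time to react.

```python
import shutil
from typing import Optional


def used_fraction(used: int, free: int) -> float:
    """Fraction of usable space in use: used / (used + available), as df computes Use%."""
    return used / (used + free)


def over_high_water(used: int, free: int, high_water: float = 0.85) -> bool:
    """True once usage reaches the high-water mark (hypothetical 85% default)."""
    return used_fraction(used, free) >= high_water


def check_path(path: str, high_water: float = 0.85) -> Optional[str]:
    """Return an alert message if the filesystem holding `path` is too full, else None."""
    usage = shutil.disk_usage(path)
    if over_high_water(usage.used, usage.free, high_water):
        pct = used_fraction(usage.used, usage.free)
        return f"{path} is {pct:.0%} full (high-water mark {high_water:.0%})"
    return None
```

A cron job could call `check_path` with your storage.dbPath and forward any non-empty message to whatever alerting channel you have.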

Load Balancing

Configure load balancers to enable “sticky sessions” or “client affinity”, with a sufficient timeout for existing connections.
Avoid placing load balancers between MongoDB cluster or replica setcomponents.