Monitoring for MongoDB

Monitoring is a critical component of all database administration. A firm grasp of MongoDB’s reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB’s normal operational parameters will allow you to diagnose problems before they escalate to failures.

This document presents an overview of the available monitoring utilities and the reporting statistics available in MongoDB. It also introduces diagnostic strategies and suggestions for monitoring replica sets and sharded clusters.

Monitoring Strategies

MongoDB provides various methods for collecting data about the state of a running MongoDB instance:

  • Starting in version 4.0, MongoDB offers free Cloud monitoring for standalones and replica sets.
  • MongoDB distributes a set of utilities that provide real-time reporting of database activities.
  • MongoDB provides various database commands that return statistics regarding the current database state with greater fidelity than the utilities.
  • MongoDB Atlas is a cloud-hosted database-as-a-service for running, monitoring, and maintaining MongoDB deployments.
  • MongoDB Cloud Manager is a hosted service that monitors running MongoDB deployments to collect data and provide visualization and alerts based on that data.
  • MongoDB Ops Manager is an on-premise solution available in MongoDB Enterprise Advanced that monitors running MongoDB deployments to collect data and provide visualization and alerts based on that data.

Each strategy can help answer different questions and is useful in different contexts. These methods are complementary.

MongoDB Reporting Tools

This section provides an overview of the reporting methods distributed with MongoDB. It also offers examples of the kinds of questions that each method is best suited to help you address.

Free Monitoring

New in version 4.0.

MongoDB offers free Cloud monitoring for standalones or replica sets.

By default, you can enable or disable free monitoring during runtime using db.enableFreeMonitoring() and db.disableFreeMonitoring().

Free monitoring provides up to 24 hours of data. For more details, see Free Monitoring.

Utilities

The MongoDB distribution includes a number of utilities that quickly return statistics about instances’ performance and activity. Typically, these are most useful for diagnosing issues and assessing normal operation.

mongostat

mongostat captures and returns the counts of database operations by type (e.g. insert, query, update, delete, etc.). These counts report on the load distribution on the server.

Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat manual for details.
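As a rough illustration of the kind of figures mongostat reports, the sketch below turns two samples of the cumulative opcounters section of serverStatus into per-second rates and an operation mix. It is plain JavaScript; the helper names opcounterRates and operationMix are hypothetical, and the samples are assumed to be plain numbers taken intervalSeconds apart.

```javascript
// Hypothetical helper: per-second rates from two serverStatus().opcounters
// samples taken `intervalSeconds` apart. Counters are cumulative since the
// mongod started, so each rate is the delta divided by the interval.
function opcounterRates(prev, curr, intervalSeconds) {
  if (!(intervalSeconds > 0)) {
    throw new RangeError("intervalSeconds must be positive");
  }
  const ops = ["insert", "query", "update", "delete", "getmore", "command"];
  const rates = {};
  for (const op of ops) {
    // Missing counters are treated as 0.
    rates[op] = ((curr[op] || 0) - (prev[op] || 0)) / intervalSeconds;
  }
  return rates;
}

// Share of each operation type within the interval (0 for every type when idle).
function operationMix(rates) {
  const total = Object.values(rates).reduce((sum, r) => sum + r, 0);
  const mix = {};
  for (const [op, r] of Object.entries(rates)) {
    mix[op] = total === 0 ? 0 : r / total;
  }
  return mix;
}
```

In a shell session you might capture db.serverStatus().opcounters twice, a few seconds apart, and pass both samples in; convert the values to plain numbers first if your shell returns 64-bit integer wrappers.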

mongotop

mongotop tracks and reports the current read and write activity of a MongoDB instance, and reports these statistics on a per-collection basis.

Use mongotop to check if your database activity and use match your expectations. See the mongotop manual for details.
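Per-collection figures of this kind can also be derived from the top administrative command. The sketch below, a hypothetical collectionActivity helper, diffs the totals sub-document of two top results to get per-collection read and write time in milliseconds. It assumes the cumulative readLock.time and writeLock.time values are in microseconds (the top output carries a note saying so) and is a simplification, not a reimplementation of mongotop.

```javascript
// Hypothetical helper: compare two `totals` sub-documents from
// db.adminCommand({ top: 1 }) taken some seconds apart.
// Assumes cumulative lock times are reported in microseconds.
function collectionActivity(prevTotals, currTotals) {
  const rows = [];
  for (const [ns, stats] of Object.entries(currTotals)) {
    // Skip non-namespace entries such as the "note" string.
    if (!stats || typeof stats !== "object" || !stats.readLock || !stats.writeLock) {
      continue;
    }
    const before = prevTotals[ns];
    // A namespace absent from the first sample counts from zero.
    const readUs = stats.readLock.time - (before ? before.readLock.time : 0);
    const writeUs = stats.writeLock.time - (before ? before.writeLock.time : 0);
    rows.push({
      ns,
      readMs: readUs / 1000,
      writeMs: writeUs / 1000,
      totalMs: (readUs + writeUs) / 1000,
    });
  }
  // Busiest collections first.
  rows.sort((a, b) => b.totalMs - a.totalMs);
  return rows;
}
```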

HTTP Console

Changed in version 3.6: MongoDB 3.6 removes the deprecated HTTP interface and REST API to MongoDB.

Commands

MongoDB includes a number of commands that report on the state of the database.

These data may provide a finer level of granularity than the utilities discussed above. Consider using their output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the activity of your instance. The db.currentOp() method is another useful tool for identifying the database instance’s in-progress operations.

serverStatus

The serverStatus command, or db.serverStatus() from the shell, returns a general overview of the status of the database, detailing disk usage, memory use, connections, journaling, and index access. The command returns quickly and does not impact MongoDB performance.

serverStatus outputs an account of the state of a MongoDB instance. This command is rarely run directly. In most cases, the data is more meaningful when aggregated, as one would see with monitoring tools including MongoDB Cloud Manager and Ops Manager. Nevertheless, all administrators should be familiar with the data provided by serverStatus.
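For example, a custom alert might watch the connections section of serverStatus. The sketch below assumes a serverStatus document with connections.current (open incoming connections) and connections.available (remaining unused incoming connections); the connectionAlert name and the 80 percent threshold are illustrative choices, not MongoDB defaults.

```javascript
// Hypothetical alert rule on a serverStatus document.
// Utilization = open connections / (open + still available).
function connectionAlert(status, threshold = 0.8) {
  const { current, available } = status.connections;
  const utilization = current / (current + available);
  return { utilization, alert: utilization >= threshold };
}
```

In a shell session this could be called as connectionAlert(db.serverStatus()) from a scheduled script.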

dbStats

The dbStats command, or db.stats() from the shell, returns a document that addresses storage use and data volumes. The dbStats output reflects the amount of storage used, the quantity of data contained in the database, and object, collection, and index counters.

Use this data to monitor the state and storage capacity of a specific database. This output also allows you to compare use between databases and to determine the average document size in a database.
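To compare databases side by side, one option is to collect a dbStats document per database and summarize it. This sketch assumes each input has the dbStats fields db, objects, dataSize, and storageSize (bytes at the default scale); summarizeDatabases is a hypothetical name, and the average is computed by hand even though dbStats also reports avgObjSize.

```javascript
// Hypothetical helper: one row per database, largest on-disk footprint first.
function summarizeDatabases(statsList) {
  return statsList
    .map((s) => ({
      db: s.db,
      // Average document size in bytes (0 for an empty database).
      avgDocBytes: s.objects > 0 ? s.dataSize / s.objects : 0,
      storageBytes: s.storageSize,
    }))
    .sort((a, b) => b.storageBytes - a.storageBytes);
}
```

In a shell session the input might be built with db.adminCommand({ listDatabases: 1 }).databases.map((d) => db.getSiblingDB(d.name).stats()).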

collStats

The collStats command, or db.collection.stats() from the shell, provides statistics that resemble dbStats at the collection level, including a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about its indexes.
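As one way to read the index information, the sketch below reports a rough index-to-data ratio and the largest index of a collection. It assumes a collStats result with count, size, totalIndexSize, and indexSizes (a map of index name to bytes); note that size is the uncompressed data size while index sizes are on disk, so the ratio is only an indicator. collectionFootprint is a hypothetical name.

```javascript
// Hypothetical helper over a collStats document (sizes in bytes).
function collectionFootprint(stats) {
  // Pick the index with the largest reported size, if any.
  const largest = Object.entries(stats.indexSizes || {})
    .sort((a, b) => b[1] - a[1])[0];
  return {
    documents: stats.count,
    indexToDataRatio: stats.size > 0 ? stats.totalIndexSize / stats.size : 0,
    largestIndex: largest ? largest[0] : null,
  };
}
```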

replSetGetStatus

The replSetGetStatus command (rs.status() from the shell) returns an overview of your replica set’s status. The replSetGetStatus document details the state and configuration of the replica set and statistics about its members.

Use this data to ensure that replication is properly configured, and to check the connections between the current host and the other members of the replica set.
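A quick connectivity check can be scripted over the members array. This sketch assumes each member entry has name, health (1 when reachable, 0 when not), and stateStr; treating only PRIMARY, SECONDARY, and ARBITER as healthy states is an illustrative choice, and unhealthyMembers is a hypothetical name.

```javascript
// Hypothetical helper over rs.status() / replSetGetStatus output.
// Flags members that are unreachable or in an unexpected state.
function unhealthyMembers(status) {
  const okStates = new Set(["PRIMARY", "SECONDARY", "ARBITER"]);
  return status.members
    .filter((m) => m.health !== 1 || !okStates.has(m.stateStr))
    .map((m) => `${m.name} (${m.stateStr}, health=${m.health})`);
}
```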

Hosted (SaaS) Monitoring Tools

These are monitoring tools provided as a hosted service, usually through a paid subscription.

  • MongoDB Cloud Manager. MongoDB Cloud Manager is a cloud-based suite of services for managing MongoDB deployments. MongoDB Cloud Manager provides monitoring, backup, and automation functionality. For an on-premise solution, see also Ops Manager, available in MongoDB Enterprise Advanced.
  • VividCortex. VividCortex provides deep insights into MongoDB production database workload and query performance, in one-second resolution. Track latency, throughput, errors, and more to ensure scalability and exceptional performance of your application on MongoDB.
  • Scout. Several plugins, including MongoDB Monitoring, MongoDB Slow Queries, and MongoDB Replica Set Monitoring.
  • Server Density. Dashboard for MongoDB, MongoDB-specific alerts, replication failover timeline, and iPhone, iPad, and Android mobile apps.
  • Application Performance Management. IBM has an Application Performance Management SaaS offering that includes monitoring for MongoDB and other applications and middleware.
  • New Relic. New Relic offers full support for application performance management. In addition, New Relic Plugins and Insights enable you to view monitoring metrics from Cloud Manager in New Relic.
  • Datadog. Infrastructure monitoring to visualize the performance of your MongoDB deployments.
  • SPM Performance Monitoring. Monitoring, anomaly detection, and alerting. SPM monitors all key MongoDB metrics together with infrastructure (including Docker) and other application metrics, e.g. Node.js, Java, NGINX, Apache, HAProxy, or Elasticsearch. SPM provides correlation of metrics and logs.

Process Logging

During normal operation, mongod and mongos instances report a live account of all server activity and operations to either standard output or a log file. The following runtime settings control these options.

  • quiet. Limits the amount of information written to the log or output.
  • verbosity. Increases the amount of information written to the log or output. You can also modify the logging verbosity during runtime with the logLevel parameter or the db.setLogLevel() method in the shell.
  • path. Enables logging to a file, rather than the standard output. You must specify the full path to the log file when adjusting this setting.
  • logAppend. Adds information to a log file instead of overwriting the file.

Note

You can specify these configuration options as command-line arguments to mongod or mongos.

For example:

    mongod -v --logpath /var/log/mongodb/server1.log --logappend

This starts a mongod instance in verbose mode, appending data to the log file at /var/log/mongodb/server1.log.

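The following database commands also affect logging:

  • getLog. Displays recent messages from the mongod process log.
  • logRotate. Rotates the log files.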

Log Redaction

New in version 3.4: Available in MongoDB Enterprise only.

A mongod running with security.redactClientLogData redacts messages associated with any given log event before logging, leaving only metadata, source files, or line numbers related to the event. security.redactClientLogData prevents potentially sensitive information from entering the system log at the cost of diagnostic detail.

For example, the following operation inserts a document into a mongod running without log redaction. The mongod has systemLog.component.command.verbosity set to 1:

    db.clients.insertOne( { "name" : "Joe", "PII" : "Sensitive Information" } )

This operation produces the following log event:

    2017-06-09T13:35:23.446-0400 I COMMAND [conn1] command internal.clients
       appName: "MongoDB Shell"
       command: insert {
          insert: "clients",
          documents: [ {
             _id: ObjectId('593adc5b99001b7d119d0c97'),
             name: "Joe",
             PII: "Sensitive Information"
          } ],
          ordered: true
       }
       ...

A mongod running with security.redactClientLogData performing the same insert operation produces the following log event:

    2017-06-09T13:45:18.599-0400 I COMMAND [conn1] command internal.clients
       appName: "MongoDB Shell"
       command: insert {
          insert: "###", documents: [ {
             _id: "###", name: "###", PII: "###"
          } ],
          ordered: "###"
       }

Use redactClientLogData in conjunction with Encryption at Rest and TLS/SSL (Transport Encryption) to assist compliance with regulatory requirements.

Diagnosing Performance Issues

As you develop and operate applications with MongoDB, you may want to analyze the performance of both the database and the application. MongoDB Performance discusses some of the operational factors that can influence performance.

Replication and Monitoring

Beyond the basic monitoring requirements for any MongoDB instance, for replica sets, administrators must monitor replication lag. “Replication lag” refers to the amount of time that it takes to copy (i.e. replicate) a write operation on the primary to a secondary. Some small delay period may be acceptable, but significant problems emerge as replication lag grows, including:

  • Growing cache pressure on the primary.

  • Operations that occurred during the period of lag are not replicated to one or more secondaries. If you’re using replication to ensure data persistence, exceptionally long delays may impact the integrity of your data set.

  • If the replication lag exceeds the length of the operation log (oplog), then MongoDB will have to perform an initial sync on the secondary, copying all data from the primary and rebuilding all indexes. [1] This is uncommon under normal circumstances, but if you configure the oplog to be smaller than the default, the issue can arise.
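The last point comes down to comparing a secondary's lag with the oplog window, the time span between the oldest and newest oplog entries. A minimal sketch, assuming you already have both entry times as milliseconds since the epoch (for instance from rs.printReplicationInfo() output or the first and last documents of local.oplog.rs) and a lag figure in seconds; the helper names are hypothetical and the window is approximated by the span between those two entries.

```javascript
// Hypothetical helpers: oplog window vs. replication lag.
// firstMs / lastMs: wall-clock times of the oldest and newest oplog entries.
function oplogWindowSeconds(firstMs, lastMs) {
  return (lastMs - firstMs) / 1000;
}

// A secondary that falls further behind than the window can no longer
// catch up from the oplog and needs an initial sync.
function needsInitialSync(lagSeconds, windowSeconds) {
  return lagSeconds > windowSeconds;
}

// Fraction of the window consumed by lag (Infinity for an empty window);
// alerting well before 1.0 leaves time to react.
function windowConsumed(lagSeconds, windowSeconds) {
  return windowSeconds > 0 ? lagSeconds / windowSeconds : Infinity;
}
```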

Note

The initial size of the oplog is set during the first run using the --oplogSize argument to the mongod command, or preferably, the oplogSizeMB setting in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet option, mongod will create a default-sized oplog.

By default, the oplog is 5 percent of total available disk space on 64-bit systems. For more information about changing the oplog size, see Change the Size of the Oplog.
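The note gives the 5 percent rule. For the WiredTiger storage engine, the MongoDB 4.x documentation also clamps the default between a lower bound of 990 MB and an upper bound of 50 GB; the sketch below encodes that rule under the assumption that those bounds apply to your server version and that freeDiskMB is the free space on the data volume. defaultOplogSizeMB is a hypothetical name.

```javascript
// Sketch of the default WiredTiger oplog sizing rule (assumed 4.x bounds):
// 5% of free disk space, clamped to [990 MB, 50 GB].
function defaultOplogSizeMB(freeDiskMB) {
  const MIN_MB = 990;
  const MAX_MB = 50 * 1024;
  const fivePercent = freeDiskMB / 20; // 5% of free space
  return Math.min(MAX_MB, Math.max(MIN_MB, fivePercent));
}
```

The clamp is why small test volumes still get a sizable oplog and very large volumes do not grow it without limit.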

Flow Control

Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value, flowControlTargetLagSeconds.

By default, flow control is enabled.

Note

For flow control to engage, the replica set/sharded cluster must have featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.

See also: Check the Replication Lag.

Replica Set Status

Replication issues are most often the result of network connectivity issues between members, or the result of a primary that does not have the resources to support application and replication traffic. To check the status of a replica set, use the replSetGetStatus command or the following helper in the shell:

    rs.status()

The replSetGetStatus reference provides a more in-depth overview of this output. In general, watch the value of optimeDate, and pay particular attention to the time difference between the primary and the secondary members.
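The primary-versus-secondary comparison of optimeDate can be scripted. This sketch assumes rs.status() output whose members carry name, stateStr, and optimeDate (a Date); members that are neither PRIMARY nor SECONDARY are skipped, and replicationLagSeconds is a hypothetical name.

```javascript
// Hypothetical helper: lag of each SECONDARY behind the PRIMARY, in seconds,
// measured as the difference between their optimeDate values.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY in replica set status");
  const lags = {};
  for (const m of status.members) {
    if (m.stateStr !== "SECONDARY") continue;
    lags[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
  }
  return lags;
}
```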

[1] Starting in MongoDB 4.0, the oplog can grow past its configured size limit to avoid deleting the majority commit point.

Free Monitoring

Note

Starting in version 4.0, MongoDB offers free monitoring for standalones and replica sets. For more information, see Free Monitoring.

Slow Application of Oplog Entries

Starting in version 4.2 (also available starting in 4.0.6), secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages are logged for the secondaries in the diagnostic log under the REPL component with the text applied op: <oplog entry> took <num>ms. These slow oplog entries depend only on the slow operation threshold. They do not depend on the log levels (either at the system or component level), or the profiling level, or the slow operation sample rate. The profiler does not capture slow oplog entries.
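Because the profiler does not capture these entries, the diagnostic log is where to find them. The sketch below scans plain-text log lines for the REPL component and the applied op: <oplog entry> took <num>ms text described above. It assumes the plain-text log format of these versions (later releases write structured JSON logs and would need a different parser); slowOplogEntries and the minMs filter are hypothetical.

```javascript
// Matches plain-text REPL lines of the form
// "... REPL ... applied op: <oplog entry> took <num>ms".
const SLOW_OPLOG_RE = /\bREPL\b.*applied op: (.*) took (\d+)ms/;

// Hypothetical helper: extract slow oplog entries at or above minMs.
function slowOplogEntries(lines, minMs = 0) {
  const results = [];
  for (const line of lines) {
    const match = SLOW_OPLOG_RE.exec(line);
    if (!match) continue;
    const ms = Number(match[2]);
    if (ms >= minMs) results.push({ entry: match[1], ms });
  }
  return results;
}
```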

Sharding and Monitoring

In most cases, the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB instances. In addition, clusters require further monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.

See also

See the Sharding documentation for more information.

Config Servers

The config database maintains a map identifying which documents are on which shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, certain sharding operations become unavailable, such as moving chunks and starting mongos instances. However, clusters remain accessible from already-running mongos instances.

Because inaccessible configuration servers can seriously impact the availability of a sharded cluster, you should monitor your configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.

MongoDB Cloud Manager and Ops Manager monitor config servers and can create notifications if a config server becomes inaccessible. See the MongoDB Cloud Manager documentation and Ops Manager documentation for more information.

Balancing and Chunk Distribution

The most effective sharded cluster deployments evenly balance chunks among the shards. To facilitate this, MongoDB has a background balancer process that distributes data to ensure that chunks are always optimally distributed among the shards.

Run db.printShardingStatus() or sh.status() in a mongo shell connected to a mongos. This returns an overview of the entire cluster, including the database name and a list of the chunks.
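sh.status() lists the chunk count per shard for each sharded collection. The sketch below compares the spread between the fullest and emptiest shard with the migration thresholds documented for the pre-5.0 balancer (2 for fewer than 20 chunks, 4 for 20 to 79, 8 for 80 or more). Treat those thresholds as an assumption to check against your server version; migrationThreshold and chunkImbalance are hypothetical names.

```javascript
// Assumed pre-5.0 migration thresholds, keyed on the collection's total chunks.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// Hypothetical helper: chunkCounts maps shard name -> chunk count for ONE
// sharded collection, e.g. read off sh.status() output.
function chunkImbalance(chunkCounts) {
  const counts = Object.values(chunkCounts);
  if (counts.length === 0) throw new Error("no shards given");
  const total = counts.reduce((sum, n) => sum + n, 0);
  const spread = Math.max(...counts) - Math.min(...counts);
  const threshold = migrationThreshold(total);
  // The balancer starts migrating once the spread reaches the threshold.
  return { total, spread, threshold, balanced: spread < threshold };
}
```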

Stale Locks

To check the lock status of the database, connect to a mongos instance using the mongo shell. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:

    use config
    db.locks.find()

The balancing process takes a special “balancer” lock that prevents other balancing activity from transpiring. In the config database, use the following command to view the “balancer” lock.

    db.locks.find( { _id : "balancer" } )

Changed in version 3.4: Starting in 3.4, the primary of the CSRS config server holds the “balancer” lock, using a process id named “ConfigServer”. This lock is never released. To determine if the balancer is running, see Check if Balancer is Running.
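To flag locks that have been held for a long time, one option is a filter over the config.locks documents. This is a sketch only: it assumes each document has _id, state (0 meaning unlocked), when (a Date), and why, and it skips the “balancer” lock because the CSRS primary never releases it. findLongHeldLocks and the age cutoff are hypothetical.

```javascript
// Hypothetical helper over config.locks documents (assumed fields:
// _id, state with 0 = unlocked, when as a Date, why).
function findLongHeldLocks(locks, now, maxAgeMs) {
  return locks
    // The "balancer" lock is held permanently since 3.4, so ignore it.
    .filter((l) => l._id !== "balancer" && l.state !== 0)
    .map((l) => ({ id: l._id, heldMs: now.getTime() - l.when.getTime(), why: l.why }))
    .filter((l) => l.heldMs > maxAgeMs);
}
```

In a shell session the input might come from db.getSiblingDB("config").locks.find().toArray(), with new Date() as now.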

Storage Node Watchdog

Note

  • Starting in MongoDB 4.2, the Storage Node Watchdog is available in both the Community and MongoDB Enterprise editions.
  • In earlier versions (3.2.16+, 3.4.7+, 3.6.0+, 4.0.0+), the Storage Node Watchdog is only available in MongoDB Enterprise edition.

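The Storage Node Watchdog monitors the following MongoDB directories to detect filesystem unresponsiveness:

  • The --dbpath directory
  • The journal directory inside the --dbpath directory, if journaling is enabled
  • The directory of the --logpath file
  • The directory of the --auditPath file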

By default, the Storage Node Watchdog is disabled. You can only enable the Storage Node Watchdog on a mongod at startup time by setting the watchdogPeriodSeconds parameter to an integer greater than or equal to 60. However, once enabled, you can pause the Storage Node Watchdog and restart it during runtime. See the watchdogPeriodSeconds parameter for details.

If any of the filesystems containing the monitored directories become unresponsive, the Storage Node Watchdog terminates the mongod, which exits with a status code of 61. If the mongod is the primary of a replica set, the termination initiates a failover, allowing another member to become primary.

Once a mongod has terminated, it may not be possible to cleanly restart it on the same machine.

The maximum time the Storage Node Watchdog can take to detect an unresponsive filesystem and terminate is nearly twice the value of watchdogPeriodSeconds.
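To make the timing concrete, the sketch below validates a watchdogPeriodSeconds value against the rule stated above (an integer of at least 60 to enable it) and returns twice the period as the worst-case detection time. It assumes -1 is the value that leaves the watchdog disabled, which is the parameter's default; watchdogWorstCaseSeconds is a hypothetical name.

```javascript
// Hypothetical helper: upper bound on detection-and-termination time.
// Returns null when the watchdog is disabled (assumed value -1).
function watchdogWorstCaseSeconds(watchdogPeriodSeconds) {
  if (watchdogPeriodSeconds === -1) return null;
  if (!Number.isInteger(watchdogPeriodSeconds) || watchdogPeriodSeconds < 60) {
    throw new RangeError("watchdogPeriodSeconds must be -1 or an integer >= 60");
  }
  // The documented bound is nearly two full periods.
  return 2 * watchdogPeriodSeconds;
}
```

For example, a period of 60 seconds means a hung filesystem may go undetected for up to about two minutes, which is worth factoring into failover expectations.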