Aggregation

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

Aggregation Pipeline

MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result. For example:

  db.orders.aggregate([
    { $match: { status: "A" } },
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
  ])

First Stage: The $match stage filters the documents by the status field and passes to the next stage those documents that have status equal to "A".

Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum of the amount for each unique cust_id.
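
To make the two stages concrete, the following plain JavaScript sketch reproduces what they compute on a small in-memory array. It does not talk to a MongoDB server, and the sample documents are hypothetical; it only illustrates the filter-then-group logic described above.

```javascript
// Hypothetical sample documents standing in for the orders collection.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }
];

// Stage 1, like { $match: { status: "A" } }: keep only matching documents.
const matched = orders.filter((doc) => doc.status === "A");

// Stage 2, like { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }:
// produce one output document per distinct cust_id, with the amounts summed.
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.cust_id, (totals.get(doc.cust_id) || 0) + doc.amount);
}
const result = Array.from(totals, ([_id, total]) => ({ _id, total }));

// result now holds { _id: "A123", total: 750 } and { _id: "B212", total: 200 }.
console.log(result);
```

The order with status "D" is dropped by the first stage, so it never contributes to a total. The sketch returns groups in insertion order; the order of $group output in MongoDB itself should not be relied on.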

The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.

Other pipeline operations provide tools for grouping and sorting documents by specific field or fields as well as tools for aggregating the contents of arrays, including arrays of documents. In addition, pipeline stages can use operators for tasks such as calculating the average or concatenating a string.

The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.

The aggregation pipeline can operate on a sharded collection.

The aggregation pipeline can use indexes to improve its performance during some of its stages. In addition, the aggregation pipeline has an internal optimization phase. See Pipeline Operators and Indexes and Aggregation Pipeline Optimization for details.
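
As an illustration of these other operations, here is a sketch of a pipeline that unwinds an array, averages a value per group, and sorts the groups. The stage operators ($unwind, $group, $avg, $sort) are MongoDB's; the field names items, sku, and price are assumptions made up for the example.

```javascript
// A pipeline is an ordered array of stage documents.
// Field names (items, sku, price) are hypothetical.
const pipeline = [
  // Emit one document per element of the "items" array.
  { $unwind: "$items" },
  // Group the unwound documents by SKU and average their prices.
  { $group: { _id: "$items.sku", avgPrice: { $avg: "$items.price" } } },
  // Sort the groups by average price, highest first.
  { $sort: { avgPrice: -1 } }
];

// In the mongo shell this would be run as:
//   db.orders.aggregate(pipeline)
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
```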
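
A minimal sketch of an index-friendly pipeline follows. It assumes an index on the hypothetical fields status and amount, and places $match and $sort at the start of the pipeline, where they are able to use an index. Whether an index is actually chosen depends on the deployment, so the explain option is the way to check.

```javascript
// Assumed to exist beforehand (hypothetical index on hypothetical fields):
//   db.orders.createIndex({ status: 1, amount: -1 })

const indexedPipeline = [
  // Filtering first narrows the input and lets the stage use the index.
  { $match: { status: "A" } },
  // Sorting directly after the filter can use the same index.
  { $sort: { amount: -1 } },
  // Later stages operate on the already reduced set of documents.
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
];

// To inspect the chosen plan in the mongo shell:
//   db.orders.aggregate(indexedPipeline, { explain: true })
console.log(Object.keys(indexedPipeline[0])[0]);
```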

Map-Reduce

MongoDB also provides map-reduce operations to perform aggregation. In general, map-reduce operations have two phases: a map stage that processes each document and emits one or more objects for each input document, and a reduce phase that combines the output of the map operation. Optionally, map-reduce can have a finalize stage to make final modifications to the result. Like other aggregation operations, map-reduce can specify a query condition to select the input documents as well as sort and limit the results.

Map-reduce uses custom JavaScript functions to perform the map and reduce operations, as well as the optional finalize operation. While the custom JavaScript provides great flexibility compared to the aggregation pipeline, map-reduce is in general less efficient and more complex than the aggregation pipeline.

Map-reduce can operate on a sharded collection. Map-reduce operations can also output to a sharded collection. See Map-Reduce and Sharded Collections for details.
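
The map and reduce phases can be sketched in plain JavaScript as follows. The runMapReduce driver and the sample documents are hypothetical and exist only so the example runs without a server; they are not how MongoDB implements map-reduce. In MongoDB, emit is supplied by the server rather than passed as an argument.

```javascript
// Hypothetical sample documents standing in for the orders collection.
const orderDocs = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }
];

// Map: called once per input document, with the document bound to `this`.
// emit is a parameter here only so the sketch is self-contained.
function mapFn(emit) {
  emit(this.cust_id, this.amount);
}

// Reduce: combines all values emitted for a single key.
function reduceFn(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Hypothetical in-memory driver: filter by query, map, group by key, reduce.
function runMapReduce(docs, map, reduce, query) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.filter(query).forEach((doc) => map.call(doc, emit));
  return Array.from(emitted, ([key, values]) => ({
    _id: key,
    value: reduce(key, values)
  }));
}

const mrResult = runMapReduce(
  orderDocs,
  mapFn,
  reduceFn,
  (doc) => doc.status === "A" // plays the role of the query condition
);

// mrResult holds { _id: "A123", value: 750 } and { _id: "B212", value: 200 }.
console.log(mrResult);
```

The result matches the pipeline example's totals, but the same work takes two custom functions instead of two declarative stages, which illustrates the added complexity mentioned above.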

Note

Starting in MongoDB 2.4, certain mongo shell functions and properties are inaccessible in map-reduce operations. MongoDB 2.4 also provides support for multiple JavaScript operations to run at the same time. Before MongoDB 2.4, JavaScript code executed in a single thread, raising concurrency issues for map-reduce.

Diagram of the annotated map-reduce operation.

Single Purpose Aggregation Operations

MongoDB also provides db.collection.estimatedDocumentCount(), db.collection.count() and db.collection.distinct().

All of these operations aggregate documents from a single collection. While these operations provide simple access to common aggregation processes, they lack the flexibility and capabilities of the aggregation pipeline and map-reduce.
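
As a rough illustration of what these methods return, the sketch below computes equivalent values over a hypothetical in-memory array. The corresponding shell calls are shown in comments; the collection name and sample documents are assumptions.

```javascript
// Hypothetical sample documents standing in for a collection.
const docs = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }
];

// Like db.orders.distinct("cust_id"): the unique values of one field.
const distinctCustIds = [...new Set(docs.map((doc) => doc.cust_id))];

// Like db.orders.count({ status: "A" }): how many documents match a query.
const countStatusA = docs.filter((doc) => doc.status === "A").length;

// Like db.orders.estimatedDocumentCount(): total number of documents.
// (MongoDB reads this from collection metadata rather than scanning.)
const totalDocs = docs.length;

console.log(distinctCustIds, countStatusA, totalDocs);
```

Each call answers exactly one question, which is why anything beyond these fixed computations calls for the aggregation pipeline or map-reduce.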

Diagram of the annotated distinct operation.

Additional Features and Behaviors

For a feature comparison of the aggregation pipeline, map-reduce, and the special group functionality, see Aggregation Commands Comparison.