Aggregation Pipeline

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results.

[Figure 1: Pipeline]

The aggregation pipeline provides an alternative to map-reduce and may be the preferred solution for aggregation tasks where the complexity of map-reduce may be unwarranted.

The aggregation pipeline has some limitations on value types and result size. See Aggregation Pipeline Limits for details on limits and restrictions on the aggregation pipeline.

Pipeline

The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents. Pipeline stages can appear multiple times in the pipeline.
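As a rough illustration of stages that change the number of documents, the sketch below defines a two-stage pipeline and then emulates it in plain JavaScript. The `orders` collection, its fields (`cust_id`, `amount`, `status`), and the sample documents are assumptions made for this example; the emulation only mimics the two stages and is not how the server evaluates them.

```javascript
// Hypothetical sample documents for an "orders" collection.
var orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }
];

// Stage 1 filters documents; stage 2 collapses them into one per customer.
var pipeline = [
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
];

// In the mongo shell, assuming the documents above were inserted into "orders".
// The guard lets the snippet load outside the shell as well.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(pipeline).toArray());
}

// Plain-JavaScript emulation of the same two stages, for illustration only.
var matched = orders.filter(function (doc) {
  return doc.status === "A";
});
var totals = {};
matched.forEach(function (doc) {
  totals[doc.cust_id] = (totals[doc.cust_id] || 0) + doc.amount;
});
var grouped = Object.keys(totals).map(function (id) {
  return { _id: id, total: totals[id] };
});
// 4 input documents, 3 after the filter, 2 after grouping.
```

Neither stage emits one document per input document: the filter drops one, and the grouping step produces one document per distinct `cust_id`.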

MongoDB provides the db.collection.aggregate() method in the mongo shell and the aggregate command for running the aggregation pipeline. See Stage Operators for the available stages.
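The two entry points can be sketched side by side. The collection name `orders` and the `status` field are hypothetical; the `cursor` option is included because newer server versions require it for the command form.

```javascript
// A one-stage pipeline shared by both invocation styles.
var pipeline = [
  { $match: { status: "A" } }
];

// Command form: the document that would be passed to db.runCommand().
var aggregateCommand = {
  aggregate: "orders",   // hypothetical collection name
  pipeline: pipeline,
  cursor: {}             // ask for results as a cursor
};

// The guard lets the snippet load outside the mongo shell as well.
if (typeof db !== "undefined") {
  // Shell helper method.
  db.orders.aggregate(pipeline);
  // Equivalent database command.
  db.runCommand(aggregateCommand);
}
```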

For example usage of the aggregation pipeline, consider Aggregation with User Preference Data and Aggregation with the Zip Code Data Set.

Pipeline Expressions

Some pipeline stages take a pipeline expression as the operand. Pipeline expressions specify the transformation to apply to the input documents. Expressions have a document structure and can contain other expressions.

Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents: expression operations provide in-memory transformation of documents.
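A minimal sketch of an expression that reads only from the current document: a `$project` stage that computes a line total from two fields. The field names `item`, `price`, and `quantity` and the `orders` collection are assumptions for this example.

```javascript
// "$price" and "$quantity" are field paths resolved against the
// document currently passing through the stage.
var pipeline = [
  {
    $project: {
      item: 1,
      lineTotal: { $multiply: ["$price", "$quantity"] }
    }
  }
];

// The guard lets the snippet load outside the mongo shell as well.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(pipeline).toArray());
}
```

The expression nests an operator document (`$multiply`) inside the `$project` specification, which is what "expressions can contain other expressions" looks like in practice.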

Generally, expressions are stateless and are only evaluated when seen by the aggregation process, with one exception: accumulator expressions.

The accumulators, used in the $group stage, maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.

Changed in version 3.2: Some accumulators are available in the $project stage; however, when used in the $project stage, the accumulators do not maintain their state across documents.
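The difference between the two uses can be sketched as follows. Both stages are illustrative: the first assumes documents with a `student` field and a single numeric `score`, the second assumes documents that each carry a `quizzes` array. None of these names come from the text above.

```javascript
// In $group, the accumulators keep state across all documents
// that share the same _id value.
var groupStage = {
  $group: {
    _id: "$student",
    total: { $sum: "$score" },
    best: { $max: "$score" }
  }
};

// In $project (MongoDB 3.2 and later), the same operators are evaluated
// per document, here over the elements of that document's array.
var projectStage = {
  $project: {
    quizTotal: { $sum: "$quizzes" },
    quizBest: { $max: "$quizzes" }
  }
};
```

In the `$group` form, `total` is a running sum over many documents; in the `$project` form, `quizTotal` is computed from one document alone and nothing carries over to the next.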

For more information on expressions, see Expressions.

Aggregation Pipeline Behavior

In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline. To optimize the operation, wherever possible, use the following strategies to avoid scanning the entire collection.

Pipeline Operators and Indexes

The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.

New in version 2.4: The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.
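A sketch of a pipeline that starts with `$geoNear`, assuming a hypothetical `places` collection whose `location` field holds GeoJSON points and has a 2dsphere index. The coordinates and the output field name `distance` are arbitrary choices for this example.

```javascript
// $geoNear is the first stage; later stages see documents ordered by distance.
var pipeline = [
  {
    $geoNear: {
      near: { type: "Point", coordinates: [-73.99279, 40.719296] },
      distanceField: "distance",   // computed distance is written to this field
      spherical: true
    }
  },
  { $limit: 5 }
];

// The guard lets the snippet load outside the mongo shell as well.
if (typeof db !== "undefined") {
  // The geospatial index that $geoNear relies on.
  db.places.createIndex({ location: "2dsphere" });
  printjson(db.places.aggregate(pipeline).toArray());
}
```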

Changed in version 3.2: Starting in MongoDB 3.2, indexes can cover an aggregation pipeline. In MongoDB 2.6 and 3.0, indexes could not cover an aggregation pipeline because, even when the pipeline used an index, aggregation still required access to the actual documents.

Early Filtering

If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match operations use suitable indexes to scan only the matching documents in a collection.

Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort and can use an index. When possible, place $match operators at the beginning of the pipeline.
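The early-filtering advice can be sketched as a pipeline whose first two stages behave like a query with a sort. The `orders` collection, its fields, and the compound index are assumptions for this example; whether the index is actually used depends on the data and the server's query planner.

```javascript
// Filter and sort first, so later stages see as few documents as possible.
var pipeline = [
  { $match: { status: "A" } },
  { $sort: { amount: -1 } },
  { $limit: 10 },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
];

// The guard lets the snippet load outside the mongo shell as well.
if (typeof db !== "undefined") {
  // A compound index that can serve both the equality match and the sort.
  db.orders.createIndex({ status: 1, amount: -1 });
  // The explain option reports how the leading stages are executed.
  printjson(db.orders.aggregate(pipeline, { explain: true }));
  printjson(db.orders.aggregate(pipeline).toArray());
}
```

Moving the `$match` after the `$group` would force the grouping step to process every document in the collection, which is the situation the guidance above is meant to avoid.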

Additional Features

The aggregation pipeline has an internal optimization phase that provides improved performance for certain sequences of operators. For details, see Aggregation Pipeline Optimization.

The aggregation pipeline supports operations on sharded collections. See Aggregation Pipeline and Sharded Collections.

Additional Resources