Aggregation Pipeline Optimization

Aggregation Pipeline Optimization

Aggregation pipeline operations have an optimization phase whichattempts to reshape the pipeline for improved performance.

To see how the optimizer transforms a particular aggregation pipeline,include the explain option in thedb.collection.aggregate() method.

Optimizations are subject to change between releases.

Projection Optimization

The aggregation pipeline can determine if it requires only a subset ofthe fields in the documents to obtain the results. If so, the pipelinewill only use those required fields, reducing the amount of datapassing through the pipeline.

Pipeline Sequence Optimization

($project or $unset or $addFields or $set) + $match Sequence Optimization

For an aggregation pipeline that contains a projection stage($project or $unset or$addFields or $set) followed by a$match stage, MongoDB moves any filters in the$match stage that do not require values computed in theprojection stage to a new $match stage before theprojection.

If an aggregation pipeline contains multiple projection and/or$match stages, MongoDB performs this optimization for each$match stage, moving each $match filter beforeall projection stages that the filter does not depend on.

Consider a pipeline of the following stages:

{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
    name: "Joe Schmoe",
    maxTime: { $lt: 20 },
    minTime: { $gt: 5 },
    avgTime: { $gt: 7 }
} }

The optimizer breaks up the $match stage into fourindividual filters, one for each key in the $match querydocument. The optimizer then moves each filter before as many projectionstages as possible, creating new $match stages as needed.Given this example, the optimizer produces the following _optimized_pipeline:

{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }

The $match filter { avgTime: { $gt: 7 } } depends on the$project stage to compute the avgTime field. The$project stage is the last projection stage in thispipeline, so the $match filter on avgTime could not bemoved.

The maxTime and minTime fields are computed in the$addFields stage but have no dependency on the$project stage. The optimizer created a new$match stage for the filters on these fields and placed itbefore the $project stage.

The $match filter { name: "Joe Schmoe" } does notuse any values computed in either the $project or$addFields stages so it was moved to a new$match stage before both of the projection stages.

Note

After optimization, the filter { name: "Joe Schmoe" } is ina $match stage at the beginning of the pipeline. This hasthe added benefit of allowing the aggregation to use an index on thename field when initially querying the collection.See Pipeline Operators and Indexes for moreinformation.

$sort + $match Sequence Optimization

When you have a sequence with $sort followed by a$match, the $match moves before the$sort to minimize the number of objects to sort. Forexample, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $match: { status: 'A' } }

During the optimization phase, the optimizer transforms the sequence tothe following:

{ $match: { status: 'A' } },
{ $sort: { age : -1 } }

$redact + $match Sequence Optimization

When possible, when the pipeline has the $redact stageimmediately followed by the $match stage, the aggregationcan sometimes add a portion of the $match stage before the$redact stage. If the added $match stage is atthe start of a pipeline, the aggregation can use an index as well asquery the collection to limit the number of documents that enter thepipeline. See Pipeline Operators and Indexes formore information.

For example, if the pipeline consists of the following stages:

{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

The optimizer can add the same $match stage before the$redact stage:

{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

$project/$unset + $skip Sequence Optimization

New in version 3.2.

When you have a sequence with $project or $unset followed by$skip, the $skipmoves before $project. For example, ifthe pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $skip: 5 }

During the optimization phase, the optimizer transforms the sequence tothe following:

{ $sort: { age : -1 } },
{ $skip: 5 },
{ $project: { status: 1, name: 1 } }

Pipeline Coalescence Optimization

When possible, the optimization phase coalesces a pipeline stage intoits predecessor. Generally, coalescence occurs after any sequencereordering optimization.

$sort + $limit Coalescence

Changed in version 4.0.

When a $sort precedes a $limit, the optimizercan coalesce the $limit into the $sort if nointervening stages modify the number of documents(e.g. $unwind, $group).MongoDB will not coalesce the $limit into the$sort if there are pipeline stages that change the number ofdocuments between the $sort and $limit stages..

For example, if the pipeline consists of the following stages:

{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }

During the optimization phase, the optimizer coalesces the sequenceto the following:

{
    "$sort" : {
       "sortKey" : {
          "age" : -1
       },
       "limit" : NumberLong(5)
    }
},
{ "$project" : {
         "age" : 1,
         "status" : 1,
         "name" : 1
  }
}

This allows the sort operation to only maintain thetop n results as it progresses, where n is the specified limit,and MongoDB only needs to store n items in memory[1]. See $sort Operator and Memory for moreinformation.

Sequence Optimization with $skip

If there is a $skip stage between the $sortand $limit stages, MongoDB will coalesce the$limit into the $sort stage and increase the$limit value by the $skip amount. See$sort + $skip + $limit Sequence for an example.

[1]	The optimization will still apply when`allowDiskUse` is `true` and the `n` items exceed theaggregation memory limit.

$limit + $limit Coalescence

When a $limit immediately follows another$limit, the two stages can coalesce into a single$limit where the limit amount is the smaller of the twoinitial limit amounts. For example, a pipeline contains the followingsequence:

{ $limit: 100 },
{ $limit: 10 }

Then the second $limit stage can coalesce into the first$limit stage and result in a single $limitstage where the limit amount 10 is the minimum of the two initiallimits 100 and 10.

{ $limit: 10 }

$skip + $skip Coalescence

When a $skip immediately follows another $skip,the two stages can coalesce into a single $skip where theskip amount is the sum of the two initial skip amounts. For example, apipeline contains the following sequence:

{ $skip: 5 },
{ $skip: 2 }

Then the second $skip stage can coalesce into the first$skip stage and result in a single $skipstage where the skip amount 7 is the sum of the two initiallimits 5 and 2.

{ $skip: 7 }

$match + $match Coalescence

When a $match immediately follows another$match, the two stages can coalesce into a single$match combining the conditions with an$and. For example, a pipeline contains the followingsequence:

{ $match: { year: 2014 } },
{ $match: { status: "A" } }

Then the second $match stage can coalesce into the first$match stage and result in a single $matchstage

{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }

$lookup + $unwind Coalescence

New in version 3.2.

When a $unwind immediately follows another$lookup, and the $unwind operates on the asfield of the $lookup, the optimizer can coalesce the$unwind into the $lookup stage. This avoidscreating large intermediate documents.

For example, a pipeline contains the following sequence:

{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y"
  }
},
{ $unwind: "$resultingArray"}

The optimizer can coalesce the $unwind stage into the$lookup stage. If you run the aggregation with explainoption, the explain output shows the coalesced stage:

{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y",
    unwinding: { preserveNullAndEmptyArrays: false }
  }
}

Example

$sort + $skip + $limit Sequence

A pipeline contains a sequence of $sort followed by a$skip followed by a $limit:

{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }

The optimizer performs $sort + $limit Coalescence totransforms the sequence to the following:

{
   "$sort" : {
      "sortKey" : {
         "age" : -1
      },
      "limit" : NumberLong(15)
   }
},
{
   "$skip" : NumberLong(10)
}

MongoDB increases the $limit amount with the reordering.