Aggregation Pipeline and Sharded Collections

The aggregation pipeline supports operations on sharded collections. This section describes behaviorsspecific to the aggregation pipeline andsharded collections.

Behavior

Changed in version 3.2.

If the pipeline starts with an exact $match on a shard key,the entire pipeline runs on the matching shard only. Previously, thepipeline would have been split, and the work of merging it would haveto be done on the primary shard.

For aggregation operations that must run on multiple shards, if theoperations do not require running on the database’s primary shard,these operations will route the results to a random shard to merge theresults to avoid overloading the primary shard for that database. The$out stage and the $lookup stage requirerunning on the database’s primary shard.

Optimization

When splitting the aggregation pipeline into two parts, the pipeline issplit to ensure that the shards perform as many stages as possible withconsideration for optimization.

To see how the pipeline was split, include the explain option in thedb.collection.aggregate() method.

Optimizations are subject to change between releases.