$lookup (aggregation)

Definition

  • $lookup

New in version 3.2.

Performs a left outer join to an unsharded collection in the _same_database to filter in documents from the “joined” collection forprocessing. To each input document, the $lookup stageadds a new array field whose elements are the matching documentsfrom the “joined” collection. The $lookup stage passesthese reshaped documents to the next stage.

Syntax

The $lookup stage has the following syntaxes:

Equality Match

To perform an equality match between a field from the input documentswith a field from the documents of the “joined” collection, the$lookup stage has the following syntax:

  1. {
  2. $lookup:
  3. {
  4. from: <collection to join>,
  5. localField: <field from the input documents>,
  6. foreignField: <field from the documents of the "from" collection>,
  7. as: <output array field>
  8. }
  9. }

The $lookup takes a document with the following fields:

FieldDescription
fromSpecifies the collection in the same database to performthe join with. The from collection cannot be sharded. Fordetails, see Sharded Collection Restrictions.
localFieldSpecifies the field from the documents input to the$lookup stage. $lookup performs anequality match on the localField to theforeignField from the documents of the fromcollection. If an input document does not contain thelocalField, the $lookup treats thefield as having a value of null for matching purposes.
foreignFieldSpecifies the field from the documents in the fromcollection. $lookup performs an equality match onthe foreignField to the localField from the inputdocuments. If a document in the from collection does notcontain the foreignField, the $lookup treatsthe value as null for matching purposes.
asSpecifies the name of the new array field to add to the inputdocuments. The new array field contains the matchingdocuments from the from collection. If the specified namealready exists in the input document, the existing field isoverwritten.

The operation would correspond to the following pseudo-SQL statement:

  1. SELECT *, <output array field>
  2. FROM collection
  3. WHERE <output array field> IN (SELECT *
  4. FROM <collection to join>
  5. WHERE <foreignField>= <collection.localField>);

See the following examples:

Join Conditions and Uncorrelated Sub-queries

New in version 3.6.

To perform uncorrelated subqueries between two collections as well asallow other join conditions besides a single equality match, the$lookup stage has the following syntax:

  1. {
  2. $lookup:
  3. {
  4. from: <collection to join>,
  5. let: { <var_1>: <expression>, …, <var_n>: <expression> },
  6. pipeline: [ <pipeline to execute on the collection to join> ],
  7. as: <output array field>
  8. }
  9. }

The $lookup takes a document with the following fields:

FieldDescription
fromSpecifies the collection in the same database to perform thejoin with. The from collection cannot be sharded. Fordetails, see Sharded Collection Restrictions.
letOptional. Specifies variables to use in the pipeline field stages. Usethe variable expressions to access the fields from the documentsinput to the $lookup stage.The pipeline cannot directly access the input documentfields. Instead, first define the variables for the inputdocument fields, and then reference the variables in the stagesin the pipeline.To access the let variables in the pipeline, use the$expr operator.NoteThe let variables are accessible by the stages in thepipeline, including additional $lookup stagesnested in the pipeline.
pipelineSpecifies the pipeline to run on the joined collection. Thepipeline determines the resulting documents from the joinedcollection. To return all documents, specify an empty pipeline[].The pipeline cannot include the $out stage orthe $merge stage.The pipeline cannot directly access the input documentfields. Instead, first define the variables for the inputdocument fields, and then reference the variables in the stagesin the pipeline.To access the let variables in the pipeline, use the$expr operator.NoteThe let variables are accessible by the stages in thepipeline, including additional $lookup stagesnested in the pipeline.
asSpecifies the name of the new array field to add to the inputdocuments. The new array field contains the matchingdocuments from the from collection. If the specified namealready exists in the input document, the existing field isoverwritten.

The operation would correspond to the following pseudo-SQL statement:

  1. SELECT *, <output array field>
  2. FROM collection
  3. WHERE <output array field> IN (SELECT <documents as determined from the pipeline>
  4. FROM <collection to join>
  5. WHERE <pipeline> );

See the following examples:

Consideration

Views and Collation

If performing an aggregation that involves multiple views, such aswith $lookup or $graphLookup, the views musthave the same collation.

Restrictions

  1. {
  2. $lookup:
  3. {
  4. from: <collection to join>,
  5. let: { <var_1>: <expression>, …, <var_n>: <expression> },
  6. pipeline: [ <pipeline to execute on the joined collection> ], // Cannot include $out or $merge
  7. as: <output array field>
  8. }
  9. }

Sharded Collection Restrictions

In the $lookup stage, the from collection cannot besharded. However, the collection on which you run theaggregate() method can be sharded. That is, inthe following:

  1. db.collection.aggregate([
  2. { $lookup: { from: "fromCollection", ... } }
  3. ])
  • The collection can be sharded.
  • The fromCollection cannot be sharded.

As such, to join a sharded collection with an unsharded collection, youcan run the aggregation on the sharded collection and lookup theunsharded collection; e.g.:

  1. db.shardedCollection.aggregate([
  2. { $lookup: { from: "unshardedCollection", ... } }
  3. ])

Alternatively, or to join multiple sharded collections, consider:

  • Modifying client applications to perform manual lookups instead ofusing the $lookup aggregation stage.
  • If possible, using an embedded data model that removes the need to join collections.

Examples

Perform a Single Equality Join with $lookup

Create a collection orders with the following documents:

  1. db.orders.insert([
  2. { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
  3. { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
  4. { "_id" : 3 }
  5. ])

Create another collection inventory with the following documents:

  1. db.inventory.insert([
  2. { "_id" : 1, "sku" : "almonds", description: "product 1", "instock" : 120 },
  3. { "_id" : 2, "sku" : "bread", description: "product 2", "instock" : 80 },
  4. { "_id" : 3, "sku" : "cashews", description: "product 3", "instock" : 60 },
  5. { "_id" : 4, "sku" : "pecans", description: "product 4", "instock" : 70 },
  6. { "_id" : 5, "sku": null, description: "Incomplete" },
  7. { "_id" : 6 }
  8. ])

The following aggregation operation on the orders collectionjoins the documents from orders with the documents from theinventory collection using the fields item from theorders collection and the sku field from the inventorycollection:

  1. db.orders.aggregate([
  2. {
  3. $lookup:
  4. {
  5. from: "inventory",
  6. localField: "item",
  7. foreignField: "sku",
  8. as: "inventory_docs"
  9. }
  10. }
  11. ])

The operation returns the following documents:

  1. {
  2. "_id" : 1,
  3. "item" : "almonds",
  4. "price" : 12,
  5. "quantity" : 2,
  6. "inventory_docs" : [
  7. { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
  8. ]
  9. }
  10. {
  11. "_id" : 2,
  12. "item" : "pecans",
  13. "price" : 20,
  14. "quantity" : 1,
  15. "inventory_docs" : [
  16. { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
  17. ]
  18. }
  19. {
  20. "_id" : 3,
  21. "inventory_docs" : [
  22. { "_id" : 5, "sku" : null, "description" : "Incomplete" },
  23. { "_id" : 6 }
  24. ]
  25. }

The operation would correspond to the following pseudo-SQL statement:

  1. SELECT *, inventory_docs
  2. FROM orders
  3. WHERE inventory_docs IN (SELECT *
  4. FROM inventory
  5. WHERE sku= orders.item);

Use $lookup with an Array

Starting MongoDB 3.4, if the localField is an array, you can matchthe array elements against a scalar foreignField without needing an$unwind stage.

For example, create an example collection classes with thefollowing document:

  1. db.classes.insert( [
  2. { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
  3. { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
  4. ])

Create another collection members with the following documents:

  1. db.members.insert( [
  2. { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
  3. { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
  4. { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
  5. { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
  6. { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
  7. { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
  8. ])

The following aggregation operation joins documents in the classescollection with the members collection, matching on the membersfield to the name field:

  1. db.classes.aggregate([
  2. {
  3. $lookup:
  4. {
  5. from: "members",
  6. localField: "enrollmentlist",
  7. foreignField: "name",
  8. as: "enrollee_info"
  9. }
  10. }
  11. ])

The operation returns the following:

  1. {
  2. "_id" : 1,
  3. "title" : "Reading is ...",
  4. "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
  5. "days" : [ "M", "W", "F" ],
  6. "enrollee_info" : [
  7. { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
  8. { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
  9. { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
  10. ]
  11. }
  12. {
  13. "_id" : 2,
  14. "title" : "But Writing ...",
  15. "enrollmentlist" : [ "giraffe1", "artie" ],
  16. "days" : [ "T", "F" ],
  17. "enrollee_info" : [
  18. { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
  19. { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
  20. ]
  21. }

Use $lookup with $mergeObjects

Changed in version 3.6: MongoDB 3.6 adds the $mergeObjects operator to combinemultiple documents into a single document

Create a collection orders with the following documents:

  1. db.orders.insert([
  2. { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
  3. { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }
  4. ])

Create another collection items with the following documents:

  1. db.items.insert([
  2. { "_id" : 1, "item" : "almonds", description: "almond clusters", "instock" : 120 },
  3. { "_id" : 2, "item" : "bread", description: "raisin and nut bread", "instock" : 80 },
  4. { "_id" : 3, "item" : "pecans", description: "candied pecans", "instock" : 60 }
  5. ])

The following operation first uses the $lookup stage tojoin the two collections by the item fields and then uses$mergeObjects in the $replaceRoot to mergethe joined documents from items and orders:

  1. db.orders.aggregate([
  2. {
  3. $lookup: {
  4. from: "items",
  5. localField: "item", // field in the orders collection
  6. foreignField: "item", // field in the items collection
  7. as: "fromItems"
  8. }
  9. },
  10. {
  11. $replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } }
  12. },
  13. { $project: { fromItems: 0 } }
  14. ])

The operation returns the following documents:

  1. { "_id" : 1, "item" : "almonds", "description" : "almond clusters", "instock" : 120, "price" : 12, "quantity" : 2 }
  2. { "_id" : 2, "item" : "pecans", "description" : "candied pecans", "instock" : 60, "price" : 20, "quantity" : 1 }

Specify Multiple Join Conditions with $lookup

Changed in version 3.6: MongoDB 3.6 adds support for executing a pipeline on the joinedcollection, which allows for specifying multiple join conditions aswell as uncorrelated sub-queries.

Create a collection orders with the following documents:

  1. db.orders.insert([
  2. { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
  3. { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
  4. { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
  5. ])

Create another collection warehouses with the following documents:

  1. db.warehouses.insert([
  2. { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
  3. { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
  4. { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
  5. { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
  6. { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
  7. ])

The following operation joins the orders collection with thewarehouse collection by the item and whether the quantity in stockis sufficient to cover the ordered quantity:

  1. db.orders.aggregate([
  2. {
  3. $lookup:
  4. {
  5. from: "warehouses",
  6. let: { order_item: "$item", order_qty: "$ordered" },
  7. pipeline: [
  8. { $match:
  9. { $expr:
  10. { $and:
  11. [
  12. { $eq: [ "$stock_item", "$$order_item" ] },
  13. { $gte: [ "$instock", "$$order_qty" ] }
  14. ]
  15. }
  16. }
  17. },
  18. { $project: { stock_item: 0, _id: 0 } }
  19. ],
  20. as: "stockdata"
  21. }
  22. }
  23. ])

The operation returns the following documents:

  1. { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2,
  2. "stockdata" : [ { "warehouse" : "A", "instock" : 120 }, { "warehouse" : "B", "instock" : 60 } ] }
  3. { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1,
  4. "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] }
  5. { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60,
  6. "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] }

The operation would correspond to the following pseudo-SQL statement:

  1. SELECT *, stockdata
  2. FROM orders
  3. WHERE stockdata IN (SELECT warehouse, instock
  4. FROM warehouses
  5. WHERE stock_item= orders.item
  6. AND instock >= orders.ordered );

See also

$expr

Uncorrelated Subquery

Changed in version 3.6: MongoDB 3.6 adds support for executing a pipeline on the joinedcollection, which allows for specifying multiple join conditions aswell as uncorrelated sub-queries.

Create a collection absences with the following documents:

  1. db.absences.insert([
  2. { "_id" : 1, "student" : "Ann Aardvark", sickdays: [ new Date ("2018-05-01"),new Date ("2018-08-23") ] },
  3. { "_id" : 2, "student" : "Zoe Zebra", sickdays: [ new Date ("2018-02-01"),new Date ("2018-05-23") ] },
  4. ])

Create another collection holidays with the following documents:

  1. db.holidays.insert([
  2. { "_id" : 1, year: 2018, name: "New Years", date: new Date("2018-01-01") },
  3. { "_id" : 2, year: 2018, name: "Pi Day", date: new Date("2018-03-14") },
  4. { "_id" : 3, year: 2018, name: "Ice Cream Day", date: new Date("2018-07-15") },
  5. { "_id" : 4, year: 2017, name: "New Years", date: new Date("2017-01-01") },
  6. { "_id" : 5, year: 2017, name: "Ice Cream Day", date: new Date("2017-07-16") }
  7. ])

The following operation joins the absences collection with 2018holiday information from the holidays collection:

  1. db.absences.aggregate([
  2. {
  3. $lookup:
  4. {
  5. from: "holidays",
  6. pipeline: [
  7. { $match: { year: 2018 } },
  8. { $project: { _id: 0, date: { name: "$name", date: "$date" } } },
  9. { $replaceRoot: { newRoot: "$date" } }
  10. ],
  11. as: "holidays"
  12. }
  13. }
  14. ])

The operation returns the following:

  1. { "_id" : 1, "student" : "Ann Aardvark", "sickdays" : [ ISODate("2018-05-01T00:00:00Z"), ISODate("2018-08-23T00:00:00Z") ],
  2. "holidays" : [ { "name" : "New Years", "date" : ISODate("2018-01-01T00:00:00Z") }, { "name" : "Pi Day", "date" : ISODate("2018-03-14T00:00:00Z") }, { "name" : "Ice Cream Day", "date" : ISODate("2018-07-15T00:00:00Z") } ] }
  3. { "_id" : 2, "student" : "Zoe Zebra", "sickdays" : [ ISODate("2018-02-01T00:00:00Z"), ISODate("2018-05-23T00:00:00Z") ],
  4. "holidays" : [ { "name" : "New Years", "date" : ISODate("2018-01-01T00:00:00Z") }, { "name" : "Pi Day", "date" : ISODate("2018-03-14T00:00:00Z") }, { "name" : "Ice Cream Day", "date" : ISODate("2018-07-15T00:00:00Z") } ] }

The operation would correspond to the following pseudo-SQL statement:

  1. SELECT *, holidays
  2. FROM absences
  3. WHERE holidays IN (SELECT name, date
  4. FROM holidays
  5. WHERE year = 2018);