$bucket (aggregation)

Definition

  • $bucket

New in version 3.4.

Categorizes incoming documents into groups, called buckets, based ona specified expression and bucket boundaries.

Each bucket is represented as a document in the output. The documentfor each bucket contains an _id field, whose value specifies theinclusive lower bound of the bucket and a count field thatcontains the number of documents in the bucket. The count fieldis included by default when the output is not specified.

$bucket only produces output documents for buckets thatcontain at least one input document.

  1. {
  2. $bucket: {
  3. groupBy: <expression>,
  4. boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
  5. default: <literal>,
  6. output: {
  7. <output1>: { <$accumulator expression> },
  8. ...
  9. <outputN>: { <$accumulator expression> }
  10. }
  11. }
  12. }

FieldTypeDescriptiongroupByexpressionAn expression to group documentsby. To specify a field path,prefix the field name with a dollar sign $ and enclose it inquotes.

Unless $bucket includes a defaultspecification, each input document must resolve thegroupBy field path or expression to a value that fallswithin one of the ranges specified by the boundaries.boundariesarrayAn array of values based on the groupBy expression that specifythe boundaries for each bucket. Each adjacent pair of values actsas the inclusive lower boundary and the exclusive upper boundaryfor the bucket. You must specify at least two boundaries.

The specified values must be in ascending order and all of the sametype. The exception is if the valuesare of mixed numeric types, such as:

[ 10, NumberLong(20), NumberInt(30) ]

Example

An array of [ 0, 5, 10 ] creates two buckets:

  • [0, 5) with inclusive lower bound 0 and exclusive upperbound 5.
  • [5, 10) with inclusive lower bound 5 and exclusive upperbound 10.defaultliteralOptional. A literal that specifies the _id of an additional bucket thatcontains all documents whose groupBy expression result does notfall into a bucket specified by boundaries.

If unspecified, each input document must resolve thegroupBy expression to a value within one of the bucket rangesspecified by boundaries or the operation throws an error.

The default value must be less than the lowest boundariesvalue, or greater than or equal to the highest boundaries value.

The default value can be of a differenttype than the entries inboundaries.outputdocumentOptional. A document that specifies the fields to include in the outputdocuments in addition to the _id field. To specify the field toinclude, you must useaccumulator expressions.

  1. <outputfield1>: { <accumulator>: <expression1> },
  2. ...
  3. <outputfieldN>: { <accumulator>: <expressionN> }

The default count field is not included in the output documentwhen output is specified. Explicitly specify the countexpression as part of the output document to include it:

  1. output: {
  2. <outputField1>: { <accumulator>: <expression1> },
  3. ...
  4. "count": { $sum: 1 }
  5. }

Behavior

$bucket requires at least one of the following conditions to be metor the operation throws an error:

  • Each input document resolves the groupBy expression to avalue within one of the bucket ranges specified by boundaries, or
  • A default value is specified to bucket documents whose groupByvalues are outside of the boundaries or of a differentBSON type than the values inboundaries.

If the groupBy expression resolves to an array or a document,$bucket arranges the input documents into buckets using thecomparison logic from $sort.

Example

Consider a collection artwork with the following documents:

  1. { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
  2. "price" : NumberDecimal("199.99") }
  3. { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
  4. "price" : NumberDecimal("280.00") }
  5. { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
  6. "price" : NumberDecimal("76.04") }
  7. { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
  8. "price" : NumberDecimal("167.30") }
  9. { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
  10. "price" : NumberDecimal("483.00") }
  11. { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
  12. "price" : NumberDecimal("385.00") }
  13. { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
  14. /* No price*/ }
  15. { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
  16. "price" : NumberDecimal("118.42") }

The following operation uses the $bucket aggregation stage toarrange the artwork collection into buckets according to price:

  1. db.artwork.aggregate( [
  2. {
  3. $bucket: {
  4. groupBy: "$price",
  5. boundaries: [ 0, 200, 400 ],
  6. default: "Other",
  7. output: {
  8. "count": { $sum: 1 },
  9. "titles" : { $push: "$title" }
  10. }
  11. }
  12. }
  13. ] )

The buckets have the following boundaries:

  • [0, 200) with inclusive lowerbound 0 and exclusive upper bound200.
  • [200, 400) with inclusive lowerbound 200 and exclusive upperbound 400.
  • “Other” is the default bucket for documents without prices orprices outside the ranges above.

The operation returns the following documents:

  1. {
  2. "_id" : 0,
  3. "count" : 4,
  4. "titles" : [
  5. "The Pillars of Society",
  6. "Dancer",
  7. "The Great Wave off Kanagawa",
  8. "Blue Flower"
  9. ]
  10. }
  11. {
  12. "_id" : 200,
  13. "count" : 2,
  14. "titles" : [
  15. "Melancholy III",
  16. "Composition VII"
  17. ]
  18. }
  19. {
  20. "_id" : "Other",
  21. "count" : 2,
  22. "titles" : [
  23. "The Persistence of Memory",
  24. "The Scream"
  25. ]
  26. }

Using $bucket with $facet

The $bucket stage can be used within the $facetstage to process multiple aggregation pipelines on artwork in asingle aggregation stage.

The following operation groups the documents from artwork intobuckets based on price and year:

  1. db.artwork.aggregate( [
  2. {
  3. $facet: {
  4. "price": [
  5. {
  6. $bucket: {
  7. groupBy: "$price",
  8. boundaries: [ 0, 200, 400 ],
  9. default: "Other",
  10. output: {
  11. "count": { $sum: 1 },
  12. "artwork" : { $push: { "title": "$title", "price": "$price" } }
  13. }
  14. }
  15. }
  16. ],
  17. "year": [
  18. {
  19. $bucket: {
  20. groupBy: "$year",
  21. boundaries: [ 1890, 1910, 1920, 1940 ],
  22. default: "Unknown",
  23. output: {
  24. "count": { $sum: 1 },
  25. "artwork": { $push: { "title": "$title", "year": "$year" } }
  26. }
  27. }
  28. }
  29. ]
  30. }
  31. }
  32. ] )

The first facet groups the input documents by price. Thebuckets have the following boundaries:

  • [0, 200) with inclusive lowerbound 0 and exclusive upper bound200.
  • [200, 400) with inclusive lowerbound 200 and exclusive upperbound 400.
  • “Other”, thedefault bucket containing documents without prices orprices outside the ranges above.

The second facet groups the input documents by year. The bucketshave the following boundaries:

  • [1890, 1910) with inclusive lowerbound 1890 and exclusive upperbound 1910.
  • [1910, 1920) with inclusive lowerbound 1910 and exclusive upperbound 1920.
  • [1920, 1940) with inclusive lowerbound 1910 and exclusive upperbound 1940.
  • “Unknown”, thedefault bucket containing documents without years oryears outside the ranges above.

The operation returns the following document:

  1. {
  2. "year" : [
  3. {
  4. "_id" : 1890,
  5. "count" : 2,
  6. "artwork" : [
  7. {
  8. "title" : "Melancholy III",
  9. "year" : 1902
  10. },
  11. {
  12. "title" : "The Scream",
  13. "year" : 1893
  14. }
  15. ]
  16. },
  17. {
  18. "_id" : 1910,
  19. "count" : 2,
  20. "artwork" : [
  21. {
  22. "title" : "Composition VII",
  23. "year" : 1913
  24. },
  25. {
  26. "title" : "Blue Flower",
  27. "year" : 1918
  28. }
  29. ]
  30. },
  31. {
  32. "_id" : 1920,
  33. "count" : 3,
  34. "artwork" : [
  35. {
  36. "title" : "The Pillars of Society",
  37. "year" : 1926
  38. },
  39. {
  40. "title" : "Dancer",
  41. "year" : 1925
  42. },
  43. {
  44. "title" : "The Persistence of Memory",
  45. "year" : 1931
  46. }
  47. ]
  48. },
  49. {
  50. // Includes the document without a year, e.g., _id: 4
  51. "_id" : "Unknown",
  52. "count" : 1,
  53. "artwork" : [
  54. {
  55. "title" : "The Great Wave off Kanagawa"
  56. }
  57. ]
  58. }
  59. ],
  60. "price" : [
  61. {
  62. "_id" : 0,
  63. "count" : 4,
  64. "artwork" : [
  65. {
  66. "title" : "The Pillars of Society",
  67. "price" : NumberDecimal("199.99")
  68. },
  69. {
  70. "title" : "Dancer",
  71. "price" : NumberDecimal("76.04")
  72. },
  73. {
  74. "title" : "The Great Wave off Kanagawa",
  75. "price" : NumberDecimal("167.30")
  76. },
  77. {
  78. "title" : "Blue Flower",
  79. "price" : NumberDecimal("118.42")
  80. }
  81. ]
  82. },
  83. {
  84. "_id" : 200,
  85. "count" : 2,
  86. "artwork" : [
  87. {
  88. "title" : "Melancholy III",
  89. "price" : NumberDecimal("280.00")
  90. },
  91. {
  92. "title" : "Composition VII",
  93. "price" : NumberDecimal("385.00")
  94. }
  95. ]
  96. },
  97. {
  98. // Includes the document without a price, e.g., _id: 7
  99. "_id" : "Other",
  100. "count" : 2,
  101. "artwork" : [
  102. {
  103. "title" : "The Persistence of Memory",
  104. "price" : NumberDecimal("483.00")
  105. },
  106. {
  107. "title" : "The Scream"
  108. }
  109. ]
  110. }
  111. ]
  112. }

See also $bucketAuto