Aggregation with User Preference Data

Data Model

Consider a hypothetical sports club with a database that contains ausers collection that tracks the user’s join dates, sport preferences,and stores these data in documents that resemble the following:

  1. {
  2. _id : "jane",
  3. joined : ISODate("2011-03-02"),
  4. likes : ["golf", "racquetball"]
  5. }
  6. {
  7. _id : "joe",
  8. joined : ISODate("2012-07-02"),
  9. likes : ["tennis", "golf", "swimming"]
  10. }

Normalize and Sort Documents

The following operation returns user names in upper case and inalphabetical order. The aggregation includes user names for all documents inthe users collection. You might do this to normalize user names forprocessing.

  1. db.users.aggregate(
  2. [
  3. { $project : { name:{$toUpper:"$_id"} , _id:0 } },
  4. { $sort : { name : 1 } }
  5. ]
  6. )

All documents from the users collection pass through thepipeline, which consists of the following operations:

  • The $project operator:
    • creates a new field called name.
    • converts the value of the _id to upper case, with the$toUpper operator. Then the$project creates a new field, named nameto hold this value.
    • suppresses the id field. $project will passthe _id field by default, unless explicitly suppressed.
  • The $sort operator orders the results by thename field.

The results of the aggregation would resemble the following:

  1. {
  2. "name" : "JANE"
  3. },
  4. {
  5. "name" : "JILL"
  6. },
  7. {
  8. "name" : "JOE"
  9. }

Return Usernames Ordered by Join Month

The following aggregation operation returns user names sorted by themonth they joined. This kind of aggregation could help generatemembership renewal notices.

  1. db.users.aggregate(
  2. [
  3. { $project :
  4. {
  5. month_joined : { $month : "$joined" },
  6. name : "$_id",
  7. _id : 0
  8. }
  9. },
  10. { $sort : { month_joined : 1 } }
  11. ]
  12. )

The pipeline passes all documents in the users collection throughthe following operations:

  • The $project operator:
    • Creates two new fields: month_joined and name.
    • Suppresses the id from the results. The aggregate() method includes the _id, unlessexplicitly suppressed.
  • The $month operator converts the values of thejoined field to integer representations of the month. Then the$project operator assigns those values to themonth_joined field.
  • The $sort operator sorts the results by themonth_joined field.

The operation returns results that resemble the following:

  1. {
  2. "month_joined" : 1,
  3. "name" : "ruth"
  4. },
  5. {
  6. "month_joined" : 1,
  7. "name" : "harold"
  8. },
  9. {
  10. "month_joined" : 1,
  11. "name" : "kate"
  12. }
  13. {
  14. "month_joined" : 2,
  15. "name" : "jill"
  16. }

Return Total Number of Joins per Month

The following operation shows how many people joined each month of theyear. You might use this aggregated data for recruiting and marketingstrategies.

  1. db.users.aggregate(
  2. [
  3. { $project : { month_joined : { $month : "$joined" } } } ,
  4. { $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
  5. { $sort : { "_id.month_joined" : 1 } }
  6. ]
  7. )

The pipeline passes all documents in the users collection throughthe following operations:

  • The $project operator creates a new field calledmonth_joined.
  • The $month operator converts the values of thejoined field to integer representations of the month. Then the$project operator assigns the values to themonth_joined field.
  • The $group operator collects all documents with agiven month_joined value and counts how many documents there arefor that value. Specifically, for each unique value,$group creates a new “per-month” document with twofields:
    • _id, which contains a nested document with themonth_joined field and its value.
    • number, which is a generated field. The $sumoperator increments this field by 1 for every document containingthe given month_joined value.
  • The $sort operator sorts the documents created by$group according to the contents of themonth_joined field.

The result of this aggregation operation would resemble the following:

  1. {
  2. "_id" : {
  3. "month_joined" : 1
  4. },
  5. "number" : 3
  6. },
  7. {
  8. "_id" : {
  9. "month_joined" : 2
  10. },
  11. "number" : 9
  12. },
  13. {
  14. "_id" : {
  15. "month_joined" : 3
  16. },
  17. "number" : 5
  18. }

Return the Five Most Common “Likes”

The following aggregation collects top five most “liked” activities inthe data set. This type of analysis could help inform planning andfuture development.

  1. db.users.aggregate(
  2. [
  3. { $unwind : "$likes" },
  4. { $group : { _id : "$likes" , number : { $sum : 1 } } },
  5. { $sort : { number : -1 } },
  6. { $limit : 5 }
  7. ]
  8. )

The pipeline begins with all documents in the users collection,and passes these documents through the following operations:

  • The $unwind operator separates each value in thelikes array, and creates a new version of the source documentfor every element in the array.

Example

Given the following document from the users collection:

  1. {
  2. _id : "jane",
  3. joined : ISODate("2011-03-02"),
  4. likes : ["golf", "racquetball"]
  5. }

The $unwind operator would create the followingdocuments:

  1. {
  2. _id : "jane",
  3. joined : ISODate("2011-03-02"),
  4. likes : "golf"
  5. }
  6. {
  7. _id : "jane",
  8. joined : ISODate("2011-03-02"),
  9. likes : "racquetball"
  10. }
  • The $group operator collects all documents with the samevalue for the likes field and counts each grouping. With thisinformation, $group creates a new document with twofields:

    • _id, which contains the likes value.
    • number, which is a generated field. The $sumoperator increments this field by 1 for every document containingthe given likes value.
  • The $sort operator sorts these documents by thenumber field in reverse order.

  • The $limit operator only includes the first 5 resultdocuments.

The results of aggregation would resemble the following:

  1. {
  2. "_id" : "golf",
  3. "number" : 33
  4. },
  5. {
  6. "_id" : "racquetball",
  7. "number" : 31
  8. },
  9. {
  10. "_id" : "swimming",
  11. "number" : 24
  12. },
  13. {
  14. "_id" : "handball",
  15. "number" : 19
  16. },
  17. {
  18. "_id" : "tennis",
  19. "number" : 18
  20. }