Top object per group

These examples describe several ways to query the single top item per group. For a thorough discuss of various techniques, check out my blog post Querying the top item by group with Peewee ORM. If you are interested in the more general problem of querying the top N items, see the section below Top N objects per group.

In these examples we will use the User and Tweet models to find each user and their most-recent tweet.

The most efficient method I found in my testing uses the MAX() aggregate function.

We will perform the aggregation in a non-correlated subquery, so we can be confident this method will be performant. The idea is that we will select the posts, grouped by their author, whose timestamp is equal to the max observed timestamp for that user.

  1. # When referencing a table multiple times, we'll call Model.alias() to create
  2. # a secondary reference to the table.
  3. TweetAlias = Tweet.alias()
  4.  
  5. # Create a subquery that will calculate the maximum Tweet created_date for each
  6. # user.
  7. subquery = (TweetAlias
  8. .select(
  9. TweetAlias.user,
  10. fn.MAX(TweetAlias.created_date).alias('max_ts'))
  11. .group_by(TweetAlias.user)
  12. .alias('tweet_max_subquery'))
  13.  
  14. # Query for tweets and join using the subquery to match the tweet's user
  15. # and created_date.
  16. query = (Tweet
  17. .select(Tweet, User)
  18. .join(User)
  19. .switch(Tweet)
  20. .join(subquery, on=(
  21. (Tweet.created_date == subquery.c.max_ts) &
  22. (Tweet.user == subquery.c.user_id))))

SQLite and MySQL are a bit more lax and permit grouping by a subset of the columns that are selected. This means we can do away with the subquery and express it quite concisely:

  1. query = (Tweet
  2. .select(Tweet, User)
  3. .join(User)
  4. .group_by(Tweet.user)
  5. .having(Tweet.created_date == fn.MAX(Tweet.created_date)))