Joining multiple tables

Let’s take another look at joins by querying the list of users and getting the count of how many tweet’s they’ve authored that were favorited. This will require us to join twice: from user to tweet, and from tweet to favorite. We’ll add the additional requirement that users should be included who have not created any tweets, as well as users whose tweets have not been favorited. The query, expressed in SQL, would be:

  1. SELECT user.username, COUNT(favorite.id)
  2. FROM user
  3. LEFT OUTER JOIN tweet ON tweet.user_id = user.id
  4. LEFT OUTER JOIN favorite ON favorite.tweet_id = tweet.id
  5. GROUP BY user.username

Note

In the above query both joins are LEFT OUTER, since a user may not have any tweets or, if they have tweets, none of them may have been favorited.

Peewee has a concept of a join context, meaning that whenever we call the join() method, we are implicitly joining on the previously-joined model (or if this is the first call, the model we are selecting from). Since we are joining straight through, from user to tweet, then from tweet to favorite, we can simply write:

  1. query = (User
  2. .select(User.username, fn.COUNT(Favorite.id).alias('count'))
  3. .join(Tweet, JOIN.LEFT_OUTER) # Joins user -> tweet.
  4. .join(Favorite, JOIN.LEFT_OUTER) # Joins tweet -> favorite.
  5. .group_by(User.username))

Iterating over the results:

  1. >>> for user in query:
  2. ... print(user.username, user.count)
  3. ...
  4. huey 3
  5. mickey 1
  6. zaizee 0

For a more complicated example involving multiple joins and switching join contexts, let’s find all the tweets by Huey and the number of times they’ve been favorited. To do this we’ll need to perform two joins and we’ll also use an aggregate function to calculate the favorite count.

Here is how we would write this query in SQL:

  1. SELECT tweet.content, COUNT(favorite.id)
  2. FROM tweet
  3. INNER JOIN user ON tweet.user_id = user.id
  4. LEFT OUTER JOIN favorite ON favorite.tweet_id = tweet.id
  5. WHERE user.username = 'huey'
  6. GROUP BY tweet.content;

Note

We use a LEFT OUTER join from tweet to favorite since a tweet may not have any favorites, yet we still wish to display it’s content (along with a count of zero) in the result set.

With Peewee, the resulting Python code looks very similar to what we would write in SQL:

  1. query = (Tweet
  2. .select(Tweet.content, fn.COUNT(Favorite.id).alias('count'))
  3. .join(User) # Join from tweet -> user.
  4. .switch(Tweet) # Move "join context" back to tweet.
  5. .join(Favorite, JOIN.LEFT_OUTER) # Join from tweet -> favorite.
  6. .where(User.username == 'huey')
  7. .group_by(Tweet.content))

Note the call to switch() - that instructs Peewee to set the join context back to Tweet. If we had omitted the explicit call to switch, Peewee would have used User (the last model we joined) as the join context and constructed the join from User to Favorite using the Favorite.user foreign-key, which would have given us incorrect results.

If we wanted to omit the join-context switching we could instead use the join_from() method. The following query is equivalent to the previous one:

  1. query = (Tweet
  2. .select(Tweet.content, fn.COUNT(Favorite.id).alias('count'))
  3. .join_from(Tweet, User) # Join tweet -> user.
  4. .join_from(Tweet, Favorite, JOIN.LEFT_OUTER) # Join tweet -> favorite.
  5. .where(User.username == 'huey')
  6. .group_by(Tweet.content))

We can iterate over the results of the above query to print the tweet’s content and the favorite count:

  1. >>> for tweet in query:
  2. ... print('%s favorited %d times' % (tweet.content, tweet.count))
  3. ...
  4. meow favorited 1 times
  5. hiss favorited 0 times
  6. purr favorited 2 times