Avoiding the N+1 problem

The N+1 problem refers to a situation where an application performs a query,then for each row of the result set, the application performs at least oneother query (another way to conceptualize this is as a nested loop). In manycases, these n queries can be avoided through the use of a SQL join orsubquery. The database itself may do a nested loop, but it will usually be moreperformant than doing n queries in your application code, which involveslatency communicating with the database and may not take advantage of indicesor other optimizations employed by the database when joining or executing asubquery.

Peewee provides several APIs for mitigating N+1 query behavior. Recollectingthe models used throughout this document, User and Tweet, this section willtry to outline some common N+1 scenarios, and how peewee can help you avoidthem.

Attention

In some cases, N+1 queries will not result in a significant or measurableperformance hit. It all depends on the data you are querying, the databaseyou are using, and the latency involved in executing queries and retrievingresults. As always when making optimizations, profile before and after toensure the changes do what you expect them to.

List recent tweets

The twitter timeline displays a list of tweets from multiple users. In additionto the tweet’s content, the username of the tweet’s author is also displayed.The N+1 scenario here would be:

  • Fetch the 10 most recent tweets.
  • For each tweet, select the author (10 queries).By selecting both tables and using a join, peewee makes it possible toaccomplish this in a single query:
  1. query = (Tweet
  2. .select(Tweet, User) # Note that we are selecting both models.
  3. .join(User) # Use an INNER join because every tweet has an author.
  4. .order_by(Tweet.id.desc()) # Get the most recent tweets.
  5. .limit(10))
  6.  
  7. for tweet in query:
  8. print(tweet.user.username, '-', tweet.message)

Without the join, accessing tweet.user.username would trigger a query toresolve the foreign key tweet.user and retrieve the associated user. Butsince we have selected and joined on User, peewee will automaticallyresolve the foreign-key for us.

Note

This technique is discussed in more detail in Selecting from multiple sources.

List users and all their tweets

Let’s say you want to build a page that shows several users and all of theirtweets. The N+1 scenario would be:

  • Fetch some users.
  • For each user, fetch their tweets.This situation is similar to the previous example, but there is one importantdifference: when we selected tweets, they only have a single associated user,so we could directly assign the foreign key. The reverse is not true, however,as one user may have any number of tweets (or none at all).

Peewee provides an approach to avoiding O(n) queries in this situation. Fetchusers first, then fetch all the tweets associated with those users. Oncepeewee has the big list of tweets, it will assign them out, matching them withthe appropriate user. This method is usually faster but will involve a queryfor each table being selected.

Using prefetch

peewee supports pre-fetching related data using sub-queries. This methodrequires the use of a special API, prefetch(). Prefetch, as its nameimplies, will eagerly load the appropriate tweets for the given users usingsubqueries. This means instead of O(n) queries for n rows, we will doO(k) queries for k tables.

Here is an example of how we might fetch several users and any tweets theycreated within the past week.

  1. week_ago = datetime.date.today() - datetime.timedelta(days=7)
  2. users = User.select()
  3. tweets = (Tweet
  4. .select()
  5. .where(Tweet.timestamp >= week_ago))
  6.  
  7. # This will perform two queries.
  8. users_with_tweets = prefetch(users, tweets)
  9.  
  10. for user in users_with_tweets:
  11. print(user.username)
  12. for tweet in user.tweets:
  13. print(' ', tweet.message)

Note

Note that neither the User query, nor the Tweet query contained aJOIN clause. When using prefetch() you do not need to specify thejoin.

prefetch() can be used to query an arbitrary number of tables. Checkthe API documentation for more examples.

Some things to consider when using prefetch():

  • Foreign keys must exist between the models being prefetched.
  • LIMIT works as you’d expect on the outer-most query, but may be difficultto implement correctly if trying to limit the size of the sub-selects.