Foreign Keys and Joins

Foreign keys are created using a special field class ForeignKeyField. Each foreign key also creates a back-reference on the related model using the specified backref.

Note

In SQLite, foreign keys are not enabled by default. Most things, including the Peewee foreign-key API, will work fine, but ON DELETE behaviour will be ignored, even if you explicitly specify on_delete to your ForeignKeyField. In conjunction with the default PrimaryKeyField behaviour (where deleted record IDs can be reused), this can lead to surprising (and almost certainly unwanted) behaviour where if you delete a record in table A referenced by a foreign key in table B, and then create a new, unrelated, record in table A, the new record will end up mis-attached to the undeleted record in table B. To avoid the mis-attachment, you can use AutoIncrementField, but it may be better overall to ensure that foreign keys are enabled with pragmas=(('foreign_keys', 'on'),) when you instantiate SqliteDatabase.

Traversing foreign keys

Referring back to the User and Tweet models, note that there is a ForeignKeyField from Tweet to User. The foreign key can be traversed, allowing you access to the associated user instance:

  1. >>> tweet.user.username
  2. 'charlie'

Note

Unless the User model was explicitly selected when retrieving the Tweet, an additional query will be required to load the User data. To learn how to avoid the extra query, see the N+1 query documentation.

The reverse is also true, and we can iterate over the tweets associated with a given User instance:

  1. >>> for tweet in user.tweets:
  2. ... print(tweet.message)
  3. ...
  4. http://www.youtube.com/watch?v=xdhLQCYQ-nQ

Under the hood, the tweets attribute is just a Select with the WHERE clause pre-populated to point to the given User instance:

  1. >>> user.tweets
  2. <peewee.ModelSelect at 0x7f73db3bafd0>
  3. >>> user.tweets.sql()
  4. ('SELECT "t1"."id", "t1"."user_id", "t1"."content", "t1"."timestamp" FROM "tweet" AS "t1" WHERE ("t1"."user_id" = ?)',
  5. [1])

Joining tables

Use the join() method to JOIN additional tables. When a foreign key exists between the source model and the join model, you do not need to specify any additional parameters:

  1. >>> my_tweets = Tweet.select().join(User).where(User.username == 'charlie')

By default peewee will use an INNER join, but you can use LEFT OUTER, RIGHT OUTER, FULL, or CROSS joins as well:

  1. users = (User
  2. .select(User, fn.Count(Tweet.id).alias('num_tweets'))
  3. .join(Tweet, JOIN.LEFT_OUTER)
  4. .group_by(User)
  5. .order_by(fn.Count(Tweet.id).desc()))
  6. for user in users:
  7. print(user.username, 'has created', user.num_tweets, 'tweet(s).')

Selecting from multiple models

SQL makes it easy to select columns from multiple tables and return it all at once. Peewee makes this possible, too, but since Peewee models form a graph (via foreign-keys), the selected data is returned as a graph of model instances. To see what I mean, consider this query:

  1. SELECT tweet.content, tweet.timestamp, user.username
  2. FROM tweet
  3. INNER JOIN user ON tweet.user_id = user.id
  4. ORDER BY tweet.timestamp DESC;
  5. -- Returns rows like
  6. -- "Meow I'm a tweet" | 2017-01-17 13:37:00 | huey
  7. -- "Woof woof" | 2017-01-17 11:59:00 | mickey
  8. -- "Purr" | 2017-01-17 10:00:00 | huey

With Peewee we would write this query:

  1. query = (Tweet
  2. .select(Tweet.content, Tweet.timestamp, User.username)
  3. .join(User)
  4. .order_by(Tweet.timestamp.desc()))

The question is: where is the “username” attribute to be found? The answer is that Peewee, because there is a foreign-key relationship between Tweet and User, will return each row as a Tweet model with the associated User model, which has it’s username attribute set:

  1. for tweet in query:
  2. print(tweet.content, tweet.timestamp, tweet.user.username)

When doing complicated joins, joins where no foreign-key exists (for example joining on a subquery), etc., it is necessary to tell Peewee where to place the joined attributes. This is done by putting an alias on the join predicate expression.

For example, let’s say that in the above query we want to put the joined user data in the Tweet.foo attribute:

  1. query = (Tweet
  2. .select(Tweet.content, Tweet.timestamp, User.username)
  3. .join(User, on=(Tweet.user == User.id).alias('foo'))
  4. .order_by(Tweet.timestamp.desc()))
  5. for tweet in query:
  6. # Joined user data is stored in "tweet.foo":
  7. print(tweet.content, tweet.timestamp, tweet.foo.username)

For queries with complex joins and selections from several models, constructing this graph can be expensive. If you wish, instead, to have all columns as attributes on a single model, you can use objects() method:

  1. for tweet in query.objects():
  2. # Now "username" is on the Tweet model itself:
  3. print(tweet.content, tweet.timestamp, tweet.username)

For additional performance gains, consider using dicts(), tuples() or namedtuples() when iterating large and/or complex result-sets.

Multiple Foreign Keys to the Same Model

When there are multiple foreign keys to the same model, it is good practice to explicitly specify which field you are joining on.

Referring back to the example app’s models, consider the Relationship model, which is used to denote when one user follows another. Here is the model definition:

  1. class Relationship(BaseModel):
  2. from_user = ForeignKeyField(User, backref='relationships')
  3. to_user = ForeignKeyField(User, backref='related_to')
  4. class Meta:
  5. indexes = (
  6. # Specify a unique multi-column index on from/to-user.
  7. (('from_user', 'to_user'), True),
  8. )

Since there are two foreign keys to User, we should always specify which field we are using in a join.

For example, to determine which users I am following, I would write:

  1. (User
  2. .select()
  3. .join(Relationship, on=Relationship.to_user)
  4. .where(Relationship.from_user == charlie))

On the other hand, if I wanted to determine which users are following me, I would instead join on the from_user column and filter on the relationship’s to_user:

  1. (User
  2. .select()
  3. .join(Relationship, on=Relationship.from_user)
  4. .where(Relationship.to_user == charlie))

Joining on arbitrary fields

If a foreign key does not exist between two tables you can still perform a join, but you must manually specify the join predicate.

In the following example, there is no explicit foreign-key between User and ActivityLog, but there is an implied relationship between the ActivityLog.object_id field and User.id. Rather than joining on a specific Field, we will join using an Expression.

  1. user_log = (User
  2. .select(User, ActivityLog)
  3. .join(
  4. ActivityLog,
  5. on=(User.id == ActivityLog.object_id).alias('log'))
  6. .where(
  7. (ActivityLog.activity_type == 'user_activity') &
  8. (User.username == 'charlie')))
  9. for user in user_log:
  10. print(user.username, user.log.description)
  11. #### Print something like ####
  12. charlie logged in
  13. charlie posted a tweet
  14. charlie retweeted
  15. charlie posted a tweet
  16. charlie logged out

Note

By specifying an alias on the join condition, you can control the attribute peewee will assign the joined instance to. In the previous example, we used the following join:

  1. (User.id == ActivityLog.object_id).alias('log')

Then when iterating over the query, we were able to directly access the joined ActivityLog without incurring an additional query:

  1. for user in user_log:
  2. print(user.username, user.log.description)

Joining on Multiple Tables

When calling join(), peewee will use the last joined table as the source table. For example:

  1. User.select().join(Tweet).join(Comment)

This query will result in a join from User to Tweet, and another join from Tweet to Comment.

If you would like to join the same table twice, use the switch() method:

  1. # Join the Artist table on both `Album` and `Genre`.
  2. Artist.select().join(Album).switch(Artist).join(Genre)

Alternatively, you can use join_from():

  1. query = (Artist
  2. .select()
  3. .join(Album)
  4. .join_from(Artist, Genre))