Querying - Selecting multiple records - 《Peewee 3.14.4 Documentation》

Selecting multiple records

We can use Model.select() to retrieve rows from the table. When you construct a SELECT query, the database will return any rows that correspond to your query. Peewee allows you to iterate over these rows, as well as use indexing and slicing operations:

>>> query = User.select()
>>> [user.username for user in query]
['Charlie', 'Huey', 'Peewee']
>>> query[1]
<__main__.User at 0x7f83e80f5550>
>>> query[1].username
'Huey'
>>> query[:2]
[<__main__.User at 0x7f83e80f53a8>, <__main__.User at 0x7f83e80f5550>]

Select queries cache their results: you can iterate, index, and slice the same query object multiple times, but the query is executed only once.

In the following example, we will simply call select() and iterate over the return value, which is an instance of Select. This will return all the rows in the User table:

>>> for user in User.select():
...     print(user.username)
...
Charlie
Huey
Peewee

Note

Subsequent iterations of the same query will not hit the database as the results are cached. To disable this behavior (to reduce memory usage), call Select.iterator() when iterating.

When iterating over a model that contains a foreign key, be careful with the way you access values on related models. Accidentally resolving a foreign key or iterating over a back-reference can cause N+1 query behavior.

When you create a foreign key, such as Tweet.user, you can use the backref parameter to create a back-reference (User.tweets). Back-references are exposed as Select instances:

>>> tweet = Tweet.get()
>>> tweet.user  # Accessing a foreign key returns the related model.
<tw.User at 0x7f3ceb017f50>
>>> user = User.get()
>>> user.tweets  # Accessing a back-reference returns a query.
<peewee.ModelSelect at 0x7f73db3bafd0>

You can iterate over the user.tweets back-reference just like any other Select:

>>> for tweet in user.tweets:
...     print(tweet.message)
...
hello world
this is fun
look at this picture of my food

In addition to returning model instances, Select queries can return dictionaries, tuples and namedtuples. Depending on your use-case, you may find it easier to work with rows as dictionaries, for example:

>>> query = User.select().dicts()
>>> for row in query:
...     print(row)
{'id': 1, 'username': 'Charlie'}
{'id': 2, 'username': 'Huey'}
{'id': 3, 'username': 'Peewee'}

See namedtuples(), tuples(), dicts() for more information.

Iterating over large result-sets

By default, Peewee caches the rows returned when iterating over a Select query. This is an optimization to allow multiple iterations as well as indexing and slicing without causing additional queries. This caching can be problematic, however, when you plan to iterate over a large number of rows.

To reduce the amount of memory used by Peewee when iterating over a query, use the iterator() method. This method allows you to iterate without caching each model returned, using much less memory when iterating over large result sets.

# Let's assume we've got 10 million stat objects to dump to a csv file.
stats = Stat.select()
# Our imaginary serializer class
serializer = CSVSerializer()
# Loop over all the stats and serialize.
for stat in stats.iterator():
    serializer.serialize_object(stat)

For simple queries you can see further speed improvements by returning rows as dictionaries, namedtuples or tuples. The following methods can be used on any Select query to change the result row type:

- dicts()
- namedtuples()
- tuples()

Don’t forget to append the iterator() method call to also reduce memory consumption. For example, the above code might look like:

# Let's assume we've got 10 million stat objects to dump to a csv file.
stats = Stat.select()
# Our imaginary serializer class
serializer = CSVSerializer()
# Loop over all the stats (rendered as tuples, without caching) and serialize.
for stat_tuple in stats.tuples().iterator():
    serializer.serialize_tuple(stat_tuple)

When iterating over a large number of rows that contain columns from multiple tables, Peewee will reconstruct the model graph for each row returned. This operation can be slow for complex graphs. For example, if we were selecting a list of tweets along with the username and avatar of the tweet’s author, Peewee would have to create two objects for each row (a tweet and a user). In addition to the above row types, there is a fourth method, objects(), which returns the rows as model instances but does not attempt to resolve the model graph.

For example:

query = (Tweet
         .select(Tweet, User)  # Select tweet and user data.
         .join(User))
# Note that the user columns are stored in a separate User instance
# accessible at tweet.user:
for tweet in query:
    print(tweet.user.username, tweet.content)
# Using ".objects()" will not create the tweet.user object and assigns all
# user attributes to the tweet instance:
for tweet in query.objects():
    print(tweet.username, tweet.content)

For maximum performance, you can execute queries and then iterate over the results using the underlying database cursor. Database.execute() accepts a query object, executes the query, and returns a DB-API 2.0 Cursor object. The cursor will return the raw row-tuples:

query = Tweet.select(Tweet.content, User.username).join(User)
cursor = database.execute(query)
for (content, username) in cursor:
    print(username, '->', content)