Postgresql Extensions

The postgresql extensions module provides a number of "postgres-only" functions, currently:

  • hstore support
  • JSON support, including jsonb via BinaryJSONField
  • Server-side cursors
  • Full-text search
  • The ArrayField, DateTimeTZField, IntervalField and TSVectorField field types

In the future I would like to add support for more of postgresql's features. If there is a particular feature you would like to see added, please open a GitHub issue.

Warning

In order to start using the features described below, you will need to use the PostgresqlExtDatabase class from the extension module instead of PostgresqlDatabase.

The code below will assume you are using the following database and base model:

from playhouse.postgres_ext import *

ext_db = PostgresqlExtDatabase('peewee_test', user='postgres')

class BaseExtModel(Model):
    class Meta:
        database = ext_db

JSON Support

peewee has basic support for Postgres' native JSON data type, in the form of JSONField. As of version 2.4.7, peewee also supports the Postgres 9.4 binary JSON (jsonb) type, via BinaryJSONField.

Warning

Postgres supports a JSON data type natively as of 9.2 (full support in 9.3). In order to use this functionality you must be using the correct version of Postgres with psycopg2 version 2.5 or greater.

To use BinaryJSONField, which has many performance and querying advantages, you must have Postgres 9.4 or later.

Note

You must be sure your database is an instance of PostgresqlExtDatabase in order to use the JSONField.

Here is an example of how you might declare a model with a JSON field:

import json
import urllib2
from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_database')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db

    @classmethod
    def request(cls, url):
        fh = urllib2.urlopen(url)
        return cls.create(url=url, response=json.loads(fh.read()))

APIResponse.create_table()

# Store a JSON response.
offense = APIResponse.request('http://crime-api.com/api/offense/')
booking = APIResponse.request('http://crime-api.com/api/booking/')

# Query a JSON data structure using a nested key lookup:
offense_responses = APIResponse.select().where(
    APIResponse.response['meta']['model'] == 'offense')

# Retrieve a sub-key for each APIResponse. By calling .as_json(), the
# data at the sub-key will be returned as Python objects (dicts, lists,
# etc) instead of serialized JSON.
q = (APIResponse
     .select(
         APIResponse.response['booking']['person'].as_json().alias('person'))
     .where(APIResponse.response['meta']['model'] == 'booking'))

for result in q:
    print(result.person['name'], result.person['dob'])

The BinaryJSONField works the same and supports the same operations as the regular JSONField, but provides several additional operations for testing containment. Using the binary json field, you can test whether your JSON data contains other partial JSON structures (contains(), contains_any(), contains_all()), or whether it is a subset of a larger JSON document (contained_by()).

For more examples, see the JSONField and BinaryJSONField API documents below.

hstore support

Postgresql hstore is an embedded key/value store. With hstore, you can store arbitrary key/value pairs in your database alongside structured relational data.

To use hstore, you need to specify an additional parameter when instantiating your PostgresqlExtDatabase:

# Specify "register_hstore=True":
db = PostgresqlExtDatabase('my_db', register_hstore=True)

Currently the postgres_ext module supports the following operations:

  • Store and retrieve arbitrary dictionaries
  • Filter by key(s) or partial dictionary
  • Update/add one or more keys to an existing dictionary
  • Delete one or more keys from an existing dictionary
  • Select keys, values, or zip keys and values
  • Retrieve a slice of keys/values
  • Test for the existence of a key
  • Test that a key has a non-NULL value

Using hstore

To start with, you will need to import the custom database class and the hstore functions from playhouse.postgres_ext (see above code snippet). Then, it is as simple as adding a HStoreField to your model:

class House(BaseExtModel):
    address = CharField()
    features = HStoreField()

You can now store arbitrary key/value pairs on House instances:

>>> h = House.create(
...     address='123 Main St',
...     features={'garage': '2 cars', 'bath': '2 bath'})
>>> h_from_db = House.get(House.id == h.id)
>>> h_from_db.features
{'bath': '2 bath', 'garage': '2 cars'}

You can filter by an individual key, multiple keys, or a partial dictionary:

>>> query = House.select()
>>> garage = query.where(House.features.contains('garage'))
>>> garage_and_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))

Suppose you want to do an atomic update to the house:

>>> new_features = House.features.update({'bath': '2.5 bath', 'sqft': '1100'})
>>> query = House.update(features=new_features)
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'bath': '2.5 bath', 'garage': '2 cars', 'sqft': '1100'}

Or, alternatively an atomic delete:

>>> query = House.update(features=House.features.delete('bath'))
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'garage': '2 cars', 'sqft': '1100'}

Multiple keys can be deleted at the same time:

>>> query = House.update(features=House.features.delete('garage', 'sqft'))

You can select just keys, just values, or zip the two:

>>> for h in House.select(House.address, House.features.keys().alias('keys')):
...     print(h.address, h.keys)

123 Main St [u'bath', u'garage']

>>> for h in House.select(House.address, House.features.values().alias('vals')):
...     print(h.address, h.vals)

123 Main St [u'2 bath', u'2 cars']

>>> for h in House.select(House.address, House.features.items().alias('mtx')):
...     print(h.address, h.mtx)

123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]

You can retrieve a slice of data, for example, all the garage data:

>>> query = House.select(House.address, House.features.slice('garage').alias('garage_data'))
>>> for house in query:
...     print(house.address, house.garage_data)

123 Main St {'garage': '2 cars'}

You can check for the existence of a key and filter rows accordingly:

>>> has_garage = House.features.exists('garage')
>>> for house in House.select(House.address, has_garage.alias('has_garage')):
...     print(house.address, house.has_garage)

123 Main St True

>>> for house in House.select().where(House.features.exists('garage')):
...     print(house.address, house.features['garage'])  # <-- just houses w/garage data

123 Main St 2 cars

Interval support

Postgres supports durations through the INTERVAL data-type (see the Postgres docs).

  • class IntervalField([null=False[, ...]])
  • Field class capable of storing Python datetime.timedelta instances.

Example:

from datetime import timedelta

from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_db')

class Event(Model):
    location = CharField()
    duration = IntervalField()
    start_time = DateTimeField()

    class Meta:
        database = db

    @classmethod
    def get_long_meetings(cls):
        return cls.select().where(cls.duration > timedelta(hours=1))

Server-side cursors

When psycopg2 executes a query, normally all results are fetched and returned to the client by the backend. This can cause your application to use a lot of memory when making large queries. Using server-side cursors, results are returned a little at a time (by default 2000 records). For the definitive reference, please see the psycopg2 documentation.

Note

To use server-side (or named) cursors, you must be using PostgresqlExtDatabase.

To execute a query using a server-side cursor, simply wrap your select query using the ServerSide() helper:

large_query = PageView.select()  # Build query normally.

# Iterate over large query inside a transaction.
for page_view in ServerSide(large_query):
    # do some interesting analysis here.
    pass

# Server-side resources are released.

If you would like all SELECT queries to automatically use a server-side cursor, you can specify this when creating your PostgresqlExtDatabase:

from playhouse.postgres_ext import PostgresqlExtDatabase

ss_db = PostgresqlExtDatabase('my_db', server_side_cursors=True)

Note

Server-side cursors live only as long as the transaction, so for this reason peewee will not automatically call commit() after executing a SELECT query. If you do not commit after you are done iterating, you will not release the server-side resources until the connection is closed (or the transaction is committed later). Furthermore, since peewee will by default cache rows returned by the cursor, you should always call .iterator() when iterating over a large query.

If you are using the ServerSide() helper, the transaction and call to iterator() will be handled transparently.

Full-text search

Postgresql provides sophisticated full-text search using special data-types (tsvector and tsquery). Documents should be stored or converted to the tsvector type, and search queries should be converted to tsquery.

For simple cases, you can simply use the Match() function, which will automatically perform the appropriate conversions, and requires no schema changes:

def blog_search(search_term):
    return Blog.select().where(
        (Blog.status == Blog.STATUS_PUBLISHED) &
        Match(Blog.content, search_term))

The Match() function will automatically convert the left-hand operand to a tsvector, and the right-hand operand to a tsquery. For better performance, it is recommended you create a GIN index on the column you plan to search:

CREATE INDEX blog_full_text_search ON blog USING gin(to_tsvector(content));

Alternatively, you can use the TSVectorField to maintain a dedicated column for storing tsvector data:

class Blog(Model):
    content = TextField()
    search_content = TSVectorField()

Note

TSVectorField will automatically be created with a GIN index.

You will need to explicitly convert the incoming text data to tsvector when inserting or updating the search_content field:

content = 'Excellent blog post about peewee ORM.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))

To perform a full-text search, use TSVectorField.match():

terms = 'python & (sqlite | postgres)'
results = Blog.select().where(Blog.search_content.match(terms))

For more information, see the Postgres full-text search docs.

postgres_ext API notes

  • class PostgresqlExtDatabase(database[, server_side_cursors=False[, register_hstore=False[, ...]]])
  • Identical to PostgresqlDatabase but required in order to support the extensions described in this document.

Parameters:

  • database (str) – Name of database to connect to.
  • server_side_cursors (bool) – Whether SELECT queries should utilize server-side cursors.
  • register_hstore (bool) – Register the HStore extension with the connection.

If using server_side_cursors, also be sure to wrap your queries with ServerSide().

  • ServerSide(select_query)

Parameters: select_query – a SelectQuery instance.
Return type: generator

Wrap the given select query in a transaction, and call its iterator() method to avoid caching row instances. In order for the server-side resources to be released, be sure to exhaust the generator (iterate over all the rows).

Usage:

large_query = PageView.select()
for page_view in ServerSide(large_query):
    # Do something interesting.
    pass

# At this point server side resources are released.
  • class ArrayField([field_class=IntegerField[, field_kwargs=None[, dimensions=1[, convert_values=False]]]])

Parameters:

  • field_class – a subclass of Field, e.g. IntegerField.
  • field_kwargs (dict) – arguments to initialize field_class.
  • dimensions (int) – dimensions of array.
  • convert_values (bool) – apply field_class value conversion to array data.

Field capable of storing arrays of the provided field_class.

Note

By default ArrayField will use a GIN index. To disable this, initialize the field with index=False.

You can store and retrieve lists (or lists-of-lists):

class BlogPost(BaseModel):
    content = TextField()
    tags = ArrayField(CharField)


post = BlogPost(content='awesome', tags=['foo', 'bar', 'baz'])

Additionally, you can use the __getitem__ API to query values or slices in the database:

# Get the first tag on a given blog post.
first_tag = (BlogPost
             .select(BlogPost.tags[0].alias('first_tag'))
             .where(BlogPost.id == 1)
             .dicts()
             .get())

# first_tag = {'first_tag': 'foo'}

Get a slice of values:

# Get the first two tags.
two_tags = (BlogPost
            .select(BlogPost.tags[:2].alias('two'))
            .dicts()
            .get())
# two_tags = {'two': ['foo', 'bar']}
  • contains(*items)

Parameters: items – One or more items that must be in the given array field.

# Get all blog posts that are tagged with both "python" and "django".
BlogPost.select().where(BlogPost.tags.contains('python', 'django'))
  • contains_any(*items)

Parameters: items – One or more items to search for in the given array field.

Like contains(), except will match rows where the array contains any of the given items.

# Get all blog posts that are tagged with "flask" and/or "django".
BlogPost.select().where(BlogPost.tags.contains_any('flask', 'django'))
  • class DateTimeTZField(*args, **kwargs)
  • A timezone-aware subclass of DateTimeField.
  • class HStoreField(*args, **kwargs)
  • A field for storing and retrieving arbitrary key/value pairs. For details on usage, see hstore support.

Attention

To use the HStoreField you will need to be sure the hstore extension is registered with the connection. To accomplish this, instantiate the PostgresqlExtDatabase with register_hstore=True.

Note

By default HStoreField will use a GiST index. To disable this, initialize the field with index=False.

  • keys()
  • Returns the keys for a given row.
>>> for h in House.select(House.address, House.features.keys().alias('keys')):
...     print(h.address, h.keys)

123 Main St [u'bath', u'garage']
  • values()
  • Return the values for a given row.
>>> for h in House.select(House.address, House.features.values().alias('vals')):
...     print(h.address, h.vals)

123 Main St [u'2 bath', u'2 cars']
  • items()
  • Like python’s dict, return the keys and values in a list-of-lists:
>>> for h in House.select(House.address, House.features.items().alias('mtx')):
...     print(h.address, h.mtx)

123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]
  • slice(*args)
  • Return a slice of data given a list of keys.
>>> for h in House.select(House.address, House.features.slice('garage').alias('garage_data')):
...     print(h.address, h.garage_data)

123 Main St {'garage': '2 cars'}
  • exists(key)
  • Query for whether the given key exists.
>>> for h in House.select(House.address, House.features.exists('garage').alias('has_garage')):
...     print(h.address, h.has_garage)

123 Main St True

>>> for h in House.select().where(House.features.exists('garage')):
...     print(h.address, h.features['garage'])  # <-- just houses w/garage data

123 Main St 2 cars
  • defined(key)
  • Query for whether the given key has a value associated with it.

  • update(**data)

  • Perform an atomic update to the keys/values for a given row or rows.
>>> query = House.update(features=House.features.update(
...     sqft=2000,
...     year_built=2012))
>>> query.where(House.id == 1).execute()
  • delete(*keys)
  • Delete the provided keys for a given row or rows.

Note

Deleting keys is done using an UPDATE query.

>>> query = House.update(features=House.features.delete(
...     'sqft', 'year_built'))
>>> query.where(House.id == 1).execute()
  • contains(value)

Parameters: value – Either a dict, a list of keys, or a single key.

Query rows for the existence of either:

  • a partial dictionary.
  • a list of keys.
  • a single key.

>>> query = House.select()
>>> has_garage = query.where(House.features.contains('garage'))
>>> garage_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
  • contains_any(*keys)

Parameters: keys – One or more keys to search for.

Query rows for the existence of any key.

  • class JSONField(dumps=None, *args, **kwargs)

Parameters: dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.

Field class suitable for storing and querying arbitrary JSON. When using this on a model, set the field's value to a Python object (either a dict or a list). When you retrieve your value from the database it will be returned as a Python data structure.

Note

You must be using Postgres 9.2 / psycopg2 2.5 or greater.

Note

If you are using Postgres 9.4, strongly consider using the BinaryJSONField instead as it offers better performance and more powerful querying options.

Example model declaration:

db = PostgresqlExtDatabase('my_db')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db

Example of storing JSON data:

url = 'http://foo.com/api/resource/'
resp = json.loads(urllib2.urlopen(url).read())
APIResponse.create(url=url, response=resp)

APIResponse.create(url='http://foo.com/baz/', response={'key': 'value'})

To query, use Python’s [] operators to specify nested key or array lookups:

APIResponse.select().where(
    APIResponse.response['key1']['nested-key'] == 'some-value')

To illustrate the use of the [] operators, imagine we have the following data stored in an APIResponse:

{
  "foo": {
    "bar": ["i1", "i2", "i3"],
    "baz": {
      "huey": "mickey",
      "peewee": "nugget"
    }
  }
}

Here are the results of a few queries:

def get_data(expression):
    # Helper function to just retrieve the results of a
    # particular expression.
    query = (APIResponse
             .select(expression.alias('my_data'))
             .dicts()
             .get())
    return query['my_data']

# Accessing the foo -> bar subkey will return a JSON
# representation of the list.
get_data(APIResponse.response['foo']['bar'])
# '["i1", "i2", "i3"]'

# In order to retrieve this list as a Python list,
# we will call .as_json() on the expression.
get_data(APIResponse.response['foo']['bar'].as_json())
# ['i1', 'i2', 'i3']

# Similarly, accessing the foo -> baz subkey will
# return a JSON representation of the dictionary.
get_data(APIResponse.response['foo']['baz'])
# '{"huey": "mickey", "peewee": "nugget"}'

# Again, calling .as_json() will return an actual
# python dictionary.
get_data(APIResponse.response['foo']['baz'].as_json())
# {'huey': 'mickey', 'peewee': 'nugget'}

# When dealing with simple values, either way works as
# you expect.
get_data(APIResponse.response['foo']['bar'][0])
# 'i1'

# Calling .as_json() when the result is a simple value
# will return the same thing as the previous example.
get_data(APIResponse.response['foo']['bar'][0].as_json())
# 'i1'
  • class BinaryJSONField(dumps=None, *args, **kwargs)

Parameters: dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.

Store and query arbitrary JSON documents. Data should be stored using normal Python dict and list objects, and when data is returned from the database, it will be returned using dict and list as well.

For examples of basic query operations, see the above code samples for JSONField. The example queries below will use the same APIResponse model described above.

Note

By default BinaryJSONField will use a GIN index. To disable this, initialize the field with index=False.

Note

You must be using Postgres 9.4 / psycopg2 2.5 or newer. If you are using Postgres 9.2 or 9.3, you can use the regular JSONField instead.

  • contains(other)
  • Test whether the given JSON data contains the given JSON fragment or key.

Example:

search_fragment = {
    'foo': {'bar': ['i2']}
}
query = (APIResponse
         .select()
         .where(APIResponse.response.contains(search_fragment)))

# If we're searching for a list, the list items do not need to
# be ordered in a particular way:
query = (APIResponse
         .select()
         .where(APIResponse.response.contains({
             'foo': {'bar': ['i2', 'i1']}})))

We can pass in simple keys as well. To find APIResponses that contain the key foo at the top-level:

APIResponse.select().where(APIResponse.response.contains('foo'))

We can also search sub-keys using square-brackets:

APIResponse.select().where(
    APIResponse.response['foo']['bar'].contains(['i2', 'i1']))
  • contains_any(*items)
  • Search for the presence of one or more of the given items.
APIResponse.select().where(
    APIResponse.response.contains_any('foo', 'baz', 'nugget'))

Like contains(), we can also search sub-keys:

APIResponse.select().where(
    APIResponse.response['foo']['bar'].contains_any('i2', 'ix'))
  • contains_all(*items)
  • Search for the presence of all of the given items.
APIResponse.select().where(
    APIResponse.response.contains_all('foo'))

Like contains_any(), we can also search sub-keys:

APIResponse.select().where(
    APIResponse.response['foo']['bar'].contains_all('i1', 'i2', 'i3'))
  • contained_by(other)
  • Test whether the stored JSON document is contained by (is a subset of) the given JSON document. This method is the inverse of contains().
big_doc = {
    'foo': {
        'bar': ['i1', 'i2', 'i3'],
        'baz': {
            'huey': 'mickey',
            'peewee': 'nugget',
        }
    },
    'other_key': ['nugget', 'bear', 'kitten'],
}
APIResponse.select().where(
    APIResponse.response.contained_by(big_doc))
  • concat(data)
  • Concatenate the field data and the provided data. Note that this operation does not merge or do a "deep concat".

  • has_key(key)

  • Test whether the key exists at the top-level of the JSON object.

  • remove(*keys)

  • Remove one or more keys from the top-level of the JSON object.
  • Match(field, query)
  • Generate a full-text search expression, automatically converting the left-hand operand to a tsvector, and the right-hand operand to a tsquery.

Example:

def blog_search(search_term):
    return Blog.select().where(
        (Blog.status == Blog.STATUS_PUBLISHED) &
        Match(Blog.content, search_term))
  • class TSVectorField
  • Field type suitable for storing tsvector data. This field will automatically be created with a GIN index for improved search performance.

Note

Data stored in this field will still need to be manually converted to the tsvector type.

Note

By default TSVectorField will use a GIN index. To disable this, initialize the field with index=False.

Example usage:

class Blog(Model):
    content = TextField()
    search_content = TSVectorField()

content = 'this is a sample blog entry.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))  # Note `to_tsvector()`.
  • match(query[, language=None[, plain=False]])

Parameters:

  • query (str) – the full-text search query.
  • language (str) – language name (optional).
  • plain (bool) – parse search query using plain (simple) parser.

Returns: an expression representing full-text search/match.

Example:

# Perform a search using the "match" method.
terms = 'python & (sqlite | postgres)'
results = Blog.select().where(Blog.search_content.match(terms))