DataSet

The dataset module contains a high-level API for working with databases modeled after the popular project of the same name. The aims of the dataset module are to provide:

  • A simplified API for working with relational data, along the lines of working with JSON.
  • An easy way to export relational data as JSON or CSV.
  • An easy way to import JSON or CSV data into a relational database.

A minimal data-loading script might look like this:

  from playhouse.dataset import DataSet

  db = DataSet('sqlite:///:memory:')

  table = db['sometable']
  table.insert(name='Huey', age=3)
  table.insert(name='Mickey', age=5, gender='male')

  huey = table.find_one(name='Huey')
  print(huey)
  # {'age': 3, 'gender': None, 'id': 1, 'name': 'Huey'}

  for obj in table:
      print(obj)
  # {'age': 3, 'gender': None, 'id': 1, 'name': 'Huey'}
  # {'age': 5, 'gender': 'male', 'id': 2, 'name': 'Mickey'}

You can export or import data using freeze() and thaw():

  # Export table content to the `users.json` file.
  db.freeze(table.all(), format='json', filename='users.json')

  # Import data from a CSV file into a new table. Columns will be created
  # automatically for each field in the CSV file.
  new_table = db['stats']
  new_table.thaw(format='csv', filename='monthly_stats.csv')

Getting started

DataSet objects are initialized by passing in a database URL of the format dialect://user:password@host/dbname. See the Database URL section for examples of connecting to various databases.

  # Create an in-memory SQLite database.
  db = DataSet('sqlite:///:memory:')
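Other databases are reached by changing the URL scheme. The following is a minimal sketch (the paths and credentials are placeholders; consult the Database URL section for the authoritative formats):

  # Connect to a SQLite database file on disk (illustrative path).
  db = DataSet('sqlite:///my_app.db')

  # Connect to a PostgreSQL database (illustrative credentials).
  db = DataSet('postgresql://postgres:secret@localhost:5432/my_app')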

Storing data

To store data, we must first obtain a reference to a table. If the table does not exist, it will be created automatically:

  # Get a table reference, creating the table if it does not exist.
  table = db['users']

We can now insert() new rows into the table. If the columns do not exist, they will be created automatically:

  table.insert(name='Huey', age=3, color='white')
  table.insert(name='Mickey', age=5, gender='male')

To update existing entries in the table, pass the new values and filter conditions as keyword arguments. The list of columns to use as filters is specified in the columns argument. If no filter columns are specified, then all rows will be updated.

  # Update the gender for "Huey".
  table.update(name='Huey', gender='male', columns=['name'])

  # Update all records. If the column does not exist, it will be created.
  table.update(favorite_orm='peewee')

Importing data

To import data from an external source, such as a JSON or CSV file, you can use the thaw() method. By default, new columns will be created for any attributes encountered. If you wish to populate only columns that are already defined on the table, pass strict=True, as shown in the sketch after the example below.

  # Load data from a JSON file containing a list of objects.
  table = db['stock_prices']
  table.thaw(filename='stocks.json', format='json')
  table.all()[:3]

  # Might print...
  [{'id': 1, 'ticker': 'GOOG', 'price': 703},
   {'id': 2, 'ticker': 'AAPL', 'price': 109},
   {'id': 3, 'ticker': 'AMZN', 'price': 300}]
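A sketch of strict mode, assuming the table already has the columns you care about; any extra fields present in the input file are simply not stored:

  # Only populate columns that already exist on the table.
  table.thaw(filename='stocks.json', format='json', strict=True)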

Using transactions

DataSet supports nesting transactions using a simple context manager.

  table = db['users']
  with db.transaction() as txn:
      table.insert(name='Charlie')

      with db.transaction() as nested_txn:
          # Set Charlie's favorite ORM to Django.
          table.update(name='Charlie', favorite_orm='django', columns=['name'])

          # jk/lol
          nested_txn.rollback()
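In the example above, the outer transaction commits when its block exits, while the nested savepoint is rolled back: Charlie's row is persisted, but the update is not. A quick sanity check might look like this (a sketch; the exact row contents depend on which columns exist at that point):

  # Charlie was inserted in the outer (committed) transaction.
  print(table.find_one(name='Charlie') is not None)
  # True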

Inspecting the database

You can use the tables attribute to list the tables in the current database:

  >>> print(db.tables)
  ['sometable', 'user']

And for a given table, you can print the columns:

  >>> table = db['user']
  >>> print(table.columns)
  ['id', 'age', 'name', 'gender', 'favorite_orm']

We can also find out how many rows are in a table:

  >>> print(len(db['user']))
  3

Reading data

To retrieve all rows, you can use the all() method:

  # Retrieve all the users.
  users = db['user'].all()

  # We can iterate over all rows without calling `.all()`.
  for user in db['user']:
      print(user['name'])

Specific objects can be retrieved using find() and find_one().

  # Find all the users who like peewee.
  peewee_users = db['user'].find(favorite_orm='peewee')

  # Find Huey.
  huey = db['user'].find_one(name='Huey')

Exporting data

To export data, use the freeze() method, passing in the query you wish to export:

  peewee_users = db['user'].find(favorite_orm='peewee')
  db.freeze(peewee_users, format='json', filename='peewee_users.json')
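The same query can be written out as CSV instead, or sent to an open file-like object via file_obj. A sketch (the filename is illustrative):

  # Export the same rows as CSV instead of JSON.
  db.freeze(peewee_users, format='csv', filename='peewee_users.csv')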

API

class DataSet(url)

Parameters:
• url (str) – A database URL. See Database URL for examples.

The DataSet class provides a high-level API for working with relational databases.

  • tables

    Return a list of tables stored in the database. This list is computed dynamically each time it is accessed.

  • __getitem__(table_name)

    Provide a Table reference to the specified table. If the table does not exist, it will be created.

  • query(sql[, params=None[, commit=True]])

    Parameters:
    • sql (str) – A SQL query.
    • params (list) – Optional parameters for the query.
    • commit (bool) – Whether the query should be committed upon execution.
    Returns: A database cursor.

    Execute the provided query against the database.
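
    A sketch of issuing a raw query, assuming a SQLite database (which uses ? placeholders) and the user table from the examples above; the returned cursor can be consumed directly:

    # Run a parameterized query and iterate over the raw cursor rows.
    cursor = db.query('SELECT name, age FROM user WHERE age > ?', [2])
    for row in cursor.fetchall():
        print(row)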

  • transaction()

    Create a context manager representing a new transaction (or savepoint).

  • freeze(query[, format='csv'[, filename=None[, file_obj=None[, **kwargs]]]])

    Parameters:
    • query – A SelectQuery, generated using all() or find().
    • format – Output format. By default, csv and json are supported.
    • filename – Filename to write output to.
    • file_obj – File-like object to write output to.
    • kwargs – Arbitrary parameters for export-specific functionality.
  • thaw(table[, format='csv'[, filename=None[, file_obj=None[, strict=False[, **kwargs]]]]])

    Parameters:
    • table (str) – The name of the table to load data into.
    • format – Input format. By default, csv and json are supported.
    • filename – Filename to read data from.
    • file_obj – File-like object to read data from.
    • strict (bool) – If true, only store values for columns that already exist on the table; values for any other fields are ignored.
    • kwargs – Arbitrary parameters for import-specific functionality.
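
    A sketch of loading a file through the DataSet-level method, addressed by table name (the filename is illustrative):

    # Equivalent to db['users'].thaw(...), but addressed by table name.
    db.thaw('users', format='csv', filename='users.csv')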
  • connect()

    Open a connection to the underlying database. If a connection is not opened explicitly, one will be opened the first time a query is executed.

  • close()

    Close the connection to the underlying database.
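
    A sketch of managing the connection explicitly rather than lazily:

    db = DataSet('sqlite:///:memory:')
    db.connect()   # Open the connection up front.
    db['users'].insert(name='Zaizee')
    db.close()     # Release the connection when finished.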

class Table(dataset, name, model_class)

Provides a high-level API for working with rows in a given table.

  • columns

    Return a list of columns in the given table.

  • model_class

    A dynamically-created Model class.

  • create_index(columns[, unique=False])

    Create an index on the given columns:

    # Create a unique index on the `username` column.
    db['users'].create_index(['username'], unique=True)
  • insert(**data)

    Insert the given data dictionary into the table, creating new columns as needed.

  • update(columns=None, conjunction=None, **data)

    Update the table using the provided data. If one or more columns are specified in the columns parameter, then those columns’ values in the data dictionary will be used to determine which rows to update.

    # Update all rows.
    db['users'].update(favorite_orm='peewee')

    # Only update Huey's record, setting his age to 3.
    db['users'].update(name='Huey', age=3, columns=['name'])
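    Multiple filter columns may also be given. A sketch, assuming the filters are combined with AND by default (which the conjunction parameter appears to control):

    # Only rows whose name AND gender both match are updated.
    db['users'].update(name='Huey', gender='male', age=4,
                       columns=['name', 'gender'])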
  • find(**query)

    Query the table for rows matching the specified equality conditions. If no query is specified, then all rows are returned.

    peewee_users = db['users'].find(favorite_orm='peewee')
  • find_one(**query)

    Return a single row matching the specified equality conditions. If no matching row is found then None will be returned.

    huey = db['users'].find_one(name='Huey')
  • all()

    Return all rows in the given table.

  • delete(**query)

    Delete all rows matching the given equality conditions. If no query is provided, then all rows will be deleted.

    # Adios, Django!
    db['users'].delete(favorite_orm='Django')

    # Delete all the secret messages.
    db['secret_messages'].delete()
  • freeze([format='csv'[, filename=None[, file_obj=None[, **kwargs]]]])

    Parameters:
    • format – Output format. By default, csv and json are supported.
    • filename – Filename to write output to.
    • file_obj – File-like object to write output to.
    • kwargs – Arbitrary parameters for export-specific functionality.
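
    A sketch of exporting a single table directly (the filename is illustrative):

    # Dump the entire users table to CSV.
    db['users'].freeze(format='csv', filename='users.csv')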
  • thaw([format='csv'[, filename=None[, file_obj=None[, strict=False[, **kwargs]]]]])

    Parameters:
    • format – Input format. By default, csv and json are supported.
    • filename – Filename to read data from.
    • file_obj – File-like object to read data from.
    • strict (bool) – If true, only store values for columns that already exist on the table; values for any other fields are ignored.
    • kwargs – Arbitrary parameters for import-specific functionality.