ORM Configuration

How do I map a table that has no primary key?

The SQLAlchemy ORM, in order to map to a particular table, needs there to beat least one column denoted as a primary key column; multiple-column,i.e. composite, primary keys are of course entirely feasible as well. Thesecolumns do not need to be actually known to the database as primary keycolumns, though it’s a good idea that they are. It’s only necessary that the columnsbehave as a primary key does, e.g. as a unique and not nullable identifierfor a row.

Most ORMs require that objects have some kind of primary key definedbecause the object in memory must correspond to a uniquely identifiablerow in the database table; at the very least, this allows theobject can be targeted for UPDATE and DELETE statements which will affect onlythat object’s row and no other. However, the importance of the primary keygoes far beyond that. In SQLAlchemy, all ORM-mapped objects are at all timeslinked uniquely within a Sessionto their specific database row using a pattern called the identity map,a pattern that’s central to the unit of work system employed by SQLAlchemy,and is also key to the most common (and not-so-common) patterns of ORM usage.

Note

It’s important to note that we’re only talking about the SQLAlchemy ORM; anapplication which builds on Core and deals only with Table objects,select() constructs and the like, does not need any primary keyto be present on or associated with a table in any way (though again, in SQL, all tablesshould really have some kind of primary key, lest you need to actuallyupdate or delete specific rows).

In almost all cases, a table does have a so-called candidate key, which is a column or seriesof columns that uniquely identify a row. If a table truly doesn’t have this, and has actualfully duplicate rows, the table is not corresponding to first normal form and cannot be mapped. Otherwise, whatever columns comprise the best candidate key can beapplied directly to the mapper:

  1. class SomeClass(Base):
  2. __table__ = some_table_with_no_pk
  3. __mapper_args__ = {
  4. 'primary_key':[some_table_with_no_pk.c.uid, some_table_with_no_pk.c.bar]
  5. }

Better yet is when using fully declared table metadata, use the primary_key=Trueflag on those columns:

  1. class SomeClass(Base):
  2. __tablename__ = "some_table_with_no_pk"
  3.  
  4. uid = Column(Integer, primary_key=True)
  5. bar = Column(String, primary_key=True)

All tables in a relational database should have primary keys. Even a many-to-manyassociation table - the primary key would be the composite of the two associationcolumns:

  1. CREATE TABLE my_association (
  2. user_id INTEGER REFERENCES user(id),
  3. account_id INTEGER REFERENCES account(id),
  4. PRIMARY KEY (user_id, account_id)
  5. )

How do I configure a Column that is a Python reserved word or similar?

Column-based attributes can be given any name desired in the mapping. SeeNaming Columns Distinctly from Attribute Names.

How do I get a list of all columns, relationships, mapped attributes, etc. given a mapped class?

This information is all available from the Mapper object.

To get at the Mapper for a particular mapped class, call theinspect() function on it:

  1. from sqlalchemy import inspect
  2.  
  3. mapper = inspect(MyClass)

From there, all information about the class can be accessed through propertiessuch as:

I’m getting a warning or error about “Implicitly combining column X under attribute Y”

This condition refers to when a mapping contains two columns that are beingmapped under the same attribute name due to their name, but there’s no indicationthat this is intentional. A mapped class needs to have explicit names forevery attribute that is to store an independent value; when two columns have thesame name and aren’t disambiguated, they fall under the same attribute andthe effect is that the value from one column is copied into the other, basedon which column was assigned to the attribute first.

This behavior is often desirable and is allowed without warning in the casewhere the two columns are linked together via a foreign key relationshipwithin an inheritance mapping. When the warning or exception occurs, theissue can be resolved by either assigning the columns to differently-namedattributes, or if combining them together is desired, by usingcolumn_property() to make this explicit.

Given the example as follows:

  1. from sqlalchemy import Integer, Column, ForeignKey
  2. from sqlalchemy.ext.declarative import declarative_base
  3.  
  4. Base = declarative_base()
  5.  
  6. class A(Base):
  7. __tablename__ = 'a'
  8.  
  9. id = Column(Integer, primary_key=True)
  10.  
  11. class B(A):
  12. __tablename__ = 'b'
  13.  
  14. id = Column(Integer, primary_key=True)
  15. a_id = Column(Integer, ForeignKey('a.id'))

As of SQLAlchemy version 0.9.5, the above condition is detected, and willwarn that the id column of A and B is being combined underthe same-named attribute id, which above is a serious issue since it meansthat a B object’s primary key will always mirror that of its A.

A mapping which resolves this is as follows:

  1. class A(Base):
  2. __tablename__ = 'a'
  3.  
  4. id = Column(Integer, primary_key=True)
  5.  
  6. class B(A):
  7. __tablename__ = 'b'
  8.  
  9. b_id = Column('id', Integer, primary_key=True)
  10. a_id = Column(Integer, ForeignKey('a.id'))

Suppose we did want A.id and B.id to be mirrors of each other, despitethe fact that B.a_id is where A.id is related. We could combinethem together using column_property():

  1. class A(Base):
  2. __tablename__ = 'a'
  3.  
  4. id = Column(Integer, primary_key=True)
  5.  
  6. class B(A):
  7. __tablename__ = 'b'
  8.  
  9. # probably not what you want, but this is a demonstration
  10. id = column_property(Column(Integer, primary_key=True), A.id)
  11. a_id = Column(Integer, ForeignKey('a.id'))

I’m using Declarative and setting primaryjoin/secondaryjoin using an and() or or(), and I am getting an error message about foreign keys.

Are you doing this?:

  1. class MyClass(Base):
  2. # ....
  3.  
  4. foo = relationship("Dest", primaryjoin=and_("MyClass.id==Dest.foo_id", "MyClass.foo==Dest.bar"))

That’s an and() of two string expressions, which SQLAlchemy cannot apply any mapping towards. Declarative allows relationship() arguments to be specified as strings, which are converted into expression objects using eval(). But this doesn’t occur inside of an and() expression - it’s a special operation declarative applies only to the entirety of what’s passed to primaryjoin or other arguments as a string:

  1. class MyClass(Base):
  2. # ....
  3.  
  4. foo = relationship("Dest", primaryjoin="and_(MyClass.id==Dest.foo_id, MyClass.foo==Dest.bar)")

Or if the objects you need are already available, skip the strings:

  1. class MyClass(Base):
  2. # ....
  3.  
  4. foo = relationship(Dest, primaryjoin=and_(MyClass.id==Dest.foo_id, MyClass.foo==Dest.bar))

The same idea applies to all the other arguments, such as foreign_keys:

  1. # wrong !
  2. foo = relationship(Dest, foreign_keys=["Dest.foo_id", "Dest.bar_id"])
  3.  
  4. # correct !
  5. foo = relationship(Dest, foreign_keys="[Dest.foo_id, Dest.bar_id]")
  6.  
  7. # also correct !
  8. foo = relationship(Dest, foreign_keys=[Dest.foo_id, Dest.bar_id])
  9.  
  10. # if you're using columns from the class that you're inside of, just use the column objects !
  11. class MyClass(Base):
  12. foo_id = Column(...)
  13. bar_id = Column(...)
  14. # ...
  15.  
  16. foo = relationship(Dest, foreign_keys=[foo_id, bar_id])

Why is ORDER BY required with LIMIT (especially with subqueryload())?

A relational database can return rows in anyarbitrary order, when an explicit ordering is not set.While this ordering very often corresponds to the naturalorder of rows within a table, this is not the case for all databases andall queries. The consequence of this is that any query that limits rowsusing LIMIT or OFFSET should always specify an ORDER BY.Otherwise, it is not deterministic which rows will actually be returned.

When we use a SQLAlchemy method like Query.first(), we are in factapplying a LIMIT of one to the query, so without an explicit orderingit is not deterministic what row we actually get back.While we may not notice this for simple queries on databases that usuallyreturns rows in their naturalorder, it becomes much more of an issue if we also use orm.subqueryload()to load related collections, and we may not be loading the collectionsas intended.

SQLAlchemy implements orm.subqueryload() by issuing a separate query,the results of which are matched up to the results from the first query.We see two queries emitted like this:

  1. >>> session.query(User).options(subqueryload(User.addresses)).all()
  2. -- the "main" query
  3. SELECT users.id AS users_id
  4. FROM users
  5. -- the "load" query issued by subqueryload
  6. SELECT addresses.id AS addresses_id,
  7. addresses.user_id AS addresses_user_id,
  8. anon_1.users_id AS anon_1_users_id
  9. FROM (SELECT users.id AS users_id FROM users) AS anon_1
  10. JOIN addresses ON anon_1.users_id = addresses.user_id
  11. ORDER BY anon_1.users_id

The second query embeds the first query as a source of rows.When the inner query uses OFFSET and/or LIMIT without ordering,the two queries may not see the same results:

  1. >>> user = session.query(User).options(subqueryload(User.addresses)).first()
  2. -- the "main" query
  3. SELECT users.id AS users_id
  4. FROM users
  5. LIMIT 1
  6. -- the "load" query issued by subqueryload
  7. SELECT addresses.id AS addresses_id,
  8. addresses.user_id AS addresses_user_id,
  9. anon_1.users_id AS anon_1_users_id
  10. FROM (SELECT users.id AS users_id FROM users LIMIT 1) AS anon_1
  11. JOIN addresses ON anon_1.users_id = addresses.user_id
  12. ORDER BY anon_1.users_id

Depending on database specifics, there isa chance we may get a result like the following for the two queries:

  1. -- query #1
  2. +--------+
  3. |users_id|
  4. +--------+
  5. | 1|
  6. +--------+
  7.  
  8. -- query #2
  9. +------------+-----------------+---------------+
  10. |addresses_id|addresses_user_id|anon_1_users_id|
  11. +------------+-----------------+---------------+
  12. | 3| 2| 2|
  13. +------------+-----------------+---------------+
  14. | 4| 2| 2|
  15. +------------+-----------------+---------------+

Above, we receive two addresses rows for user.id of 2, and none for1. We’ve wasted two rows and failed to actually load the collection. Thisis an insidious error because without looking at the SQL and the results, theORM will not show that there’s any issue; if we access the addressesfor the User we have, it will emit a lazy load for the collection and wewon’t see that anything actually went wrong.

The solution to this problem is to always specify a deterministic sort order,so that the main query always returns the same set of rows. This generallymeans that you should Query.order_by() on a unique column on the table.The primary key is a good choice for this:

  1. session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first()

Note that the joinedload() eager loader strategy does not suffer fromthe same problem because only one query is ever issued, so the load querycannot be different from the main query. Similarly, the selectinload()eager loader strategy also does not have this issue as it links its collectionloads directly to primary key values just loaded.

See also

The Importance of Ordering