Adjacency List Relationships

The adjacency list pattern is a common relational pattern whereby a tablecontains a foreign key reference to itself. This is the most commonway to represent hierarchical data in flat tables. Other methodsinclude nested sets, sometimes called “modified preorder”,as well as materialized path. Despite the appeal that modified preorderhas when evaluated for its fluency within SQL queries, the adjacency list model isprobably the most appropriate pattern for the large majority of hierarchicalstorage needs, for reasons of concurrency, reduced complexity, and thatmodified preorder has little advantage over an application which can fullyload subtrees into the application space.

In this example, we’ll work with a single mappedclass called Node, representing a tree structure:

  1. class Node(Base):
  2. __tablename__ = 'node'
  3. id = Column(Integer, primary_key=True)
  4. parent_id = Column(Integer, ForeignKey('node.id'))
  5. data = Column(String(50))
  6. children = relationship("Node")

With this structure, a graph such as the following:

  1. root --+---> child1
  2. +---> child2 --+--> subchild1
  3. | +--> subchild2
  4. +---> child3

Would be represented with data such as:

  1. id parent_id data
  2. --- ------- ----
  3. 1 NULL root
  4. 2 1 child1
  5. 3 1 child2
  6. 4 3 subchild1
  7. 5 3 subchild2
  8. 6 1 child3

The relationship() configuration here works in thesame way as a “normal” one-to-many relationship, with theexception that the “direction”, i.e. whether the relationshipis one-to-many or many-to-one, is assumed by default tobe one-to-many. To establish the relationship as many-to-one,an extra directive is added known as remote_side, whichis a Column or collection of Column objectsthat indicate those which should be considered to be “remote”:

  1. class Node(Base):
  2. __tablename__ = 'node'
  3. id = Column(Integer, primary_key=True)
  4. parent_id = Column(Integer, ForeignKey('node.id'))
  5. data = Column(String(50))
  6. parent = relationship("Node", remote_side=[id])

Where above, the id column is applied as the remote_sideof the parent relationship(), thus establishingparent_id as the “local” side, and the relationshipthen behaves as a many-to-one.

As always, both directions can be combined into a bidirectionalrelationship using the backref() function:

  1. class Node(Base):
  2. __tablename__ = 'node'
  3. id = Column(Integer, primary_key=True)
  4. parent_id = Column(Integer, ForeignKey('node.id'))
  5. data = Column(String(50))
  6. children = relationship("Node",
  7. backref=backref('parent', remote_side=[id])
  8. )

There are several examples included with SQLAlchemy illustratingself-referential strategies; these include Adjacency List andXML Persistence.

Composite Adjacency Lists

A sub-category of the adjacency list relationship is the rarecase where a particular column is present on both the “local” and“remote” side of the join condition. An example is the Folderclass below; using a composite primary key, the account_idcolumn refers to itself, to indicate sub folders which are withinthe same account as that of the parent; while folder_id refersto a specific folder within that account:

  1. class Folder(Base):
  2. __tablename__ = 'folder'
  3. __table_args__ = (
  4. ForeignKeyConstraint(
  5. ['account_id', 'parent_id'],
  6. ['folder.account_id', 'folder.folder_id']),
  7. )
  8.  
  9. account_id = Column(Integer, primary_key=True)
  10. folder_id = Column(Integer, primary_key=True)
  11. parent_id = Column(Integer)
  12. name = Column(String)
  13.  
  14. parent_folder = relationship("Folder",
  15. backref="child_folders",
  16. remote_side=[account_id, folder_id]
  17. )

Above, we pass account_id into the remote_side list.relationship() recognizes that the account_id column hereis on both sides, and aligns the “remote” column along with thefolder_id column, which it recognizes as uniquely present onthe “remote” side.

Self-Referential Query Strategies

Querying of self-referential structures works like any other query:

  1. # get all nodes named 'child2'
  2. session.query(Node).filter(Node.data=='child2')

However extra care is needed when attempting to join alongthe foreign key from one level of the tree to the next. In SQL,a join from a table to itself requires that at least one side of theexpression be “aliased” so that it can be unambiguously referred to.

Recall from Using Aliases in the ORM tutorial that theorm.aliased() construct is normally used to provide an “alias” ofan ORM entity. Joining from Node to itself using this techniquelooks like:

  1. from sqlalchemy.orm import aliased
  2.  
  3. nodealias = aliased(Node)
  4. sqlsession.query(Node).filter(Node.data=='subchild1').\
  5. join(nodealias, Node.parent).\
  6. filter(nodealias.data=="child2").\
  7. all()
  8. SELECT node.id AS node_id,
  9. node.parent_id AS node_parent_id,
  10. node.data AS node_data
  11. FROM node JOIN node AS node_1
  12. ON node.parent_id = node_1.id
  13. WHERE node.data = ?
  14. AND node_1.data = ?
  15. ['subchild1', 'child2']

Query.join() also includes a feature known asQuery.join.aliased that can shorten the verbosity self-referential joins, at the expense of query flexibility. This featureperforms a similar “aliasing” step to that above, without the need foran explicit entity. Calls to Query.filter() and similarsubsequent to the aliased join will adapt the Node entity tobe that of the alias:

  1. sqlsession.query(Node).filter(Node.data=='subchild1').\
  2. join(Node.parent, aliased=True).\
  3. filter(Node.data=='child2').\
  4. all()
  5. SELECT node.id AS node_id,
  6. node.parent_id AS node_parent_id,
  7. node.data AS node_data
  8. FROM node
  9. JOIN node AS node_1 ON node_1.id = node.parent_id
  10. WHERE node.data = ? AND node_1.data = ?
  11. ['subchild1', 'child2']

To add criterion to multiple points along a longer join, addQuery.join.from_joinpoint to the additionaljoin() calls:

  1. # get all nodes named 'subchild1' with a
  2. # parent named 'child2' and a grandparent 'root'
  3. sqlsession.query(Node).\
  4. filter(Node.data=='subchild1').\
  5. join(Node.parent, aliased=True).\
  6. filter(Node.data=='child2').\
  7. join(Node.parent, aliased=True, from_joinpoint=True).\
  8. filter(Node.data=='root').\
  9. all()
  10. SELECT node.id AS node_id,
  11. node.parent_id AS node_parent_id,
  12. node.data AS node_data
  13. FROM node
  14. JOIN node AS node_1 ON node_1.id = node.parent_id
  15. JOIN node AS node_2 ON node_2.id = node_1.parent_id
  16. WHERE node.data = ?
  17. AND node_1.data = ?
  18. AND node_2.data = ?
  19. ['subchild1', 'child2', 'root']

Query.reset_joinpoint() will also remove the “aliasing” from filteringcalls:

  1. session.query(Node).\
  2. join(Node.children, aliased=True).\
  3. filter(Node.data == 'foo').\
  4. reset_joinpoint().\
  5. filter(Node.data == 'bar')

For an example of using Query.join.aliased toarbitrarily join along a chain of self-referential nodes, seeXML Persistence.

Configuring Self-Referential Eager Loading

Eager loading of relationships occurs using joins or outerjoins from parent tochild table during a normal query operation, such that the parent and itsimmediate child collection or reference can be populated from a single SQLstatement, or a second statement for all immediate child collections.SQLAlchemy’s joined and subquery eager loading use aliased tables in all caseswhen joining to related items, so are compatible with self-referentialjoining. However, to use eager loading with a self-referential relationship,SQLAlchemy needs to be told how many levels deep it should join and/or query;otherwise the eager load will not take place at all. This depth setting isconfigured via join_depth:

  1. class Node(Base):
  2. __tablename__ = 'node'
  3. id = Column(Integer, primary_key=True)
  4. parent_id = Column(Integer, ForeignKey('node.id'))
  5. data = Column(String(50))
  6. children = relationship("Node",
  7. lazy="joined",
  8. join_depth=2)
  9.  
  10. sqlsession.query(Node).all()
  11. SELECT node_1.id AS node_1_id,
  12. node_1.parent_id AS node_1_parent_id,
  13. node_1.data AS node_1_data,
  14. node_2.id AS node_2_id,
  15. node_2.parent_id AS node_2_parent_id,
  16. node_2.data AS node_2_data,
  17. node.id AS node_id,
  18. node.parent_id AS node_parent_id,
  19. node.data AS node_data
  20. FROM node
  21. LEFT OUTER JOIN node AS node_2
  22. ON node.id = node_2.parent_id
  23. LEFT OUTER JOIN node AS node_1
  24. ON node_2.id = node_1.parent_id
  25. []