Linking Relationships with Backref

The backref keyword argument was first introduced in Object Relational Tutorial, and has beenmentioned throughout many of the examples here. What does it actually do ? Let’s startwith the canonical User and Address scenario:

  1. from sqlalchemy import Integer, ForeignKey, String, Column
  2. from sqlalchemy.ext.declarative import declarative_base
  3. from sqlalchemy.orm import relationship
  4.  
  5. Base = declarative_base()
  6.  
  7. class User(Base):
  8. __tablename__ = 'user'
  9. id = Column(Integer, primary_key=True)
  10. name = Column(String)
  11.  
  12. addresses = relationship("Address", backref="user")
  13.  
  14. class Address(Base):
  15. __tablename__ = 'address'
  16. id = Column(Integer, primary_key=True)
  17. email = Column(String)
  18. user_id = Column(Integer, ForeignKey('user.id'))

The above configuration establishes a collection of Address objects on User calledUser.addresses. It also establishes a .user attribute on Address which willrefer to the parent User object.

In fact, the backref keyword is only a common shortcut for placing a secondrelationship() onto the Address mapping, including the establishmentof an event listener on both sides which will mirror attribute operationsin both directions. The above configuration is equivalent to:

  1. from sqlalchemy import Integer, ForeignKey, String, Column
  2. from sqlalchemy.ext.declarative import declarative_base
  3. from sqlalchemy.orm import relationship
  4.  
  5. Base = declarative_base()
  6.  
  7. class User(Base):
  8. __tablename__ = 'user'
  9. id = Column(Integer, primary_key=True)
  10. name = Column(String)
  11.  
  12. addresses = relationship("Address", back_populates="user")
  13.  
  14. class Address(Base):
  15. __tablename__ = 'address'
  16. id = Column(Integer, primary_key=True)
  17. email = Column(String)
  18. user_id = Column(Integer, ForeignKey('user.id'))
  19.  
  20. user = relationship("User", back_populates="addresses")

Above, we add a .user relationship to Address explicitly. Onboth relationships, the back_populates directive tells each relationshipabout the other one, indicating that they should establish “bidirectional”behavior between each other. The primary effect of this configurationis that the relationship adds event handlers to both attributeswhich have the behavior of “when an append or set event occurs here, set ourselvesonto the incoming attribute using this particular attribute name”.The behavior is illustrated as follows. Start with a User and an Addressinstance. The .addresses collection is empty, and the .user attributeis None:

  1. >>> u1 = User()
  2. >>> a1 = Address()
  3. >>> u1.addresses
  4. []
  5. >>> print(a1.user)
  6. None

However, once the Address is appended to the u1.addresses collection,both the collection and the scalar attribute have been populated:

  1. >>> u1.addresses.append(a1)
  2. >>> u1.addresses
  3. [<__main__.Address object at 0x12a6ed0>]
  4. >>> a1.user
  5. <__main__.User object at 0x12a6590>

This behavior of course works in reverse for removal operations as well, as wellas for equivalent operations on both sides. Such aswhen .user is set again to None, the Address object is removedfrom the reverse collection:

  1. >>> a1.user = None
  2. >>> u1.addresses
  3. []

The manipulation of the .addresses collection and the .user attributeoccurs entirely in Python without any interaction with the SQL database.Without this behavior, the proper state would be apparent on both sides once thedata has been flushed to the database, and later reloaded after a commit orexpiration operation occurs. The backref/back_populates behavior has the advantagethat common bidirectional operations can reflect the correct state without requiringa database round trip.

Remember, when the backref keyword is used on a single relationship, it’sexactly the same as if the above two relationships were created individuallyusing back_populates on each.

Backref Arguments

We’ve established that the backref keyword is merely a shortcut for buildingtwo individual relationship() constructs that refer to each other. Part ofthe behavior of this shortcut is that certain configurational arguments applied tothe relationship()will also be applied to the other direction - namely those arguments that describethe relationship at a schema level, and are unlikely to be different in the reversedirection. The usual casehere is a many-to-many relationship() that has a secondary argument,or a one-to-many or many-to-one which has a primaryjoin argument (theprimaryjoin argument is discussed in Specifying Alternate Join Conditions). Suchas if we limited the list of Address objects to those which start with “tony”:

  1. from sqlalchemy import Integer, ForeignKey, String, Column
  2. from sqlalchemy.ext.declarative import declarative_base
  3. from sqlalchemy.orm import relationship
  4.  
  5. Base = declarative_base()
  6.  
  7. class User(Base):
  8. __tablename__ = 'user'
  9. id = Column(Integer, primary_key=True)
  10. name = Column(String)
  11.  
  12. addresses = relationship("Address",
  13. primaryjoin="and_(User.id==Address.user_id, "
  14. "Address.email.startswith('tony'))",
  15. backref="user")
  16.  
  17. class Address(Base):
  18. __tablename__ = 'address'
  19. id = Column(Integer, primary_key=True)
  20. email = Column(String)
  21. user_id = Column(Integer, ForeignKey('user.id'))

We can observe, by inspecting the resulting property, that both sidesof the relationship have this join condition applied:

  1. >>> print(User.addresses.property.primaryjoin)
  2. "user".id = address.user_id AND address.email LIKE :email_1 || '%%'
  3. >>>
  4. >>> print(Address.user.property.primaryjoin)
  5. "user".id = address.user_id AND address.email LIKE :email_1 || '%%'
  6. >>>

This reuse of arguments should pretty much do the “right thing” - ituses only arguments that are applicable, and in the case of a many-to-many relationship, will reverse the usage ofprimaryjoin andsecondaryjoin to correspond to the otherdirection (see the example in Self-Referential Many-to-Many Relationship forthis).

It’s very often the case however that we’d like to specify argumentsthat are specific to just the side where we happened to place the“backref”. This includes relationship() arguments likelazy,remote_side,cascade andcascade_backrefs. For this case we usethe backref() function in place of a string:

  1. # <other imports>
  2. from sqlalchemy.orm import backref
  3.  
  4. class User(Base):
  5. __tablename__ = 'user'
  6. id = Column(Integer, primary_key=True)
  7. name = Column(String)
  8.  
  9. addresses = relationship("Address",
  10. backref=backref("user", lazy="joined"))

Where above, we placed a lazy="joined" directive only on the Address.userside, indicating that when a query against Address is made, a join to the Userentity should be made automatically which will populate the .user attribute of eachreturned Address. The backref() function formatted the arguments we gaveit into a form that is interpreted by the receiving relationship() as additionalarguments to be applied to the new relationship it creates.

One Way Backrefs

An unusual case is that of the “one way backref”. This is where the“back-populating” behavior of the backref is only desirable in onedirection. An example of this is a collection which contains afiltering primaryjoin condition. We’dlike to append items to this collection as needed, and have thempopulate the “parent” object on the incoming object. However, we’dalso like to have items that are not part of the collection, but stillhave the same “parent” association - these items should never be inthe collection.

Taking our previous example, where we established aprimaryjoin that limited the collectiononly to Address objects whose email address started with the wordtony, the usual backref behavior is that all items populate inboth directions. We wouldn’t want this behavior for a case like thefollowing:

  1. >>> u1 = User()
  2. >>> a1 = Address(email='mary')
  3. >>> a1.user = u1
  4. >>> u1.addresses
  5. [<__main__.Address object at 0x1411910>]

Above, the Address object that doesn’t match the criterion of “starts with ‘tony’”is present in the addresses collection of u1. After these objects are flushed,the transaction committed and their attributes expired for a re-load, the addressescollection will hit the database on next access and no longer have this Address objectpresent, due to the filtering condition. But we can do away with this unwanted sideof the “backref” behavior on the Python side by using two separate relationship() constructs,placing back_populates only on one side:

  1. from sqlalchemy import Integer, ForeignKey, String, Column
  2. from sqlalchemy.ext.declarative import declarative_base
  3. from sqlalchemy.orm import relationship
  4.  
  5. Base = declarative_base()
  6.  
  7. class User(Base):
  8. __tablename__ = 'user'
  9. id = Column(Integer, primary_key=True)
  10. name = Column(String)
  11. addresses = relationship("Address",
  12. primaryjoin="and_(User.id==Address.user_id, "
  13. "Address.email.startswith('tony'))",
  14. back_populates="user")
  15.  
  16. class Address(Base):
  17. __tablename__ = 'address'
  18. id = Column(Integer, primary_key=True)
  19. email = Column(String)
  20. user_id = Column(Integer, ForeignKey('user.id'))
  21. user = relationship("User")

With the above scenario, appending an Address object to the .addressescollection of a User will always establish the .user attribute on thatAddress:

  1. >>> u1 = User()
  2. >>> a1 = Address(email='tony')
  3. >>> u1.addresses.append(a1)
  4. >>> a1.user
  5. <__main__.User object at 0x1411850>

However, applying a User to the .user attribute of an Address,will not append the Address object to the collection:

  1. >>> a2 = Address(email='mary')
  2. >>> a2.user = u1
  3. >>> a2 in u1.addresses
  4. False

Of course, we’ve disabled some of the usefulness ofbackref here, in that when we do append anAddress that corresponds to the criteria ofemail.startswith('tony'), it won’t show up in theUser.addresses collection until the session is flushed, and theattributes reloaded after a commit or expire operation. While wecould consider an attribute event that checks this criterion inPython, this starts to cross the line of duplicating too much SQLbehavior in Python. The backref behavior itself is only a slighttransgression of this philosophy - SQLAlchemy tries to keep these to aminimum overall.