State Management

Quickie Intro to Object States

It’s helpful to know the states which an instance can have within a session:

  • Transient - an instance that’s not in a session, and is not saved to thedatabase; i.e. it has no database identity. The only relationship such anobject has to the ORM is that its class has a mapper() associated withit.

  • Pending - when you add() a transientinstance, it becomes pending. It still wasn’t actually flushed to thedatabase yet, but it will be when the next flush occurs.

  • Persistent - An instance which is present in the session and has a recordin the database. You get persistent instances by either flushing so that thepending instances become persistent, or by querying the database forexisting instances (or moving persistent instances from other sessions intoyour local session).

  • Deleted - An instance which has been deleted within a flush, butthe transaction has not yet completed. Objects in this state are essentiallyin the opposite of “pending” state; when the session’s transaction is committed,the object will move to the detached state. Alternatively, whenthe session’s transaction is rolled back, a deleted object movesback to the persistent state.

Changed in version 1.1: The ‘deleted’ state is a newly added sessionobject state distinct from the ‘persistent’ state.

  • Detached - an instance which corresponds, or previously corresponded,to a record in the database, but is not currently in any session.The detached object will contain a database identity marker, howeverbecause it is not associated with a session, it is unknown whether or notthis database identity actually exists in a target database. Detachedobjects are safe to use normally, except that they have no ability toload unloaded attributes or attributes that were previously markedas “expired”.

For a deeper dive into all possible state transitions, see thesection Object Lifecycle Events which describes each transitionas well as how to programmatically track each one.

Getting the Current State of an Object

The actual state of any mapped object can be viewed at any time usingthe inspect() system:

  1. >>> from sqlalchemy import inspect
  2. >>> insp = inspect(my_object)
  3. >>> insp.persistent
  4. True

See also

InstanceState.transient

InstanceState.pending

InstanceState.persistent

InstanceState.deleted

InstanceState.detached

Session Attributes

The Session itself acts somewhat like aset-like collection. All items present may be accessed using the iteratorinterface:

  1. for obj in session:
  2. print(obj)

And presence may be tested for using regular “contains” semantics:

  1. if obj in session:
  2. print("Object is present")

The session is also keeping track of all newly created (i.e. pending) objects,all objects which have had changes since they were last loaded or saved (i.e.“dirty”), and everything that’s been marked as deleted:

  1. # pending objects recently added to the Session
  2. session.new
  3.  
  4. # persistent objects which currently have changes detected
  5. # (this collection is now created on the fly each time the property is called)
  6. session.dirty
  7.  
  8. # persistent objects that have been marked as deleted via session.delete(obj)
  9. session.deleted
  10.  
  11. # dictionary of all persistent objects, keyed on their
  12. # identity key
  13. session.identity_map

(Documentation: Session.new, Session.dirty,Session.deleted, Session.identity_map).

Session Referencing Behavior

Objects within the session are weakly referenced. Thismeans that when they are dereferenced in the outside application, they fallout of scope from within the Session as welland are subject to garbage collection by the Python interpreter. Theexceptions to this include objects which are pending, objects which are markedas deleted, or persistent objects which have pending changes on them. After afull flush, these collections are all empty, and all objects are again weaklyreferenced.

To cause objects in the Session to remain stronglyreferenced, usually a simple approach is all that’s needed. Examplesof externally managed strong-referencing behavior include loadingobjects into a local dictionary keyed to their primary key, or intolists or sets for the span of time that they need to remainreferenced. These collections can be associated with aSession, if desired, by placing them into theSession.info dictionary.

An event based approach is also feasible. A simple recipe that provides“strong referencing” behavior for all objects as they remain withinthe persistent state is as follows:

  1. from sqlalchemy import event
  2.  
  3. def strong_reference_session(session):
  4. @event.listens_for(session, "pending_to_persistent")
  5. @event.listens_for(session, "deleted_to_persistent")
  6. @event.listens_for(session, "detached_to_persistent")
  7. @event.listens_for(session, "loaded_as_persistent")
  8. def strong_ref_object(sess, instance):
  9. if 'refs' not in sess.info:
  10. sess.info['refs'] = refs = set()
  11. else:
  12. refs = sess.info['refs']
  13.  
  14. refs.add(instance)
  15.  
  16.  
  17. @event.listens_for(session, "persistent_to_detached")
  18. @event.listens_for(session, "persistent_to_deleted")
  19. @event.listens_for(session, "persistent_to_transient")
  20. def deref_object(sess, instance):
  21. sess.info['refs'].discard(instance)

Above, we intercept the SessionEvents.pending_to_persistent(),SessionEvents.detached_to_persistent(),SessionEvents.deleted_to_persistent() andSessionEvents.loaded_as_persistent() event hooks in order to interceptobjects as they enter the persistent transition, and theSessionEvents.persistent_to_detached() andSessionEvents.persistent_to_deleted() hooks to interceptobjects as they leave the persistent state.

The above function may be called for any Session in order toprovide strong-referencing behavior on a per-Session basis:

  1. from sqlalchemy.orm import Session
  2.  
  3. my_session = Session()
  4. strong_reference_session(my_session)

It may also be called for any sessionmaker:

  1. from sqlalchemy.orm import sessionmaker
  2.  
  3. maker = sessionmaker()
  4. strong_reference_session(maker)

Merging

merge() transfers state from anoutside object into a new or already existing instance within a session. Italso reconciles the incoming data against the state of thedatabase, producing a history stream which will be applied towards the nextflush, or alternatively can be made to produce a simple “transfer” ofstate without producing change history or accessing the database. Usage is as follows:

  1. merged_object = session.merge(existing_object)

When given an instance, it follows these steps:

  • It examines the primary key of the instance. If it’s present, it attemptsto locate that instance in the local identity map. If the load=Trueflag is left at its default, it also checks the database for this primarykey if not located locally.

  • If the given instance has no primary key, or if no instance can be foundwith the primary key given, a new instance is created.

  • The state of the given instance is then copied onto the located/newlycreated instance. For attributes which are present on the sourceinstance, the value is transferred to the target instance. For mappedattributes which aren’t present on the source, the attribute isexpired on the target instance, discarding its existing value.

If the load=True flag is left at its default,this copy process emits events and will load the target object’sunloaded collections for each attribute present on the source object,so that the incoming state can be reconciled against what’spresent in the database. If loadis passed as False, the incoming data is “stamped” directly withoutproducing any history.

  • The operation is cascaded to related objects and collections, asindicated by the merge cascade (see Cascades).

  • The new instance is returned.

With merge(), the given “source”instance is not modified nor is it associated with the target Session,and remains available to be merged with any number of other Sessionobjects. merge() is useful fortaking the state of any kind of object structure without regard for itsorigins or current session associations and copying its state into anew session. Here’s some examples:

  • An application which reads an object structure from a file and wishes tosave it to the database might parse the file, build up thestructure, and then usemerge() to save itto the database, ensuring that the data within the file isused to formulate the primary key of each element of thestructure. Later, when the file has changed, the sameprocess can be re-run, producing a slightly differentobject structure, which can then be merged in again,and the Session willautomatically update the database to reflect thosechanges, loading each object from the database by primary key andthen updating its state with the new state given.

  • An application is storing objects in an in-memory cache, shared bymany Session objects simultaneously. merge()is used each time an object is retrieved from the cache to createa local copy of it in each Session which requests it.The cached object remains detached; only its state is moved intocopies of itself that are local to individual Sessionobjects.

In the caching use case, it’s common to use the load=Falseflag to remove the overhead of reconciling the object’s statewith the database. There’s also a “bulk” version ofmerge() called merge_result()that was designed to work with cache-extended Queryobjects - see the section Dogpile Caching.

  • An application wants to transfer the state of a series of objectsinto a Session maintained by a worker thread or otherconcurrent system. merge() makes a copy of each objectto be placed into this new Session. At the end of the operation,the parent thread/process maintains the objects it started with,and the thread/worker can proceed with local copies of those objects.

In the “transfer between threads/processes” use case, the applicationmay want to use the load=False flag as well to avoid overhead andredundant SQL queries as the data is transferred.

Merge Tips

merge() is an extremely useful method for many purposes. However,it deals with the intricate border between objects that are transient/detached andthose that are persistent, as well as the automated transference of state.The wide variety of scenarios that can present themselves here often require amore careful approach to the state of objects. Common problems with merge usually involvesome unexpected state regarding the object being passed to merge().

Lets use the canonical example of the User and Address objects:

  1. class User(Base):
  2. __tablename__ = 'user'
  3.  
  4. id = Column(Integer, primary_key=True)
  5. name = Column(String(50), nullable=False)
  6. addresses = relationship("Address", backref="user")
  7.  
  8. class Address(Base):
  9. __tablename__ = 'address'
  10.  
  11. id = Column(Integer, primary_key=True)
  12. email_address = Column(String(50), nullable=False)
  13. user_id = Column(Integer, ForeignKey('user.id'), nullable=False)

Assume a User object with one Address, already persistent:

  1. >>> u1 = User(name='ed', addresses=[Address(email_address='ed@ed.com')])
  2. >>> session.add(u1)
  3. >>> session.commit()

We now create a1, an object outside the session, which we’d liketo merge on top of the existing Address:

  1. >>> existing_a1 = u1.addresses[0]
  2. >>> a1 = Address(id=existing_a1.id)

A surprise would occur if we said this:

  1. >>> a1.user = u1
  2. >>> a1 = session.merge(a1)
  3. >>> session.commit()
  4. sqlalchemy.orm.exc.FlushError: New instance <Address at 0x1298f50>
  5. with identity key (<class '__main__.Address'>, (1,)) conflicts with
  6. persistent instance <Address at 0x12a25d0>

Why is that ? We weren’t careful with our cascades. The assignmentof a1.user to a persistent object cascaded to the backref of User.addressesand made our a1 object pending, as though we had added it. Now we havetwo Address objects in the session:

  1. >>> a1 = Address()
  2. >>> a1.user = u1
  3. >>> a1 in session
  4. True
  5. >>> existing_a1 in session
  6. True
  7. >>> a1 is existing_a1
  8. False

Above, our a1 is already pending in the session. Thesubsequent merge() operation essentiallydoes nothing. Cascade can be configured via the cascadeoption on relationship(), although in this case itwould mean removing the save-update cascade from theUser.addresses relationship - and usually, that behavioris extremely convenient. The solution here would usually be to not assigna1.user to an object already persistent in the targetsession.

The cascade_backrefs=False option of relationship()will also prevent the Address frombeing added to the session via the a1.user = u1 assignment.

Further detail on cascade operation is at Cascades.

Another example of unexpected state:

  1. >>> a1 = Address(id=existing_a1.id, user_id=u1.id)
  2. >>> assert a1.user is None
  3. True
  4. >>> a1 = session.merge(a1)
  5. >>> session.commit()
  6. sqlalchemy.exc.IntegrityError: (IntegrityError) address.user_id
  7. may not be NULL

Here, we accessed a1.user, which returned its default valueof None, which as a result of this access, has been placed in the dict ofour object a1. Normally, this operation creates no change event,so the user_id attribute takes precedence during aflush. But when we merge the Address object into the session, the operationis equivalent to:

  1. >>> existing_a1.id = existing_a1.id
  2. >>> existing_a1.user_id = u1.id
  3. >>> existing_a1.user = None

Where above, both user_id and user are assigned to, and change eventsare emitted for both. The user associationtakes precedence, and None is applied to user_id, causing a failure.

Most merge() issues can be examined by first checking -is the object prematurely in the session ?

  1. >>> a1 = Address(id=existing_a1, user_id=user.id)
  2. >>> assert a1 not in session
  3. >>> a1 = session.merge(a1)

Or is there state on the object that we don’t want ? Examining dictis a quick way to check:

  1. >>> a1 = Address(id=existing_a1, user_id=user.id)
  2. >>> a1.user
  3. >>> a1.__dict__
  4. {'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x1298d10>,
  5. 'user_id': 1,
  6. 'id': 1,
  7. 'user': None}
  8. >>> # we don't want user=None merged, remove it
  9. >>> del a1.user
  10. >>> a1 = session.merge(a1)
  11. >>> # success
  12. >>> session.commit()

Expunging

Expunge removes an object from the Session, sending persistent instances tothe detached state, and pending instances to the transient state:

  1. session.expunge(obj1)

To remove all items, call expunge_all()(this method was formerly known as clear()).

Refreshing / Expiring

Expiring means that the database-persisted data held inside a seriesof object attributes is erased, in such a way that when those attributesare next accessed, a SQL query is emitted which will refresh that data fromthe database.

When we talk about expiration of data we are usually talking about an objectthat is in the persistent state. For example, if we load an objectas follows:

  1. user = session.query(User).filter_by(name='user1').first()

The above User object is persistent, and has a series of attributespresent; if we were to look inside its dict, we’d see that stateloaded:

  1. >>> user.__dict__
  2. {
  3. 'id': 1, 'name': u'user1',
  4. '_sa_instance_state': <...>,
  5. }

where id and name refer to those columns in the database._sa_instance_state is a non-database-persisted value used by SQLAlchemyinternally (it refers to the InstanceState for the instance.While not directly relevant to this section, if we want to get at it,we should use the inspect() function to access it).

At this point, the state in our User object matches that of the loadeddatabase row. But upon expiring the object using a method such asSession.expire(), we see that the state is removed:

  1. >>> session.expire(user)
  2. >>> user.__dict__
  3. {'_sa_instance_state': <...>}

We see that while the internal “state” still hangs around, the values whichcorrespond to the id and name columns are gone. If we were to accessone of these columns and are watching SQL, we’d see this:

  1. >>> print(user.name)
  2. SELECT user.id AS user_id, user.name AS user_name
  3. FROM user
  4. WHERE user.id = ?
  5. (1,)
  6. user1

Above, upon accessing the expired attribute user.name, the ORM initiateda lazy load to retrieve the most recent state from the database,by emitting a SELECT for the user row to which this user refers. Afterwards,the dict is again populated:

  1. >>> user.__dict__
  2. {
  3. 'id': 1, 'name': u'user1',
  4. '_sa_instance_state': <...>,
  5. }

Note

While we are peeking inside of dict in order to see a bitof what SQLAlchemy does with object attributes, we should not modifythe contents of dict directly, at least as far as those attributeswhich the SQLAlchemy ORM is maintaining (other attributes outside of SQLA’srealm are fine). This is because SQLAlchemy uses descriptors inorder to track the changes we make to an object, and when we modify dictdirectly, the ORM won’t be able to track that we changed something.

Another key behavior of both expire() and refresh()is that all un-flushed changes on an object are discarded. That is,if we were to modify an attribute on our User:

  1. >>> user.name = 'user2'

but then we call expire() without first calling flush(),our pending value of 'user2' is discarded:

  1. >>> session.expire(user)
  2. >>> user.name
  3. 'user1'

The expire() method can be used to mark as “expired” all ORM-mappedattributes for an instance:

  1. # expire all ORM-mapped attributes on obj1
  2. session.expire(obj1)

it can also be passed a list of string attribute names, referring to specificattributes to be marked as expired:

  1. # expire only attributes obj1.attr1, obj1.attr2
  2. session.expire(obj1, ['attr1', 'attr2'])

The refresh() method has a similar interface, but insteadof expiring, it emits an immediate SELECT for the object’s row immediately:

  1. # reload all attributes on obj1
  2. session.refresh(obj1)

refresh() also accepts a list of string attribute names,but unlike expire(), expects at least one name tobe that of a column-mapped attribute:

  1. # reload obj1.attr1, obj1.attr2
  2. session.refresh(obj1, ['attr1', 'attr2'])

The Session.expire_all() method allows us to essentially callSession.expire() on all objects contained within the Sessionat once:

  1. session.expire_all()

What Actually Loads

The SELECT statement that’s emitted when an object marked with expire()or loaded with refresh() varies based on several factors, including:

  • The load of expired attributes is triggered from column-mapped attributes only.While any kind of attribute can be marked as expired, including arelationship() - mapped attribute, accessing an expired relationship()attribute will emit a load only for that attribute, using standardrelationship-oriented lazy loading. Column-oriented attributes, even ifexpired, will not load as part of this operation, and instead will load whenany column-oriented attribute is accessed.

  • relationship()- mapped attributes will not load in response toexpired column-based attributes being accessed.

  • Regarding relationships, refresh() is more restrictive thanexpire() with regards to attributes that aren’t column-mapped.Calling refresh() and passing a list of names that only includesrelationship-mapped attributes will actually raise an error.In any case, non-eager-loading relationship() attributes will not beincluded in any refresh operation.

  • relationship() attributes configured as “eager loading” via thelazy parameter will load in the case ofrefresh(), if either no attribute names are specified, orif their names are included in the list of attributes to berefreshed.

  • Attributes that are configured as deferred() will not normally load,during either the expired-attribute load or during a refresh.An unloaded attribute that’s deferred() instead loads on its own when directlyaccessed, or if part of a “group” of deferred attributes where an unloadedattribute in that group is accessed.

  • For expired attributes that are loaded on access, a joined-inheritance tablemapping will emit a SELECT that typically only includes those tables for whichunloaded attributes are present. The action here is sophisticated enoughto load only the parent or child table, for example, if the subset of columnsthat were originally expired encompass only one or the other of those tables.

  • When refresh() is used on a joined-inheritance table mapping,the SELECT emitted will resemble that of when Session.query() isused on the target object’s class. This is typically all those tables thatare set up as part of the mapping.

When to Expire or Refresh

The Session uses the expiration feature automatically wheneverthe transaction referred to by the session ends. Meaning, whenever Session.commit()or Session.rollback() is called, all objects within the Sessionare expired, using a feature equivalent to that of the Session.expire_all()method. The rationale is that the end of a transaction is ademarcating point at which there is no more context available in order to knowwhat the current state of the database is, as any number of other transactionsmay be affecting it. Only when a new transaction starts can we again have accessto the current state of the database, at which point any number of changesmay have occurred.

Transaction Isolation

Of course, most databases are capable of handlingmultiple transactions at once, even involving the same rows of data. Whena relational database handles multiple transactions involving the sametables or rows, this is when the isolation aspect of the database comesinto play. The isolation behavior of different databases varies considerablyand even on a single database can be configured to behave in different ways(via the so-called isolation level setting). In that sense, the Sessioncan’t fully predict when the same SELECT statement, emitted a second time,will definitely return the data we already have, or will return new data.So as a best guess, it assumes that within the scope of a transaction, unlessit is known that a SQL expression has been emitted to modify a particular row,there’s no need to refresh a row unless explicitly told to do so.

The Session.expire() and Session.refresh() methods are used inthose cases when one wants to force an object to re-load its data from thedatabase, in those cases when it is known that the current state of datais possibly stale. Reasons for this might include:

  • some SQL has been emitted within the transaction outside of thescope of the ORM’s object handling, such as if a Table.update() constructwere emitted using the Session.execute() method;

  • if the applicationis attempting to acquire data that is known to have been modified in aconcurrent transaction, and it is also known that the isolation rules in effectallow this data to be visible.

The second bullet has the important caveat that “it is also known that the isolation rules in effectallow this data to be visible.” This means that it cannot be assumed that anUPDATE that happened on another database connection will yet be visible herelocally; in many cases, it will not. This is why if one wishes to useexpire() or refresh() in order to view data between ongoingtransactions, an understanding of the isolation behavior in effect is essential.

See also

Session.expire()

Session.expire_all()

Session.refresh()

isolation - glossary explanation of isolation which includes linksto Wikipedia.

The SQLAlchemy Session In-Depth - a video + slides with an in-depth discussion of the objectlifecycle including the role of data expiration.