Serializing Django objects

Django’s serialization framework provides a mechanism for “translating” Django models into other formats. Usually these other formats will be text-based and used for sending Django data over a wire, but it’s possible for a serializer to handle any format (text-based or not).

See also

If you just want to get some data from your tables into a serialized form, you could use the dumpdata management command.

Serializing data

At the highest level, you can serialize data like this:

    from django.core import serializers

    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the serialize function are the format to serialize the data to (see Serialization formats) and a QuerySet to serialize. (Actually, the second argument can be any iterator that yields Django model instances, but it’ll almost always be a QuerySet.)

django.core.serializers.get_serializer(format)

You can also use a serializer object directly:

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object (which includes an HttpResponse):

    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

Note

Calling get_serializer() with an unknown format will raise a django.core.serializers.SerializerDoesNotExist exception.

Subset of fields

If you only want a subset of fields to be serialized, you can specify a fields argument to the serializer:

    from django.core import serializers

    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

In this example, only the name and size attributes of each model will be serialized. The primary key is always serialized as the pk element in the resulting output; it never appears in the fields part.

Note

Depending on your model, you may find that it is not possible to deserialize a model that only serializes a subset of its fields. If a serialized object doesn’t specify all the fields that are required by a model, the deserializer will not be able to save deserialized instances.

Inherited models

If you have a model that is defined using an abstract base class, you don’t have to do anything special to serialize that model. Call the serializer on the object (or objects) that you want to serialize, and the output will be a complete representation of the serialized object.

However, if you have a model that uses multi-table inheritance, you also need to serialize all of the base classes for the model. This is because only the fields that are locally defined on the model will be serialized. For example, consider the following models:

    from django.db import models

    class Place(models.Model):
        name = models.CharField(max_length=50)

    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField(default=False)

If you only serialize the Restaurant model:

    data = serializers.serialize('xml', Restaurant.objects.all())

the fields on the serialized output will only contain the serves_hot_dogs attribute. The name attribute of the base class will be ignored.

In order to fully serialize your Restaurant instances, you will need to serialize the Place models as well:

    all_objects = [*Restaurant.objects.all(), *Place.objects.all()]
    data = serializers.serialize('xml', all_objects)

Deserializing data

Deserializing data is very similar to serializing it:

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the deserialize function takes the same format argument as serialize, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the deserialize iterator aren’t regular Django objects. Instead, they are special DeserializedObject instances that wrap a created (but unsaved) object and any associated relationship data.

Calling DeserializedObject.save() saves the object to the database.

Note

If the pk attribute in the serialized data doesn’t exist or is null, a new instance will be saved to the database.

This ensures that deserializing is a non-destructive operation even if the data in your serialized representation doesn’t match what’s currently in the database. Usually, working with these DeserializedObject instances looks something like:

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make sure that they are “appropriate” for saving before doing so. Of course, if you trust your data source you can instead save the object directly and move on.

The Django object itself can be inspected as deserialized_object.object. If fields in the serialized data do not exist on a model, a DeserializationError will be raised unless the ignorenonexistent argument is passed in as True:

    serializers.deserialize("xml", data, ignorenonexistent=True)

Serialization formats

Django supports a number of serialization formats, some of which require you to install third-party Python modules:

Identifier  Information
xml         Serializes to and from a simple XML dialect.
json        Serializes to and from JSON.
yaml        Serializes to YAML (YAML Ain’t a Markup Language). This serializer is only available if PyYAML is installed.

XML

The basic XML serialization format looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <django-objects version="1.0">
        <object pk="123" model="sessions.session">
            <field type="DateTimeField" name="expire_date">2013-01-16T08:16:59.844560+00:00</field>
            <!-- ... -->
        </object>
    </django-objects>

The whole collection of objects that is either serialized or deserialized is represented by a <django-objects> tag which contains multiple <object> elements. Each such object has two attributes: “pk” and “model”, the latter being represented by the name of the app (“sessions”) and the lowercase name of the model (“session”) separated by a dot.

Each field of the object is serialized as a <field> element carrying the attributes “type” and “name”. The text content of the element represents the value that should be stored.

Foreign keys and other relational fields are treated a little bit differently:

    <object pk="27" model="auth.permission">
        <!-- ... -->
        <field to="contenttypes.contenttype" name="content_type" rel="ManyToOneRel">9</field>
        <!-- ... -->
    </object>

In this example we specify that the auth.Permission object with the PK 27 has a foreign key to the contenttypes.ContentType instance with the PK 9.

ManyToMany relations are exported for the model that binds them. For instance, the auth.User model has such a relation to the auth.Permission model:

    <object pk="1" model="auth.user">
        <!-- ... -->
        <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
            <object pk="46"></object>
            <object pk="47"></object>
        </field>
    </object>

This example links the given user with the permission models with PKs 46 and 47.
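To make the structure described above concrete, the following sketch reads this XML dialect with Python's standard xml.etree.ElementTree module. It is an illustration only: the sample document (including the username field) and the summarize helper are invented for this example, and in real code serializers.deserialize() remains the right tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the dialect shown above (passed as bytes so the
# encoding declaration is honoured by the parser).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
  <object pk="1" model="auth.user">
    <field type="CharField" name="username">alice</field>
    <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
      <object pk="46"></object>
      <object pk="47"></object>
    </field>
  </object>
</django-objects>
"""


def summarize(xml_bytes):
    """Return a list of (model, pk, fields) tuples for a django-objects document."""
    root = ET.fromstring(xml_bytes)
    result = []
    for obj in root.findall("object"):
        fields = {}
        for field in obj.findall("field"):
            if field.get("rel") == "ManyToManyRel":
                # Many-to-many values are nested <object pk="..."> elements.
                fields[field.get("name")] = [o.get("pk") for o in field.findall("object")]
            else:
                # Plain fields and foreign keys carry their value as text content.
                fields[field.get("name")] = field.text
        result.append((obj.get("model"), obj.get("pk"), fields))
    return result


objects = summarize(SAMPLE)
print(objects)
```

Note that every value comes back as a string here; converting values to the right Python types is part of what Django's own deserializer does for you.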

Control characters

If the content to be serialized contains control characters that are not accepted in the XML 1.0 standard, the serialization will fail with a ValueError exception. Read also the W3C’s explanation of HTML, XHTML, XML and Control Codes.
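If you cannot rule out such characters in your data, one option is to check for them (or strip them) before serializing. The sketch below is not Django's implementation; it is a standalone check, with hypothetical helper names, for the C0 control characters that XML 1.0 disallows, namely everything below U+0020 except tab, line feed and carriage return.

```python
import re

# C0 control characters not allowed in XML 1.0: everything below U+0020
# except tab (U+0009), line feed (U+000A) and carriage return (U+000D).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def has_invalid_xml_chars(text):
    """Return True if text contains a control character XML 1.0 rejects."""
    return _INVALID_XML_CHARS.search(text) is not None


def strip_invalid_xml_chars(text):
    """Return text with the offending control characters removed."""
    return _INVALID_XML_CHARS.sub("", text)


print(has_invalid_xml_chars("bad\x0bvalue"))
print(strip_invalid_xml_chars("a\x00b"))
```

Whether silently stripping characters is acceptable depends on your data; rejecting the value may be the safer choice.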

JSON

Staying with the same example data as before, it would be serialized as JSON in the following way:

    [
        {
            "pk": "4b678b301dfd8a4e0dad910de3ae245b",
            "model": "sessions.session",
            "fields": {
                "expire_date": "2013-01-16T08:16:59.844Z",
                ...
            }
        }
    ]

The formatting here is a bit simpler than with XML. The whole collection is just represented as an array and the objects are represented by JSON objects with three properties: “pk”, “model” and “fields”. “fields” is again an object containing each field’s name and value as property and property-value respectively.

Foreign keys have the PK of the linked object as property value. ManyToMany relations are serialized for the model that defines them and are represented as a list of PKs.
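Because the output is plain JSON, it can be inspected with the standard json module. The following sketch uses an invented fixture (the store.book and store.person entries and the tags many-to-many field are hypothetical) to show where the foreign key and the list of many-to-many PKs end up.

```python
import json
from collections import defaultdict

# Hypothetical fixture: "author" is a foreign key, "tags" a many-to-many field.
FIXTURE = """
[
  {"pk": 1, "model": "store.book",
   "fields": {"name": "Mostly Harmless", "author": 42, "tags": [3, 5]}},
  {"pk": 42, "model": "store.person",
   "fields": {"first_name": "Douglas", "last_name": "Adams"}}
]
"""

entries = json.loads(FIXTURE)

# Group primary keys by the "app_label.model_name" identifier.
by_model = defaultdict(list)
for entry in entries:
    by_model[entry["model"]].append(entry["pk"])

book = entries[0]
print(dict(by_model))
print(book["fields"]["author"], book["fields"]["tags"])
```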

Be aware that not all Django output can be passed unmodified to json. For example, if you have some custom type in an object to be serialized, you’ll have to write a custom JSON encoder for it. Something like this will work:

    from django.core.serializers.json import DjangoJSONEncoder

    class LazyEncoder(DjangoJSONEncoder):
        def default(self, obj):
            if isinstance(obj, YourCustomType):
                return str(obj)
            return super().default(obj)

You can then pass cls=LazyEncoder to the serializers.serialize() function:

    from django.core.serializers import serialize

    serialize('json', SomeModel.objects.all(), cls=LazyEncoder)

Also note that GeoDjango provides a customized GeoJSON serializer.

DjangoJSONEncoder

class django.core.serializers.json.DjangoJSONEncoder

The JSON serializer uses DjangoJSONEncoder for encoding. A subclass of JSONEncoder, it handles these additional types:

  • datetime: a string of the form YYYY-MM-DDTHH:mm:ss.sssZ or YYYY-MM-DDTHH:mm:ss.sss+HH:MM as defined in ECMA-262.
  • date: a string of the form YYYY-MM-DD as defined in ECMA-262.
  • time: a string of the form HH:MM:ss.sss as defined in ECMA-262.
  • timedelta: a string representing a duration as defined in ISO-8601. For example, timedelta(days=1, hours=2, seconds=3.4) is represented as 'P1DT02H00M03.400000S'.
  • Decimal, Promise (django.utils.functional.lazy() objects), UUID: a string representation of the object.
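As a rough, Django-free illustration of these formats, the sketch below subclasses the standard json.JSONEncoder and produces strings of the shapes listed above. It is an approximation written for this example, not DjangoJSONEncoder itself: the names SketchEncoder and duration_iso_string are hypothetical here, time and Promise are left out, negative durations are rejected, and datetimes always carry milliseconds.

```python
import datetime
import decimal
import json
import uuid


def duration_iso_string(delta):
    """Format a non-negative timedelta like 'P1DT02H00M03.400000S'."""
    if delta < datetime.timedelta(0):
        raise ValueError("this sketch only handles non-negative durations")
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    fraction = ".{:06d}".format(delta.microseconds) if delta.microseconds else ""
    return "P{}DT{:02d}H{:02d}M{:02d}{}S".format(
        delta.days, hours, minutes, seconds, fraction
    )


class SketchEncoder(json.JSONEncoder):
    def default(self, o):
        # datetime must be tested before date because it is a subclass of date.
        if isinstance(o, datetime.datetime):
            text = o.isoformat(timespec="milliseconds")
            return text[:-6] + "Z" if text.endswith("+00:00") else text
        if isinstance(o, datetime.date):
            return o.isoformat()
        if isinstance(o, datetime.timedelta):
            return duration_iso_string(o)
        if isinstance(o, (decimal.Decimal, uuid.UUID)):
            return str(o)
        return super().default(o)


payload = json.dumps(
    {
        "when": datetime.datetime(2013, 1, 16, 8, 16, 59, 844560, tzinfo=datetime.timezone.utc),
        "day": datetime.date(2013, 1, 16),
        "took": datetime.timedelta(days=1, hours=2, seconds=3.4),
        "price": decimal.Decimal("9.99"),
    },
    cls=SketchEncoder,
)
print(payload)
```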

YAML

YAML serialization looks quite similar to JSON. The object list is serialized as a sequence of mappings with the keys “pk”, “model” and “fields”. Each field is again a mapping, with the name of the field as the key and the field’s value as the value:

    - fields: {expire_date: !!timestamp '2013-01-16 08:16:59.844560+00:00'}
      model: sessions.session
      pk: 4b678b301dfd8a4e0dad910de3ae245b

Referential fields are again represented by the PK or sequence of PKs.

Natural keys

The default serialization strategy for foreign keys and many-to-many relations is to serialize the value of the primary key(s) of the objects in the relation. This strategy works well for most objects, but it can cause difficulty in some circumstances.

Consider the case of a list of objects that have a foreign key referencing ContentType. If you’re going to serialize an object that refers to a content type, then you need to have a way to refer to that content type to begin with. Since ContentType objects are automatically created by Django during the database synchronization process, the primary key of a given content type isn’t easy to predict; it will depend on how and when migrate was executed. This is true for all models which automatically generate objects, notably including Permission, Group, and User.

Warning

You should never include automatically generated objects in a fixture or other serialized data. By chance, the primary keys in the fixture may match those in the database and loading the fixture will have no effect. In the more likely case that they don’t match, the fixture loading will fail with an IntegrityError.

There is also the matter of convenience. An integer id isn’t always the most convenient way to refer to an object; sometimes, a more natural reference would be helpful.

It is for these reasons that Django provides natural keys. A natural key is a tuple of values that can be used to uniquely identify an object instance without using the primary key value.

Deserialization of natural keys

Consider the following two models:

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        birthdate = models.DateField()

        class Meta:
            unique_together = [['first_name', 'last_name']]

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person, on_delete=models.CASCADE)

Ordinarily, serialized data for Book would use an integer to refer to the author. For example, in JSON, a Book might be serialized as:

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn’t a particularly natural way to refer to an author. It requires that you know the primary key value for the author; it also requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes much more humane. To add natural key handling, you define a default manager for Person with a get_by_natural_key() method. In the case of a Person, a good natural key might be the pair of first and last name:

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        birthdate = models.DateField()

        objects = PersonManager()

        class Meta:
            unique_together = [['first_name', 'last_name']]

Now books can use that natural key to refer to Person objects:

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the get_by_natural_key() method to resolve ["Douglas", "Adams"] into the primary key of an actual Person object.
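The lookup can be pictured with a small, Django-free toy. Everything in it (the in-memory Person and PersonManager classes and the resolve_author helper) is hypothetical and only mimics the idea: a list in the foreign key position is unpacked into the arguments of get_by_natural_key(), and the primary key of the object found is used in its place.

```python
class Person:
    """Toy stand-in for a model instance."""

    def __init__(self, pk, first_name, last_name):
        self.pk = pk
        self.first_name = first_name
        self.last_name = last_name


class PersonManager:
    """Toy stand-in for a manager backed by an in-memory list."""

    def __init__(self, people):
        self._people = people

    def get_by_natural_key(self, first_name, last_name):
        for person in self._people:
            if (person.first_name, person.last_name) == (first_name, last_name):
                return person
        raise LookupError("no person with that natural key")


def resolve_author(raw_value, manager):
    """Return the author's primary key for a raw fixture value."""
    if isinstance(raw_value, (list, tuple)):
        # A natural key: unpack it into the lookup method's arguments.
        return manager.get_by_natural_key(*raw_value).pk
    # Anything else is treated as a plain primary key value.
    return raw_value


manager = PersonManager([Person(42, "Douglas", "Adams")])
print(resolve_author(["Douglas", "Adams"], manager))
print(resolve_author(42, manager))
```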

Note

Whatever fields you use for a natural key must be able to uniquely identify an object. This will usually mean that your model will have a uniqueness clause (either unique=True on a single field, or unique_together over multiple fields) for the field or fields in your natural key. However, uniqueness doesn’t need to be enforced at the database level. If you are certain that a set of fields will be effectively unique, you can still use those fields as a natural key.

Deserialization of objects with no primary key will always check whether the model’s manager has a get_by_natural_key() method and if so, use it to populate the deserialized object’s primary key.

Serialization of natural keys

So how do you get Django to emit a natural key when serializing an object? Firstly, you need to add another method, this time to the model itself:

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        birthdate = models.DateField()

        objects = PersonManager()

        class Meta:
            unique_together = [['first_name', 'last_name']]

        def natural_key(self):
            return (self.first_name, self.last_name)

That method should always return a natural key tuple; in this example, (first name, last name). Then, when you call serializers.serialize(), you provide use_natural_foreign_keys=True or use_natural_primary_keys=True arguments:

    >>> serializers.serialize('json', [book1, book2], indent=2,
    ...     use_natural_foreign_keys=True, use_natural_primary_keys=True)

When use_natural_foreign_keys=True is specified, Django will use the natural_key() method to serialize any foreign key reference to objects of the type that defines the method.

When use_natural_primary_keys=True is specified, Django will not provide the primary key in the serialized data of this object since it can be calculated during deserialization:

    ...
    {
        "model": "store.person",
        "fields": {
            "first_name": "Douglas",
            "last_name": "Adams",
            "birthdate": "1952-03-11"
        }
    }
    ...

This can be useful when you need to load serialized data into an existing database and you cannot guarantee that the serialized primary key value is not already in use, and do not need to ensure that deserialized objects retain the same primary keys.

If you are using dumpdata to generate serialized data, use the dumpdata --natural-foreign and dumpdata --natural-primary command line flags to generate natural keys.

Note

You don’t need to define both natural_key() and get_by_natural_key(). If you don’t want Django to output natural keys during serialization, but you want to retain the ability to load natural keys, then you can opt to not implement the natural_key() method.

Conversely, if (for some strange reason) you want Django to output natural keys during serialization, but not be able to load those key values, just don’t define the get_by_natural_key() method.

Natural keys and forward references

New in Django 2.2:

Sometimes when you use natural foreign keys you’ll need to deserialize data where an object has a foreign key referencing another object that hasn’t yet been deserialized. This is called a “forward reference”.

For instance, suppose you have the following objects in your fixture:

    ...
    {
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    },
    ...
    {
        "model": "store.person",
        "fields": {
            "first_name": "Douglas",
            "last_name": "Adams"
        }
    },
    ...

In order to handle this situation, you need to pass handle_forward_references=True to serializers.deserialize(). This will set the deferred_fields attribute on the DeserializedObject instances. You’ll need to keep track of DeserializedObject instances where this attribute isn’t None and later call save_deferred_fields() on them.

Typical usage looks like this:

    objs_with_deferred_fields = []

    for obj in serializers.deserialize('xml', data, handle_forward_references=True):
        obj.save()
        if obj.deferred_fields is not None:
            objs_with_deferred_fields.append(obj)

    for obj in objs_with_deferred_fields:
        obj.save_deferred_fields()

For this to work, the ForeignKey on the referencing model must have null=True.
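To see why the foreign key has to be nullable, here is a Django-free toy of the same two-pass idea. It is not how Django implements forward references; the dictionaries standing in for database tables and all variable names are assumptions made for this illustration. The book is first stored with an empty author, and the reference is filled in once the person exists.

```python
# Fixture in which the book appears before the person it refers to.
fixture = [
    {"model": "store.book",
     "fields": {"name": "Mostly Harmless", "author": ["Douglas", "Adams"]}},
    {"model": "store.person",
     "fields": {"first_name": "Douglas", "last_name": "Adams"}},
]

people = {}    # natural key -> primary key assigned on "save"
books = []     # "saved" book rows
deferred = []  # (book row, natural key that could not be resolved yet)

# First pass: save everything, leaving unresolved references empty.
for entry in fixture:
    fields = entry["fields"]
    if entry["model"] == "store.person":
        key = (fields["first_name"], fields["last_name"])
        people[key] = len(people) + 1
    else:
        key = tuple(fields["author"])
        row = {"name": fields["name"], "author_id": people.get(key)}
        books.append(row)
        if row["author_id"] is None:
            # Only possible because the column may be empty (null=True).
            deferred.append((row, key))

# Second pass: fill in the references that were deferred.
for row, key in deferred:
    row["author_id"] = people[key]

print(books)
```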

Dependencies during serialization

It’s often possible to avoid explicitly having to handle forward references by taking care with the ordering of objects within a fixture.

To help with this, calls to dumpdata that use the dumpdata --natural-foreign option will serialize any model with a natural_key() method before serializing standard primary key objects.

However, this may not always be enough. If your natural key refers to another object (by using a foreign key or natural key to another object as part of a natural key), then you need to be able to ensure that the objects on which a natural key depends occur in the serialized data before the natural key requires them.

To control this ordering, you can define dependencies on your natural_key() methods. You do this by setting a dependencies attribute on the natural_key() method itself.

For example, let’s add a natural key to the Book model from the example above:

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person, on_delete=models.CASCADE)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()

The natural key for a Book is a combination of its name and its author. This means that Person must be serialized before Book. To define this dependency, we add one extra line:

    def natural_key(self):
        return (self.name,) + self.author.natural_key()
    natural_key.dependencies = ['example_app.person']

This definition ensures that all Person objects are serialized before any Book objects. In turn, any object referencing Book will be serialized after both Person and Book have been serialized.
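The resulting order is a topological sort of the dependency graph. The sketch below illustrates that idea with the standard graphlib module (Python 3.9 or later); it is not Django's own sorting code, and the example_app.review model is a hypothetical third model that references Book.

```python
from graphlib import TopologicalSorter

# Map each model label to the labels it depends on.
depends_on = {
    "example_app.person": set(),
    "example_app.book": {"example_app.person"},
    "example_app.review": {"example_app.book"},  # hypothetical model
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

A cycle in the declared dependencies has no valid order, which is why circular natural key dependencies need to be avoided.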