Custom Type Example

This is an example of using a custom type with PyMongo. The example here shows how to subclass TypeCodec to write a type codec, which is used to populate a TypeRegistry. The type registry can then be used to create a custom-type-aware Collection. Read and write operations issued against the resulting collection object transparently manipulate documents as they are saved to or retrieved from MongoDB.

Setting Up

We’ll start by getting a clean database to use for the example:

    >>> from pymongo import MongoClient
    >>> client = MongoClient()
    >>> client.drop_database('custom_type_example')
    >>> db = client.custom_type_example

Since the purpose of the example is to demonstrate working with custom types, we'll need a custom data type to use. For this example, we will be working with the Decimal type from Python's standard library. Since the BSON library's Decimal128 type (which implements the IEEE 754 decimal128 decimal-based floating-point numbering format) is distinct from Python's Decimal type, attempting to save an instance of Decimal with PyMongo results in an InvalidDocument exception.

    >>> from decimal import Decimal
    >>> num = Decimal("45.321")
    >>> db.test.insert_one({'num': num})
    Traceback (most recent call last):
    ...
    bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'), of type: <class 'decimal.Decimal'>

The TypeCodec Class

New in version 3.8.

In order to encode a custom type, we must first define a type codec for that type. A type codec describes how an instance of a custom type can be transformed to and/or from one of the types bson already understands. Depending on the desired functionality, users must choose from the following base classes when defining type codecs:

  • TypeEncoder: subclass this to define a codec that encodes a custom Python type to a known BSON type. Users must implement the python_type property/attribute and the transform_python method.
  • TypeDecoder: subclass this to define a codec that decodes a specified BSON type into a custom Python type. Users must implement the bson_type property/attribute and the transform_bson method.
  • TypeCodec: subclass this to define a codec that can both encode and decode a custom type. Users must implement the python_type and bson_type properties/attributes, as well as the transform_python and transform_bson methods.

The type codec for our custom type simply needs to define how a Decimal instance can be converted into a Decimal128 instance and vice-versa. Since we are interested in both encoding and decoding our custom type, we use the TypeCodec base class to define our codec:

    >>> from bson.decimal128 import Decimal128
    >>> from bson.codec_options import TypeCodec
    >>> class DecimalCodec(TypeCodec):
    ...     python_type = Decimal    # the Python type acted upon by this type codec
    ...     bson_type = Decimal128   # the BSON type acted upon by this type codec
    ...     def transform_python(self, value):
    ...         """Function that transforms a custom type value into a type
    ...         that BSON can encode."""
    ...         return Decimal128(value)
    ...     def transform_bson(self, value):
    ...         """Function that transforms a vanilla BSON type value into our
    ...         custom type."""
    ...         return value.to_decimal()
    ...
    >>> decimal_codec = DecimalCodec()

The TypeRegistry Class

New in version 3.8.

Before we can begin encoding and decoding our custom type objects, we must first inform PyMongo about the corresponding codec. This is done by creating a TypeRegistry instance:

    >>> from bson.codec_options import TypeRegistry
    >>> type_registry = TypeRegistry([decimal_codec])

Note that type registries can be instantiated with any number of type codecs. Once instantiated, registries are immutable and the only way to add codecs to a registry is to create a new one.

Putting It Together

Finally, we can define a CodecOptions instance with our type_registry and use it to get a Collection object that understands the Decimal data type:

    >>> from bson.codec_options import CodecOptions
    >>> codec_options = CodecOptions(type_registry=type_registry)
    >>> collection = db.get_collection('test', codec_options=codec_options)

Now, we can seamlessly encode and decode instances of Decimal:

    >>> collection.insert_one({'num': Decimal("45.321")})
    <pymongo.results.InsertOneResult object at ...>
    >>> mydoc = collection.find_one()
    >>> import pprint
    >>> pprint.pprint(mydoc)
    {u'_id': ObjectId('...'), u'num': Decimal('45.321')}

We can see what's actually being saved to the database by creating a fresh collection object without the customized codec options and using that to query MongoDB:

    >>> vanilla_collection = db.get_collection('test')
    >>> pprint.pprint(vanilla_collection.find_one())
    {u'_id': ObjectId('...'), u'num': Decimal128('45.321')}

Encoding Subtypes

Consider the situation where, in addition to encoding Decimal, we also need to encode a type that subclasses Decimal. PyMongo does this automatically for types that inherit from Python types that are BSON-encodable by default, but the type codec system described above does not offer the same flexibility: a type codec is only applied to values whose type is exactly its python_type, not to subclasses of that type.

Consider this subtype of Decimal that has a method to return its value as an integer:

    >>> class DecimalInt(Decimal):
    ...     def my_method(self):
    ...         """Method implementing some custom logic."""
    ...         return int(self)
    ...

If we try to save an instance of this type without first registering a type codec for it, we get an error:

    >>> collection.insert_one({'num': DecimalInt("45.321")})
    Traceback (most recent call last):
    ...
    bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'), of type: <class '__main__.DecimalInt'>

In order to proceed further, we must define a type codec for DecimalInt. This is trivial to do since the same transformation as the one used for Decimal is adequate for encoding DecimalInt as well:

    >>> class DecimalIntCodec(DecimalCodec):
    ...     @property
    ...     def python_type(self):
    ...         """The Python type acted upon by this type codec."""
    ...         return DecimalInt
    ...
    >>> decimalint_codec = DecimalIntCodec()

Note

No attempt is made to modify decoding behavior because, without additional information, it is impossible to discern which incoming Decimal128 value needs to be decoded as Decimal and which needs to be decoded as DecimalInt. This example only considers the situation where a user wants to encode documents containing either of these types.

After creating a new codec options object and using it to get a collection object, we can seamlessly encode instances of DecimalInt:

    >>> type_registry = TypeRegistry([decimal_codec, decimalint_codec])
    >>> codec_options = CodecOptions(type_registry=type_registry)
    >>> collection = db.get_collection('test', codec_options=codec_options)
    >>> collection.drop()
    >>> collection.insert_one({'num': DecimalInt("45.321")})
    <pymongo.results.InsertOneResult object at ...>
    >>> mydoc = collection.find_one()
    >>> pprint.pprint(mydoc)
    {u'_id': ObjectId('...'), u'num': Decimal('45.321')}

Note that the transform_bson method of the base codec class results in these values being decoded as Decimal (and not DecimalInt).

Decoding Binary Types

How the bson module decodes Binary data with subtype = 0 varies slightly depending on the version of the Python runtime in use. This must be taken into account when writing a TypeDecoder that modifies how this data type is decoded.

On Python 3.x, Binary data (subtype = 0) is decoded as a bytes instance:

    >>> # On Python 3.x.
    >>> from bson.binary import Binary
    >>> newcoll = db.get_collection('new')
    >>> newcoll.insert_one({'_id': 1, 'data': Binary(b"123", subtype=0)})
    <pymongo.results.InsertOneResult object at ...>
    >>> doc = newcoll.find_one()
    >>> type(doc['data'])
    <class 'bytes'>

On Python 2.7.x, the same data is decoded as a Binary instance:

    >>> # On Python 2.7.x
    >>> newcoll = db.get_collection('new')
    >>> doc = newcoll.find_one()
    >>> type(doc['data'])
    <class 'bson.binary.Binary'>

As a consequence of this disparity, users must set the bson_type attribute on their TypeDecoder classes differently depending on the Python version in use: bytes on Python 3 and Binary on Python 2.

Note

For codebases requiring compatibility with both Python 2 and 3, type decoders will have to be registered for both possible bson_type values.

The fallback_encoder Callable

New in version 3.8.

In addition to type codecs, users can also register a callable to encode types that BSON doesn't recognize and for which no type codec has been registered. This callable is the fallback encoder and, like the transform_python method, it accepts an unencodable value as a parameter and returns a BSON-encodable value. The following fallback encoder encodes Python's Decimal type to a Decimal128:

    >>> def fallback_encoder(value):
    ...     if isinstance(value, Decimal):
    ...         return Decimal128(value)
    ...     return value
    ...

After declaring the fallback encoder, we must create a type registry and codec options with it before it can be used for initializing a collection:

    >>> type_registry = TypeRegistry(fallback_encoder=fallback_encoder)
    >>> codec_options = CodecOptions(type_registry=type_registry)
    >>> collection = db.get_collection('test', codec_options=codec_options)
    >>> collection.drop()

We can now seamlessly encode instances of Decimal:

    >>> collection.insert_one({'num': Decimal("45.321")})
    <pymongo.results.InsertOneResult object at ...>
    >>> mydoc = collection.find_one()
    >>> pprint.pprint(mydoc)
    {u'_id': ObjectId('...'), u'num': Decimal128('45.321')}

Note

Fallback encoders are invoked after attempts to encode the given value with standard BSON encoders and any configured type encoders have failed. Therefore, in a type registry configured with a type encoder and fallback encoder that both target the same custom type, the behavior specified in the type encoder will prevail.

Because fallback encoders don't need to declare the types that they encode beforehand, they can be used to support interesting use-cases that cannot be serviced by TypeEncoder. One such use-case is described in the next section.

Encoding Unknown Types

In this example, we demonstrate how a fallback encoder can be used to save arbitrary objects to the database. We will use the standard library's pickle module to serialize the unknown types, so naturally this approach only works for types that are picklable.

We start by defining some arbitrary custom types:

    class MyStringType(object):
        def __init__(self, value):
            self.__value = value

        def __repr__(self):
            return "MyStringType('%s')" % (self.__value,)

    class MyNumberType(object):
        def __init__(self, value):
            self.__value = value

        def __repr__(self):
            return "MyNumberType(%s)" % (self.__value,)

We also define a fallback encoder that pickles whatever objects it receives and returns them as Binary instances with a custom subtype. The custom subtype, in turn, allows us to write a TypeDecoder that identifies pickled artifacts upon retrieval and transparently decodes them back into Python objects:

    import pickle
    from bson.binary import Binary, USER_DEFINED_SUBTYPE
    from bson.codec_options import TypeDecoder

    def fallback_pickle_encoder(value):
        # Pickle any value BSON cannot encode and tag it with a custom subtype.
        return Binary(pickle.dumps(value), USER_DEFINED_SUBTYPE)

    class PickledBinaryDecoder(TypeDecoder):
        bson_type = Binary

        def transform_bson(self, value):
            # Only unpickle values carrying our subtype; pass others through.
            if value.subtype == USER_DEFINED_SUBTYPE:
                return pickle.loads(value)
            return value

Note

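The above example is written assuming the use of Python 3. There, only subtype 0 data is decoded as bytes; data with a user-defined subtype is still decoded as a Binary instance, so bson_type = Binary is correct. On Python 2 all binary data is decoded as Binary, so bson_type must be set to Binary there as well. See the Decoding Binary Types section for a detailed explanation.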

Finally, we create a CodecOptions instance:

    from bson.codec_options import CodecOptions, TypeRegistry
    codec_options = CodecOptions(type_registry=TypeRegistry(
        [PickledBinaryDecoder()], fallback_encoder=fallback_pickle_encoder))

We can now round trip our custom objects to MongoDB:

    collection = db.get_collection('test_fe', codec_options=codec_options)
    collection.insert_one({'_id': 1, 'str': MyStringType("hello world"),
                           'num': MyNumberType(2)})
    mydoc = collection.find_one()
    assert isinstance(mydoc['str'], MyStringType)
    assert isinstance(mydoc['num'], MyNumberType)

Limitations

PyMongo's type codec and fallback encoder features have the following limitations:

  • Users cannot customize the encoding behavior of Python types that PyMongo already understands, like int and str (the 'built-in types'). Attempting to instantiate a type registry with one or more codecs that act upon a built-in type results in a TypeError. This limitation extends to all subtypes of the standard types.
  • Chaining type encoders is not supported. A custom type value, once transformed by a codec's transform_python method, must result in a type that is either BSON-encodable by default or can be transformed by the fallback encoder into something BSON-encodable; it cannot be transformed a second time by a different type codec.
  • The command() method does not apply the user's TypeDecoders while decoding the command response document.
  • gridfs does not apply custom type encoding or decoding to any documents received from or returned to the user.