Bulk Write Operations

This tutorial explains how to take advantage of PyMongo’s bulk write operation features. Executing write operations in batches reduces the number of network round trips, increasing write throughput.

Bulk Insert

New in version 2.6.

A batch of documents can be inserted by passing a list to the insert_many() method. PyMongo will automatically split the batch into smaller sub-batches based on the maximum message size accepted by MongoDB, supporting very large bulk insert operations.

  >>> import pymongo
  >>> db = pymongo.MongoClient().bulk_example
  >>> db.test.insert_many([{'i': i} for i in range(10000)]).inserted_ids
  [...]
  >>> db.test.count_documents({})
  10000
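
insert_many() also accepts any iterable of documents (a generator works) and an ordered flag. A minimal sketch, using a hypothetical sketch_example collection so the counts in the examples below are unaffected; with ordered=False the server keeps inserting after an individual document fails:

  >>> # Illustrative only: sketch_example is not used elsewhere in this tutorial.
  >>> gen = ({'i': i, 'squared': i * i} for i in range(100))
  >>> result = db.sketch_example.insert_many(gen, ordered=False)
  >>> len(result.inserted_ids)
  100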

Mixed Bulk Write Operations

New in version 2.7.

PyMongo also supports executing mixed bulk write operations. A batch of insert, update, and remove operations can be executed together using the bulk write operations API.
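
The requests passed to bulk_write() are instances of the operation models importable from the pymongo package. A minimal sketch of the six available models; the filters and documents are placeholders, and the list is only built here, not executed:

  >>> from pymongo import (InsertOne, UpdateOne, UpdateMany,
  ...                      ReplaceOne, DeleteOne, DeleteMany)
  >>> ops = [
  ...     InsertOne({'x': 1}),                       # insert one document
  ...     UpdateOne({'x': 1}, {'$inc': {'x': 1}}),   # update at most one matching document
  ...     UpdateMany({}, {'$set': {'seen': True}}),  # update every matching document
  ...     ReplaceOne({'x': 2}, {'y': 3}),            # replace at most one matching document
  ...     DeleteOne({'y': 3}),                       # delete at most one matching document
  ...     DeleteMany({'seen': True})]                # delete every matching document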

Ordered Bulk Write Operations

Ordered bulk write operations are batched and sent to the server in the order provided for serial execution. The return value is an instance of BulkWriteResult describing the type and count of operations performed.

  >>> from pprint import pprint
  >>> from pymongo import InsertOne, DeleteMany, ReplaceOne, UpdateOne
  >>> result = db.test.bulk_write([
  ...     DeleteMany({}),  # Remove all documents from the previous example.
  ...     InsertOne({'_id': 1}),
  ...     InsertOne({'_id': 2}),
  ...     InsertOne({'_id': 3}),
  ...     UpdateOne({'_id': 1}, {'$set': {'foo': 'bar'}}),
  ...     UpdateOne({'_id': 4}, {'$inc': {'j': 1}}, upsert=True),
  ...     ReplaceOne({'j': 1}, {'j': 2})])
  >>> pprint(result.bulk_api_result)
  {'nInserted': 3,
   'nMatched': 2,
   'nModified': 2,
   'nRemoved': 10000,
   'nUpserted': 1,
   'upserted': [{u'_id': 4, u'index': 5}],
   'writeConcernErrors': [],
   'writeErrors': []}

Warning

nModified is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, PyMongo omits this field from the results of a bulk write operation.
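
Besides the raw bulk_api_result dictionary, the BulkWriteResult returned above exposes the same counts as attributes. A minimal sketch, reusing the result from the ordered example:

  >>> result.inserted_count, result.matched_count, result.modified_count
  (3, 2, 2)
  >>> result.deleted_count, result.upserted_count
  (10000, 1)
  >>> result.upserted_ids  # maps the operation's index in the batch to the upserted _id
  {5: 4}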

The first write failure that occurs (e.g. duplicate key error) aborts the remaining operations, and PyMongo raises BulkWriteError. The details attribute of the exception instance provides the execution results up until the failure occurred and details about the failure, including the operation that caused the failure.

  >>> from pymongo import InsertOne, DeleteOne, ReplaceOne
  >>> from pymongo.errors import BulkWriteError
  >>> requests = [
  ...     ReplaceOne({'j': 2}, {'i': 5}),
  ...     InsertOne({'_id': 4}),  # Violates the unique key constraint on _id.
  ...     DeleteOne({'i': 5})]
  >>> try:
  ...     db.test.bulk_write(requests)
  ... except BulkWriteError as bwe:
  ...     pprint(bwe.details)
  ...
  {'nInserted': 0,
   'nMatched': 1,
   'nModified': 1,
   'nRemoved': 0,
   'nUpserted': 0,
   'upserted': [],
   'writeConcernErrors': [],
   'writeErrors': [{u'code': 11000,
                    u'errmsg': u'...E11000...duplicate key error...',
                    u'index': 1,...
                    u'op': {'_id': 4}}]}
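
Because an ordered batch stops at the first error, one possible recovery pattern is to read the failed operation's index from details and retry only the operations after it. A rough sketch (a real application would first check the error code to decide whether skipping is safe):

  >>> def run_remaining(coll, requests):
  ...     # Retry the batch, dropping everything up to and including the
  ...     # operation that failed, until the batch either completes or
  ...     # runs out of operations.
  ...     while requests:
  ...         try:
  ...             return coll.bulk_write(requests)
  ...         except BulkWriteError as exc:
  ...             failed = exc.details['writeErrors'][0]['index']
  ...             requests = requests[failed + 1:]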

Unordered Bulk Write Operations

Unordered bulk write operations are batched and sent to the server in arbitrary order where they may be executed in parallel. Any errors that occur are reported after all operations are attempted.

In the next example the first and third operations fail due to the unique constraint on _id. Since we are doing unordered execution, the second and fourth operations succeed.

  >>> requests = [
  ...     InsertOne({'_id': 1}),
  ...     DeleteOne({'_id': 2}),
  ...     InsertOne({'_id': 3}),
  ...     ReplaceOne({'_id': 4}, {'i': 1})]
  >>> try:
  ...     db.test.bulk_write(requests, ordered=False)
  ... except BulkWriteError as bwe:
  ...     pprint(bwe.details)
  ...
  {'nInserted': 0,
   'nMatched': 1,
   'nModified': 1,
   'nRemoved': 1,
   'nUpserted': 0,
   'upserted': [],
   'writeConcernErrors': [],
   'writeErrors': [{u'code': 11000,
                    u'errmsg': u'...E11000...duplicate key error...',
                    u'index': 0,...
                    u'op': {'_id': 1}},
                   {u'code': 11000,
                    u'errmsg': u'...E11000...duplicate key error...',
                    u'index': 2,...
                    u'op': {'_id': 3}}]}
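
When a batch runs unordered, details['writeErrors'] can contain several entries, one per failed operation. A minimal sketch of separating expected duplicate-key errors (server error code 11000) from everything else:

  >>> DUPLICATE_KEY = 11000
  >>> def split_errors(exc):
  ...     # Partition the write errors from a BulkWriteError: duplicate keys
  ...     # are often benign (e.g. re-running an idempotent load), anything
  ...     # else deserves a closer look.
  ...     duplicates, other = [], []
  ...     for error in exc.details['writeErrors']:
  ...         (duplicates if error['code'] == DUPLICATE_KEY else other).append(error)
  ...     return duplicates, other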

Write Concern

Bulk operations are executed with the write_concern of the collection they are executed against. Write concern errors (e.g. wtimeout) will be reported after all operations are attempted, regardless of execution order.

  >>> from pymongo import WriteConcern
  >>> coll = db.get_collection(
  ...     'test', write_concern=WriteConcern(w=3, wtimeout=1))
  >>> try:
  ...     coll.bulk_write([InsertOne({'a': i}) for i in range(4)])
  ... except BulkWriteError as bwe:
  ...     pprint(bwe.details)
  ...
  {'nInserted': 4,
   'nMatched': 0,
   'nModified': 0,
   'nRemoved': 0,
   'nUpserted': 0,
   'upserted': [],
   'writeConcernErrors': [{u'code': 64...
                           u'errInfo': {u'wtimeout': True},
                           u'errmsg': u'waiting for replication timed out'}],
   'writeErrors': []}
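
Note that all four inserts were applied (nInserted is 4) even though the write concern could not be satisfied in time. A minimal sketch of telling that situation apart from rejected operations when handling the exception:

  >>> def classify(exc):
  ...     # writeErrors lists operations the server rejected outright;
  ...     # writeConcernErrors means the writes were accepted but the
  ...     # requested acknowledgement (e.g. w=3 within wtimeout) was not met.
  ...     details = exc.details
  ...     if details['writeErrors']:
  ...         return 'some operations were rejected'
  ...     if details['writeConcernErrors']:
  ...         return 'operations applied, write concern not satisfied'
  ...     return 'ok'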