Using Change Streams with Amazon DocumentDB

The change streams feature in Amazon DocumentDB (with MongoDB compatibility) provides a time-ordered sequence of change events that occur within your cluster’s collections. You can read events from a change stream to implement many different use cases, including the following:

  • Change notification

  • Full-text search with Amazon Elasticsearch Service (Amazon ES)

  • Analytics with Amazon Redshift

Applications can use change streams to subscribe to data changes on individual collections. Change stream events are ordered as they occur on the cluster and are stored for 3 hours (by default) after the event has been recorded. The retention period can be extended up to 7 days using the change_stream_log_retention_duration parameter. To modify the change stream retention period, see Modifying the Change Stream Log Retention Duration.

Supported Operations

Amazon DocumentDB supports the following operations for change streams:

  • All change events supported in the MongoDB db.collection.watch(), db.watch(), and client.watch() APIs.

  • Full document lookup for updates.

  • Aggregation stages: $match, $project, $redact, $addFields, and $replaceRoot.

  • Resuming a change stream from a resume token.

  • Resuming a change stream from a timestamp using startAtOperationTime (applicable to Amazon DocumentDB v4.0+).
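To check a pipeline against the stage list above before passing it to watch(), a small client-side check can help. The following is a minimal sketch: the helper name unsupported_stages is hypothetical, and it only compares stage names with the list above without validating what each stage contains. A pipeline that passes the check could then be supplied as coll.watch(pipeline).

```python
# Stage names listed above as supported in change stream pipelines
SUPPORTED_STAGES = {"$match", "$project", "$redact", "$addFields", "$replaceRoot"}


def unsupported_stages(pipeline):
    """Return the stage names in `pipeline` that are not in SUPPORTED_STAGES (hypothetical helper)."""
    found = []
    for stage in pipeline:
        # Each aggregation stage is a document keyed by its stage name, e.g. {"$match": {...}}
        for name in stage:
            if name not in SUPPORTED_STAGES:
                found.append(name)
    return found


# Keep only insert and update events, trimmed to a few fields
pipeline = [
    {"$match": {"operationType": {"$in": ["insert", "update"]}}},
    {"$project": {"operationType": 1, "ns": 1, "documentKey": 1}},
]
print(unsupported_stages(pipeline))                      # []
print(unsupported_stages([{"$group": {"_id": "$ns"}}]))  # ['$group']
```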

Billing

The Amazon DocumentDB change streams feature is disabled by default and does not incur any additional charges until the feature is enabled. Using change streams in a cluster incurs additional read and write IOs and storage costs. You can use the modifyChangeStreams API operation to enable this feature for your cluster. For more information on pricing, see Amazon DocumentDB pricing.

Limitations

Change streams have the following limitations in Amazon DocumentDB:

  • Change streams can only be opened from a connection to the primary instance of an Amazon DocumentDB cluster. Reading from change streams on a replica instance is not currently supported. When invoking the watch() API operation, you must specify a primary read preference to ensure that all reads are directed to the primary instance (see the Example section).

  • Events written to a change stream for a collection are available for up to 7 days (the default is 3 hours). Change stream data is deleted after the log retention duration window, even if no new changes have occurred.

  • A long-running write operation on a collection, such as updateMany or deleteMany, can temporarily stall the writing of change stream events until the operation is complete.

  • Amazon DocumentDB does not support the MongoDB operations log (oplog).

  • With Amazon DocumentDB, you must explicitly enable change streams on a given collection.

  • If the total size of a change stream event (including the change data and the full document, if requested) is greater than 16 MB, reads from the change stream fail.

  • The Ruby driver is currently not supported when using db.watch() and client.watch() with Amazon DocumentDB v3.6.
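Because change streams must be read from the primary, one option is to set the read preference once in the connection string rather than on each collection. The sketch below builds such a URI with the standard library. The helper name docdb_uri is hypothetical, and port 27017, the replica set name rs0, and retryWrites=false are assumptions you should confirm for your cluster.

```python
from urllib.parse import quote_plus, urlencode


def docdb_uri(username, password, endpoint, ca_file, port=27017):
    """Build a connection string that directs reads to the primary (hypothetical helper)."""
    options = {
        "tls": "true",
        "tlsCAFile": ca_file,
        "replicaSet": "rs0",          # assumption: the cluster's replica set name
        "readPreference": "primary",  # change streams can only be read from the primary
        "retryWrites": "false",       # assumption: retryable writes are not used
    }
    # Credentials may contain characters such as '@' or ':' that must be escaped
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{endpoint}:{port}/?{urlencode(options)}"


print(docdb_uri("DocumentDBusername", "<Insert your password>",
                "DocumentDBClusterEndpoint", "rds-combined-ca-bundle.pem"))
```

The resulting string can be passed directly to MongoClient in place of the separate keyword arguments used in the examples that follow.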

Enabling Change Streams

You can enable Amazon DocumentDB change streams for all collections within a given database, or only for selected collections. The following are examples of how to enable change streams for different use cases using the mongo shell. Empty strings are treated as wildcards when specifying database and collection names.

    //Enable change streams for the collection "foo" in database "bar"
    db.adminCommand({modifyChangeStreams: 1,
        database: "bar",
        collection: "foo",
        enable: true});

    //Disable change streams on collection "foo" in database "bar"
    db.adminCommand({modifyChangeStreams: 1,
        database: "bar",
        collection: "foo",
        enable: false});

    //Enable change streams for all collections in database "bar"
    db.adminCommand({modifyChangeStreams: 1,
        database: "bar",
        collection: "",
        enable: true});

    //Enable change streams for all collections in all databases in a cluster
    db.adminCommand({modifyChangeStreams: 1,
        database: "",
        collection: "",
        enable: true});
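If you manage change streams from Python rather than the mongo shell, the same command can be expressed as a dictionary. This is a sketch: modify_change_streams is a hypothetical helper, and the PyMongo call in the comment assumes a MongoClient connected to your cluster with sufficient privileges.

```python
def modify_change_streams(database="", collection="", enable=True):
    """Build a modifyChangeStreams admin command document (hypothetical helper).

    Empty strings act as wildcards, as in the shell examples above.
    """
    # The command name must be the first key; Python dicts keep insertion order (3.7+).
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": enable,
    }


cmd = modify_change_streams("bar", "foo")
print(cmd)  # {'modifyChangeStreams': 1, 'database': 'bar', 'collection': 'foo', 'enable': True}
# With PyMongo you would send it to the admin database, for example:
# client.admin.command(cmd)
```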

Change streams will be enabled for a collection if any of the following are true:

  • Both the database and collection are explicitly enabled.

  • The database containing the collection is enabled.

  • All databases are enabled.

Dropping a collection from a database does not disable change streams for that collection if the parent database also has change streams enabled, or if all databases in the cluster are enabled. If a new collection is created with the same name as the deleted collection, change streams will be enabled for that collection.

You can list all of your cluster’s enabled change streams by using the $listChangeStreams aggregation pipeline stage. All aggregation stages supported by Amazon DocumentDB can be used in the pipeline for additional processing. If a previously enabled collection has been disabled, it will not appear in the $listChangeStreams output.

    //List all databases and collections with change streams enabled
    cursor = new DBCommandCursor(db,
        db.runCommand(
            {aggregate: 1,
             pipeline: [{$listChangeStreams: 1}],
             cursor:{}}));

    //List of all databases and collections with change streams enabled
    { "database" : "test", "collection" : "foo" }
    { "database" : "bar", "collection" : "" }
    { "database" : "", "collection" : "" }

    //Determine if the database "bar" or collection "bar.foo" have change streams enabled
    cursor = new DBCommandCursor(db,
        db.runCommand(
            {aggregate: 1,
             pipeline: [{$listChangeStreams: 1},
                        {$match: {$or: [{database: "bar", collection: "foo"},
                                        {database: "bar", collection: ""},
                                        {database: "", collection: ""}]}}
                       ],
             cursor:{}}));
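The rules that the $match stage above encodes can also be applied client-side to the documents returned by $listChangeStreams, for example to check many collections at once. A minimal sketch with a hypothetical helper, change_streams_enabled, using plain dictionaries shaped like the sample output:

```python
def change_streams_enabled(entries, database, collection):
    """Apply the enablement rules to $listChangeStreams output (hypothetical helper).

    `entries` is a list of {"database": ..., "collection": ...} documents,
    where an empty string is a wildcard.
    """
    for entry in entries:
        db, coll = entry["database"], entry["collection"]
        if db == "" and coll == "":
            return True  # all databases are enabled
        if db == database and coll == "":
            return True  # the collection's whole database is enabled
        if db == database and coll == collection:
            return True  # the collection itself is enabled
    return False


listed = [
    {"database": "test", "collection": "foo"},
    {"database": "bar", "collection": ""},
]
print(change_streams_enabled(listed, "bar", "baz"))   # True (database "bar" is enabled)
print(change_streams_enabled(listed, "test", "qux"))  # False
```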

Example: Using Change Streams with Python

The following is an example of using an Amazon DocumentDB change stream with Python at the collection level.

    from pymongo import MongoClient, ReadPreference

    username = "DocumentDBusername"
    password = "<Insert your password>"
    clusterendpoint = "DocumentDBClusterEndpoint"
    client = MongoClient(clusterendpoint, username=username, password=password,
                         tls=True, tlsCAFile='rds-combined-ca-bundle.pem', retryWrites=False)
    db = client['bar']
    # While 'Primary' is the default read preference, here we give an example of
    # how to specify the required read preference when reading the change streams
    coll = db.get_collection('foo', read_preference=ReadPreference.PRIMARY)
    # Create a stream object
    stream = coll.watch()
    # Write a new document to the collection to generate a change event
    coll.insert_one({'x': 1})
    # Read the next change event from the stream (if any)
    print(stream.try_next())
    """
    Expected Output:
    {'_id': {'_data': '015daf94f600000002010000000200009025'},
    'clusterTime': Timestamp(1571788022, 2),
    'documentKey': {'_id': ObjectId('5daf94f6ea258751778163d6')},
    'fullDocument': {'_id': ObjectId('5daf94f6ea258751778163d6'), 'x': 1},
    'ns': {'coll': 'foo', 'db': 'bar'},
    'operationType': 'insert'}
    """
    # A subsequent attempt to read the next change event returns nothing, as there are no new changes
    print(stream.try_next())
    """
    Expected Output:
    None
    """
    # Generate a new change event by updating a document
    result = coll.update_one({'x': 1}, {'$set': {'x': 2}})
    print(stream.try_next())
    """
    Expected Output:
    {'_id': {'_data': '015daf99d400000001010000000100009025'},
    'clusterTime': Timestamp(1571789268, 1),
    'documentKey': {'_id': ObjectId('5daf9502ea258751778163d7')},
    'ns': {'coll': 'foo', 'db': 'bar'},
    'operationType': 'update',
    'updateDescription': {'removedFields': [], 'updatedFields': {'x': 2}}}
    """
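In a real consumer you usually route each event by its operationType instead of printing it. The sketch below is illustrative only: handle_event and the handler functions are hypothetical, and plain dictionaries stand in for the events PyMongo returns. In practice you would call handle_event on each non-None result of stream.try_next() in a loop while stream.alive is true.

```python
def handle_event(event, handlers):
    """Route a change event to the handler for its operationType (hypothetical helper)."""
    handler = handlers.get(event["operationType"])
    if handler is None:
        return None  # ignore event types that have no handler
    return handler(event)


handlers = {
    "insert": lambda e: f"inserted into {e['ns']['db']}.{e['ns']['coll']}",
    "update": lambda e: f"updated fields {sorted(e['updateDescription']['updatedFields'])}",
}

# Simplified dicts standing in for events returned by stream.try_next()
insert_event = {"operationType": "insert", "ns": {"db": "bar", "coll": "foo"}}
update_event = {
    "operationType": "update",
    "ns": {"db": "bar", "coll": "foo"},
    "updateDescription": {"removedFields": [], "updatedFields": {"x": 2}},
}
print(handle_event(insert_event, handlers))  # inserted into bar.foo
print(handle_event(update_event, handlers))  # updated fields ['x']
```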

The following is an example of using an Amazon DocumentDB change stream with Python at the database level.

    from pymongo import MongoClient

    username = "DocumentDBusername"
    password = "<Insert your password>"
    clusterendpoint = "DocumentDBClusterEndpoint"
    client = MongoClient(clusterendpoint, username=username, password=password,
                         tls=True, tlsCAFile='rds-combined-ca-bundle.pem', retryWrites=False)
    db = client['bar']
    # Create a stream object
    stream = db.watch()
    coll = db.get_collection('foo')
    # Write a new document to the collection foo to generate a change event
    coll.insert_one({'x': 1})
    # Read the next change event from the stream (if any)
    print(stream.try_next())
    """
    Expected Output:
    {'_id': {'_data': '015daf94f600000002010000000200009025'},
    'clusterTime': Timestamp(1571788022, 2),
    'documentKey': {'_id': ObjectId('5daf94f6ea258751778163d6')},
    'fullDocument': {'_id': ObjectId('5daf94f6ea258751778163d6'), 'x': 1},
    'ns': {'coll': 'foo', 'db': 'bar'},
    'operationType': 'insert'}
    """
    # A subsequent attempt to read the next change event returns nothing, as there are no new changes
    print(stream.try_next())
    """
    Expected Output:
    None
    """
    coll = db.get_collection('foo1')
    # Write a new document to another collection to generate a change event
    coll.insert_one({'x': 1})
    print(stream.try_next())
    """
    Expected Output: Because the change stream was opened at the database level, it returns change events from every collection in the database
    {'_id': {'_data': '015daf94f600000002010000000200009025'},
    'clusterTime': Timestamp(1571788022, 2),
    'documentKey': {'_id': ObjectId('5daf94f6ea258751778163d6')},
    'fullDocument': {'_id': ObjectId('5daf94f6ea258751778163d6'), 'x': 1},
    'ns': {'coll': 'foo1', 'db': 'bar'},
    'operationType': 'insert'}
    """
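Because a database-level stream mixes events from several collections, the ns field is what tells them apart. As a sketch (count_by_namespace is a hypothetical helper, and the events are simplified dictionaries), the following tallies events per namespace:

```python
from collections import Counter


def count_by_namespace(events):
    """Count change events per "db.collection" namespace (hypothetical helper)."""
    return Counter(f"{e['ns']['db']}.{e['ns']['coll']}" for e in events)


events = [
    {"operationType": "insert", "ns": {"db": "bar", "coll": "foo"}},
    {"operationType": "insert", "ns": {"db": "bar", "coll": "foo1"}},
    {"operationType": "update", "ns": {"db": "bar", "coll": "foo"}},
]
print(count_by_namespace(events))  # Counter({'bar.foo': 2, 'bar.foo1': 1})
```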

Full Document Lookup

The update change event does not include the full document; it includes only the change that was made. If your use case requires the complete document affected by an update, you can enable full document lookup when opening the stream.

The fullDocument document for an update change streams event represents the most current version of the updated document at the time of document lookup. If changes occurred between the update operation and the fullDocument lookup, the fullDocument document might not represent the document state at update time.

    # Create a stream object with update lookup enabled
    stream = coll.watch(full_document='updateLookup')
    # Generate a new change event by updating a document
    result = coll.update_one({'x': 2}, {'$set': {'x': 3}})
    print(stream.try_next())
    """
    Expected Output:
    {'_id': {'_data': '015daf9b7c00000001010000000100009025'},
    'clusterTime': Timestamp(1571789692, 1),
    'documentKey': {'_id': ObjectId('5daf9502ea258751778163d7')},
    'fullDocument': {'_id': ObjectId('5daf9502ea258751778163d7'), 'x': 3},
    'ns': {'coll': 'foo', 'db': 'bar'},
    'operationType': 'update',
    'updateDescription': {'removedFields': [], 'updatedFields': {'x': 3}}}
    """
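If a consumer already holds a copy of each document, an alternative to updateLookup is to apply the updateDescription of each update event to that copy. Unlike updateLookup, this reflects the document state at the time of the update, provided the copy has seen every earlier event in order. The helper below is a hypothetical sketch: it supports dotted field paths into nested documents but not array positions, so treat it as an illustration rather than a complete implementation.

```python
import copy


def apply_update(document, update_description):
    """Apply an update event's updateDescription to a cached document (hypothetical helper).

    Dotted paths such as "meta.tag" address nested documents; array indexes
    are not supported. The input document is not modified.
    """
    result = copy.deepcopy(document)
    for path, value in update_description.get("updatedFields", {}).items():
        *parents, leaf = path.split(".")
        target = result
        for key in parents:
            target = target.setdefault(key, {})  # create missing sub-documents
        target[leaf] = value
    for path in update_description.get("removedFields", []):
        *parents, leaf = path.split(".")
        target = result
        for key in parents:
            target = target.get(key, {})  # a missing parent means nothing to remove
        target.pop(leaf, None)
    return result


cached = {"_id": 1, "x": 2, "meta": {"tag": "a", "old": True}}
desc = {"updatedFields": {"x": 3, "meta.tag": "b"}, "removedFields": ["meta.old"]}
print(apply_update(cached, desc))  # {'_id': 1, 'x': 3, 'meta': {'tag': 'b'}}
```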

Resuming a Change Stream

You can resume a change stream later by using a resume token, which is equal to the _id field of the last retrieved change event document.

    from pymongo import MongoClient

    username = "DocumentDBusername"
    password = "<Insert your password>"
    clusterendpoint = "DocumentDBClusterEndpoint"
    client = MongoClient(clusterendpoint, username=username, password=password,
                         tls=True, tlsCAFile='rds-combined-ca-bundle.pem', retryWrites=False)
    db = client['bar']
    coll = db.get_collection('foo')
    # Create a stream object
    stream = db.watch()
    coll.update_one({'x': 1}, {'$set': {'x': 4}})
    event = stream.try_next()
    token = event['_id']
    print(token)
    """
    Output: This is the resume token that we will later use to resume the change stream
    {'_data': '015daf9c5b00000001010000000100009025'}
    """
    # PyMongo also exposes the stream's most recent resume token directly
    print(stream.resume_token)
    """
    Output:
    {'_data': '015daf9c5b00000001010000000100009025'}
    """
    # Generate a new change event by updating a document
    result = coll.update_one({'x': 4}, {'$set': {'x': 5}})
    # Generate another change event by inserting a document
    result = coll.insert_one({'y': 5})
    # Open a stream starting after the selected resume token
    stream = db.watch(full_document='updateLookup', resume_after=token)
    # The first event returned is the update that followed the resume token
    print(stream.try_next())
    """
    Output: Because the stream resumes after the token, it returns only events that occurred after the first update ({x: 4}). Here, the first such event is the update that set x to 5
    {'_id': {'_data': '015f7e8f0c000000060100000006000fe038'},
    'operationType': 'update',
    'clusterTime': Timestamp(1602129676, 6),
    'ns': {'db': 'bar', 'coll': 'foo'},
    'documentKey': {'_id': ObjectId('5f7e8f0ac423bafbfd9adba2')},
    'fullDocument': {'_id': ObjectId('5f7e8f0ac423bafbfd9adba2'), 'x': 5},
    'updateDescription': {'updatedFields': {'x': 5}, 'removedFields': []}}
    """
    # Followed by the insert
    print(stream.try_next())
    """
    Output:
    {'_id': {'_data': '015f7e8f0c000000070100000007000fe038'},
    'operationType': 'insert',
    'clusterTime': Timestamp(1602129676, 7),
    'ns': {'db': 'bar', 'coll': 'foo'},
    'documentKey': {'_id': ObjectId('5f7e8f0cbf8c233ed577eb94')},
    'fullDocument': {'_id': ObjectId('5f7e8f0cbf8c233ed577eb94'), 'y': 5}}
    """
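To resume after a process restart, the resume token has to be stored somewhere durable. The sketch below checkpoints it to a local JSON file; save_token, load_token, and the file-based storage are assumptions for illustration, and the approach relies on the token being JSON-serializable, as the {'_data': '...'} tokens shown above are. load_token returns None when no checkpoint exists yet, which watch() treats the same as omitting resume_after.

```python
import json
import os
import tempfile


def save_token(path, token):
    """Write a resume token to `path` as JSON (hypothetical helper).

    The token goes to a temporary file first and is then moved into place,
    so a crash mid-write never leaves a truncated checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(token, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_token(path):
    """Return the saved resume token, or None if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

A consumer could then open the stream with db.watch(resume_after=load_token('token.json')) and call save_token('token.json', stream.resume_token) after it finishes processing each event, so a restart re-delivers at most the last event that was processed but not yet checkpointed.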

Resuming a Change Stream with startAtOperationTime

You can resume a change stream later from a particular timestamp by using startAtOperationTime.

Note

The ability to use startAtOperationTime is available in Amazon DocumentDB 4.0+. When using startAtOperationTime, the change stream cursor returns only changes that occurred at or after the specified timestamp. The startAtOperationTime and resumeAfter options are mutually exclusive and cannot be used together.

    from pymongo import MongoClient

    username = "DocumentDBusername"
    password = "<Insert your password>"
    clusterendpoint = "DocumentDBClusterEndpoint"
    client = MongoClient(clusterendpoint, username=username, password=password,
                         tls=True, tlsCAFile='rds-root-ca-2020.pem', retryWrites=False)
    db = client['bar']
    coll = db.get_collection('foo')
    # Create a stream object
    stream = db.watch()
    coll.update_one({'x': 1}, {'$set': {'x': 4}})
    event = stream.try_next()
    timestamp = event['clusterTime']
    print(timestamp)
    """
    Output:
    Timestamp(1602129114, 4)
    """
    # Generate a new change event by updating a document
    result = coll.update_one({'x': 4}, {'$set': {'x': 5}})
    # Generate another change event by inserting a document
    result = coll.insert_one({'y': 5})
    # Open a stream starting at the specified timestamp
    stream = db.watch(start_at_operation_time=timestamp)
    print(stream.try_next())
    """
    Output: Because the change stream starts at the timestamp of the first update operation (x:4), the first event returned is that update
    {'_id': {'_data': '015f7e941a000000030100000003000fe038'},
    'operationType': 'update',
    'clusterTime': Timestamp(1602130970, 3),
    'ns': {'db': 'bar', 'coll': 'foo'},
    'documentKey': {'_id': ObjectId('5f7e9417c423bafbfd9adbb1')},
    'updateDescription': {'updatedFields': {'x': 4}, 'removedFields': []}}
    """
    print(stream.try_next())
    """
    Output: The second event is the subsequent update operation (x:5)
    {'_id': {'_data': '015f7e9502000000050100000005000fe038'},
    'operationType': 'update',
    'clusterTime': Timestamp(1602131202, 5),
    'ns': {'db': 'bar', 'coll': 'foo'},
    'documentKey': {'_id': ObjectId('5f7e94ffc423bafbfd9adbb2')},
    'updateDescription': {'updatedFields': {'x': 5}, 'removedFields': []}}
    """
    print(stream.try_next())
    """
    Output: The last event is the insert operation (y:5)
    {'_id': {'_data': '015f7e9502000000060100000006000fe038'},
    'operationType': 'insert',
    'clusterTime': Timestamp(1602131202, 6),
    'ns': {'db': 'bar', 'coll': 'foo'},
    'documentKey': {'_id': ObjectId('5f7e95025c4a569e0f6dde92')},
    'fullDocument': {'_id': ObjectId('5f7e95025c4a569e0f6dde92'), 'y': 5}}
    """
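Because events are deleted after the retention window, a startAtOperationTime value older than that window cannot return them. The following sketch converts a wall-clock start time into the seconds value a timestamp is built from and rejects times outside the window. start_seconds is hypothetical, and the PyMongo usage in the comment assumes bson.timestamp.Timestamp with an increment of 0.

```python
from datetime import datetime, timedelta, timezone


def start_seconds(start, retention_seconds, now=None):
    """Return Unix seconds for a startAtOperationTime value (hypothetical helper).

    `start` must be timezone-aware. Raises ValueError if `start` is older than
    the retention window, because those events have already been deleted.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if start < now - timedelta(seconds=retention_seconds):
        raise ValueError("start time is older than the change stream retention window")
    return int(start.timestamp())


# Example: start one hour before `now` with the default 3-hour retention
now = datetime(2020, 10, 8, 4, 0, tzinfo=timezone.utc)
secs = start_seconds(now - timedelta(hours=1), retention_seconds=3 * 3600, now=now)
print(secs)  # 1602126000
# With PyMongo (assumption):
# from bson.timestamp import Timestamp
# stream = db.watch(start_at_operation_time=Timestamp(secs, 0))
```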

Transactions in change streams

Change streams do not contain events from uncommitted or aborted transactions. For example, suppose you start a transaction that contains one INSERT operation and one UPDATE operation. If the INSERT operation succeeds but the UPDATE operation fails, the transaction is rolled back. Because the transaction was rolled back, your change stream does not contain any events for it.

Modifying the Change Stream Log Retention Duration

You can modify the change stream log retention duration to be between 1 hour and 7 days using the AWS Management Console or the AWS CLI.
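The parameter value is expressed in seconds, as the parameter description in the CLI output later in this section shows. A minimal sketch for converting a duration into that value and checking it against the 1 hour to 7 day range stated above; retention_parameter_value is a hypothetical helper:

```python
from datetime import timedelta

MIN_RETENTION = timedelta(hours=1)
MAX_RETENTION = timedelta(days=7)


def retention_parameter_value(duration):
    """Convert a timedelta to the seconds value for change_stream_log_retention_duration."""
    if not MIN_RETENTION <= duration <= MAX_RETENTION:
        raise ValueError("retention must be between 1 hour and 7 days")
    # Parameter values are passed as strings in the console and the CLI
    return str(int(duration.total_seconds()))


print(retention_parameter_value(timedelta(hours=3)))  # 10800 (the default of 3 hours)
print(retention_parameter_value(timedelta(days=7)))   # 604800
```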

To modify the change stream log retention duration

  1. Sign in to the AWS Management Console, and open the Amazon DocumentDB console at https://console.aws.amazon.com/docdb.

  2. In the navigation pane, choose Parameter groups.

    Tip

    If you don’t see the navigation pane on the left side of your screen, choose the menu icon in the upper-left corner of the page.

  3. In the Parameter groups pane, choose the cluster parameter group that is associated with your cluster. To identify the cluster parameter group that is associated with your cluster, see Determining an Amazon DocumentDB Cluster’s Parameter Group.

  4. The resulting page shows the parameters and their corresponding details for your cluster parameter group. Select the parameter change_stream_log_retention_duration.

  5. On the top right of the page, choose Edit to change the value of the parameter. The change_stream_log_retention_duration parameter can be set to a value between 1 hour and 7 days, specified in seconds.

  6. Make your change, and then choose Modify cluster parameter to save the changes. To discard your changes, choose Cancel.

To modify your cluster parameter group’s change_stream_log_retention_duration parameter, use the modify-db-cluster-parameter-group operation with the following parameters:

  • --db-cluster-parameter-group-name — Required. The name of the cluster parameter group that you are modifying. To identify the cluster parameter group that is associated with your cluster, see Determining an Amazon DocumentDB Cluster’s Parameter Group.

  • --parameters — Required. The parameter that you are modifying. Each parameter entry must include the following:

    • ParameterName — The name of the parameter that you are modifying. In this case, it is change_stream_log_retention_duration

    • ParameterValue — The new value for this parameter.

    • ApplyMethod — How you want changes to this parameter applied. Permitted values are immediate and pending-reboot.

      Note

      Parameters with the ApplyType of static must have an ApplyMethod of pending-reboot.
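If you script this change, building the argument list in code avoids the shell-quoting differences between Linux and Windows. The sketch below assembles the arguments for the modify command shown in the next step; modify_retention_args is a hypothetical helper, and actually running the command requires the AWS CLI to be installed and configured.

```python
import shlex


def modify_retention_args(group_name, seconds, apply_method="immediate"):
    """Build the AWS CLI argument list for changing the retention duration (hypothetical helper)."""
    if apply_method not in ("immediate", "pending-reboot"):
        raise ValueError("ApplyMethod must be immediate or pending-reboot")
    parameters = (
        "ParameterName=change_stream_log_retention_duration,"
        f"ParameterValue={seconds},ApplyMethod={apply_method}"
    )
    return [
        "aws", "docdb", "modify-db-cluster-parameter-group",
        "--db-cluster-parameter-group-name", group_name,
        "--parameters", parameters,
    ]


args = modify_retention_args("sample-parameter-group", 86400)
print(shlex.join(args))
# To run it: subprocess.run(args, check=True)
```

Passing a list to subprocess.run bypasses the shell entirely, so no line-continuation characters or extra quoting are needed.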

  1. To change the value of the parameter change_stream_log_retention_duration, run the following command, replacing <parameter-value> with the new retention duration in seconds.

    For Linux, macOS, or Unix:

      aws docdb modify-db-cluster-parameter-group \
          --db-cluster-parameter-group-name sample-parameter-group \
          --parameters "ParameterName=change_stream_log_retention_duration,ParameterValue=<parameter-value>,ApplyMethod=immediate"

    For Windows:

      aws docdb modify-db-cluster-parameter-group ^
          --db-cluster-parameter-group-name sample-parameter-group ^
          --parameters "ParameterName=change_stream_log_retention_duration,ParameterValue=<parameter-value>,ApplyMethod=immediate"

    Output from this operation looks something like the following (JSON format).

      {
          "DBClusterParameterGroupName": "sample-parameter-group"
      }
  2. Wait at least 5 minutes.

  3. List the parameter values of sample-parameter-group to ensure that your changes have been made.

    For Linux, macOS, or Unix:

      aws docdb describe-db-cluster-parameters \
          --db-cluster-parameter-group-name sample-parameter-group

    For Windows:

      aws docdb describe-db-cluster-parameters ^
          --db-cluster-parameter-group-name sample-parameter-group

    Output from this operation looks something like the following (JSON format).

      {
          "Parameters": [
              {
                  "ParameterName": "audit_logs",
                  "ParameterValue": "disabled",
                  "Description": "Enables auditing on cluster.",
                  "Source": "system",
                  "ApplyType": "dynamic",
                  "DataType": "string",
                  "AllowedValues": "enabled,disabled",
                  "IsModifiable": true,
                  "ApplyMethod": "pending-reboot"
              },
              {
                  "ParameterName": "change_stream_log_retention_duration",
                  "ParameterValue": "12345",
                  "Description": "Duration of time in seconds that the change stream log is retained and can be consumed.",
                  "Source": "user",
                  "ApplyType": "dynamic",
                  "DataType": "integer",
                  "AllowedValues": "3600-86400",
                  "IsModifiable": true,
                  "ApplyMethod": "immediate"
              }
          ]
      }
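To check the value programmatically instead of reading the full listing, you can parse the JSON output. A sketch with a hypothetical helper, retention_from_describe; it assumes the CLI output format is JSON (for example, by adding the global --output json option):

```python
import json


def retention_from_describe(output_json):
    """Return change_stream_log_retention_duration in seconds from
    describe-db-cluster-parameters JSON output, or None if it is absent (hypothetical helper)."""
    for param in json.loads(output_json).get("Parameters", []):
        if param.get("ParameterName") == "change_stream_log_retention_duration":
            value = param.get("ParameterValue")
            return int(value) if value is not None else None
    return None


sample = """{"Parameters": [
    {"ParameterName": "audit_logs", "ParameterValue": "disabled"},
    {"ParameterName": "change_stream_log_retention_duration", "ParameterValue": "12345"}
]}"""
print(retention_from_describe(sample))  # 12345
```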