Tiered Hardware for Varying SLA or SLO

In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.

Tip

Changed in version 4.0.3: If you define the zones and the zone ranges before sharding an empty or a non-existing collection, the shard collection operation creates chunks for the defined zone ranges, plus any additional chunks needed to cover the entire range of the shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.

See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.

This tutorial uses Zones to route documents based on creation date either to shards zoned for supporting recent documents, or those zoned for supporting archived documents.

The following are some example use cases for segmenting data based on Service Level Agreement (SLA) or Service Level Objective (SLO):

  • An application requires providing low-latency access to recently inserted or updated documents
  • An application requires prioritizing low-latency access to a range or subset of documents
  • An application that benefits from ensuring specific ranges or subsets of data are stored on servers with hardware that suits the SLAs for accessing that data

The following diagram illustrates a sharded cluster that uses hardware-based zones to satisfy data access SLAs or SLOs.

Diagram of sharded cluster architecture for tiered SLA

Scenario

A photo sharing application requires fast access to photos uploaded within the last 6 months. The application stores the location of each photo along with its metadata in the photoshare database under the data collection.

The following documents represent photos uploaded by a single user:

  {
    "_id" : 10003010,
    "creation_date" : ISODate("2012-12-19T06:01:17.171Z"),
    "userid" : 123,
    "photo_location" : "example.net/storage/usr/photo_1.jpg"
  }
  {
    "_id" : 10003011,
    "creation_date" : ISODate("2013-12-19T06:01:17.171Z"),
    "userid" : 123,
    "photo_location" : "example.net/storage/usr/photo_2.jpg"
  }
  {
    "_id" : 10003012,
    "creation_date" : ISODate("2016-01-19T06:01:17.171Z"),
    "userid" : 123,
    "photo_location" : "example.net/storage/usr/photo_3.jpg"
  }

Note that only the document with _id : 10003012 was uploaded within the past year (as of June 2016).

Shard Key

The data collection uses the { creation_date : 1 } index as the shard key.

The creation_date field in each document allows for creating zones on the creation date.

Architecture

The sharded cluster deployment currently consists of three shards.

Diagram of sharded cluster architecture for tiered SLA

Zones

The application requires adding each shard to a zone based on its hardware tier. Each hardware tier represents a specific hardware configuration designed to satisfy a given SLA or SLO.

Diagram of sharded cluster architecture for tiered SLA

  • Fast Tier (“recent”)

    These are the fastest performing machines, with large amounts of RAM, fast SSD disks, and powerful CPUs.

    The zone requires a range with:

      • a lower bound of { creation_date : ISODate(YYYY-mm-dd) }, where the Year, Month, and Date specified by YYYY-mm-dd are within the last 6 months.
      • an upper bound of { creation_date : MaxKey }.

  • Archival Tier (“archive”)

    These machines use less RAM, slower disks, and more basic CPUs. However, they have a greater amount of storage per server.

    The zone requires a range with:

      • a lower bound of { creation_date : MinKey }.
      • an upper bound of { creation_date : ISODate(YYYY-mm-dd) }, where the Year, Month, and Date match the values used for the recent tier’s lower bound.

Note

The MinKey and MaxKey values are reserved special values for comparisons.
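The two tiers share a single boundary: the lower bound of the recent range is also the upper bound of the archive range. The following is a minimal sketch of computing that boundary and both ranges in plain JavaScript. The function names cutoffDate and zoneRanges, the monthsBack parameter, and the choice to snap the boundary to the first day of a month are assumptions made for illustration and are not part of MongoDB. The minKey and maxKey arguments stand in for the shell's MinKey and MaxKey values.

```javascript
// Hypothetical helper: the first day (00:00 UTC) of the month that lies
// `monthsBack` months before `now`. Date.UTC normalizes negative month
// indexes, so a January `now` rolls back into the previous year.
function cutoffDate(now, monthsBack) {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - monthsBack, 1));
}

// Hypothetical helper: build both zone ranges around one shared cutoff.
// Lower bounds are inclusive and upper bounds are exclusive, so the two
// ranges meet at the cutoff without overlapping.
function zoneRanges(cutoff, minKey, maxKey) {
  return {
    archive: { min: { creation_date: minKey }, max: { creation_date: cutoff } },
    recent: { min: { creation_date: cutoff }, max: { creation_date: maxKey } }
  };
}
```

In the mongo shell, a call such as zoneRanges(cutoffDate(new Date(), 6), MinKey, MaxKey) would produce bounds in the shape that the range definitions later in this tutorial expect.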

As performance needs increase, adding shards and associating them with the appropriate zone based on their hardware tier allows the cluster to scale horizontally.

When defining zone ranges based on time spans, weigh the benefits of infrequent updates to the zone ranges against the amount of data that must be migrated on an update. For example, setting a limit of 1 year for data to be considered ‘recent’ likely covers more data than setting a limit of 1 month. While there are more migrations required when rotating on a 1 month scale, the number of documents that must be migrated in each rotation is lower than when rotating on a 1 year scale.
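To make the trade-off concrete, the sketch below counts how many documents change from the recent range to the archive range when the boundary moves forward. The function name countReclassified is hypothetical. The count is only a rough proxy for migration cost, because MongoDB migrates chunks rather than individual documents.

```javascript
// Hypothetical helper: count creation dates that fall in
// [oldCutoff, newCutoff), that is, documents that were "recent" under
// the old boundary and become "archive" under the new one.
function countReclassified(creationDates, oldCutoff, newCutoff) {
  return creationDates.filter(function (d) {
    return d.getTime() >= oldCutoff.getTime() && d.getTime() < newCutoff.getTime();
  }).length;
}

// Creation dates of the three sample photos from the scenario.
var sampleDates = [
  new Date("2012-12-19T06:01:17.171Z"),
  new Date("2013-12-19T06:01:17.171Z"),
  new Date("2016-01-19T06:01:17.171Z")
];

// Moving the boundary from 2016-01-01 to 2016-06-01 reclassifies only
// the photo uploaded on 2016-01-19.
var moved = countReclassified(
  sampleDates,
  new Date("2016-01-01T00:00:00Z"),
  new Date("2016-06-01T00:00:00Z")
);
```

A wider gap between the old and new boundary generally captures more documents per rotation, which is the cost that a shorter rotation interval spreads out.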

Write Operations

With zones, if an inserted or updated document matches a configured zone, it can only be written to a shard inside that zone.

MongoDB can write documents that do not match a configured zone to any shard in the cluster.

Note

The behavior described above requires the cluster to be in a steady state with no chunks violating a configured zone. See the following section on the balancer for more information.
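As an illustration of the write rule, the sketch below models which shards may receive a document in a steady-state cluster. It assumes the zone layout used in the procedure later in this tutorial (shard0000 and shard0001 in recent, shard0002 in archive) and a single boundary date. The names ZONE_SHARDS, zoneFor, and eligibleShards are hypothetical. This is a model of the routing outcome, not MongoDB's actual routing code.

```javascript
// Hypothetical model of the tutorial's layout: zone name -> shards in that zone.
var ZONE_SHARDS = {
  recent: ["shard0000", "shard0001"],
  archive: ["shard0002"]
};

// The recent range has an inclusive lower bound at the cutoff, so a
// document created exactly at the cutoff belongs to "recent".
function zoneFor(creationDate, cutoff) {
  return creationDate.getTime() >= cutoff.getTime() ? "recent" : "archive";
}

// Shards that may hold a document with the given creation date.
function eligibleShards(creationDate, cutoff) {
  return ZONE_SHARDS[zoneFor(creationDate, cutoff)];
}
```

With a cutoff of 2016-01-01, the sample photo created on 2016-01-19 maps to the recent zone, while the 2012 and 2013 photos map to the archive zone.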

Read Operations

MongoDB can route queries to a specific shard if the query includes the shard key.

For example, MongoDB can attempt a targeted read operation on the following query because it includes creation_date in the query document:

  photoDB = db.getSiblingDB("photoshare")
  photoDB.data.find( { "creation_date" : ISODate("2015-01-01") } )

If the requested document falls within the recent zone range, MongoDB would route this query to the shards inside that zone, ensuring a faster read compared to a cluster-wide broadcast read operation.

Balancer

The balancer migrates chunks to the appropriate shard respecting any configured zones. Until the migration, shards may contain chunks that violate configured zones. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned zones.

Adding or removing zones or zone ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a zone or zone range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
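The notion of a chunk that violates a configured zone can be sketched as a small check over plain data. The data shapes below are assumptions made for this example and are not the layout of MongoDB's config collections. Chunk and range bounds are millisecond timestamps, with -Infinity and Infinity standing in for MinKey and MaxKey, and findViolations is a hypothetical name.

```javascript
// Hypothetical check: a chunk violates its zone when it lies entirely
// inside a zone range but sits on a shard that is not in that zone.
// Chunks not fully covered by one range are ignored here; the balancer
// splits such chunks at range boundaries before migrating them.
function findViolations(chunks, zoneRanges, shardZones) {
  return chunks.filter(function (chunk) {
    var range = zoneRanges.find(function (r) {
      return chunk.min >= r.min && chunk.max <= r.max;
    });
    if (!range) {
      return false;
    }
    var zones = shardZones[chunk.shard] || [];
    return zones.indexOf(range.zone) === -1;
  });
}

var boundary = Date.UTC(2016, 0, 1); // 2016-01-01T00:00:00Z
var exampleRanges = [
  { zone: "archive", min: -Infinity, max: boundary },
  { zone: "recent", min: boundary, max: Infinity }
];
var exampleShardZones = {
  shard0000: ["recent"],
  shard0001: ["recent"],
  shard0002: ["archive"]
};
var exampleChunks = [
  { min: -Infinity, max: Date.UTC(2014, 0, 1), shard: "shard0000" }, // archive data on a recent shard
  { min: Date.UTC(2014, 0, 1), max: boundary, shard: "shard0002" },
  { min: boundary, max: Infinity, shard: "shard0001" }
];
var violations = findViolations(exampleChunks, exampleRanges, exampleShardZones);
```

In this made-up layout only the first chunk is out of place, which corresponds to the state the balancer corrects by migrating that chunk to a shard in the archive zone.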

Security

For sharded clusters running with Role-Based Access Control, authenticate as a user with at least the clusterManager role on the admin database.

Procedure

You must be connected to a mongos to create zones or zone ranges. You cannot create zones or zone ranges by connecting directly to a shard.

Disable the Balancer

The balancer must be disabled on the collection to ensure no migrations take place while configuring the new zones.

Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer:

  sh.disableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.
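The wait can be scripted as a simple polling loop. The sketch below is shell-agnostic: the caller supplies the check and the pause as functions. The name waitForBalancerIdle and its parameters are hypothetical. The commented usage assumes that sh.isBalancerRunning() returns a boolean and that the shell provides sleep(); depending on the shell version the return value may be a document instead, in which case the check function would need to be adapted.

```javascript
// Hypothetical helper: poll until the supplied check reports that the
// balancer is idle, or give up after maxChecks attempts.
function waitForBalancerIdle(isRunning, pause, maxChecks) {
  for (var i = 0; i < maxChecks; i++) {
    if (!isRunning()) {
      return true; // no balancing round in progress
    }
    pause(); // wait before checking again
  }
  return false; // still running after maxChecks checks
}

// Possible use in the mongo shell (assumes a boolean return value):
// waitForBalancerIdle(
//   function () { return sh.isBalancerRunning(); },
//   function () { sleep(1000); },
//   60
// )
```

Returning false instead of looping forever lets a scheduled job abort the zone change if the balancer does not become idle in time.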

Add each shard to the appropriate zone

Add shard0000 to the recent zone.

  sh.addShardTag("shard0000", "recent")

Add shard0001 to the recent zone.

  sh.addShardTag("shard0001", "recent")

Add shard0002 to the archive zone.

  sh.addShardTag("shard0002", "archive")

You can review the zone assigned to any given shard by running sh.status().

Define ranges for each zone

Define a range for recent photos and associate it with the recent zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

  sh.addTagRange(
    "photoshare.data",
    { "creation_date" : ISODate("2016-01-01") },
    { "creation_date" : MaxKey },
    "recent"
  )

Define a range for older photos and associate it with the archive zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

  sh.addTagRange(
    "photoshare.data",
    { "creation_date" : MinKey },
    { "creation_date" : ISODate("2016-01-01") },
    "archive"
  )

MinKey and MaxKey are reserved special values for comparisons.

Enable the Balancer

Re-enable the balancer to rebalance the cluster.

Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer:

  sh.enableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running.

Review the changes

The next time the balancer runs, it splits and migrates chunks across the shards respecting configured zones.

Once balancing finishes, the shards in the recent zone should only contain documents with creation_date greater than or equal to ISODate("2016-01-01"), while shards in the archive zone should only contain documents with creation_date less than ISODate("2016-01-01").

You can confirm the chunk distribution by running sh.status().

Updating Zone Ranges

To update the zone ranges, perform the following operations as part of a cron job or other scheduled procedure:

Disable the Balancer

The balancer must be disabled on the collection to ensure no migrations take place while configuring the new zones.

Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer:

  sh.disableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.

Remove the old shard zone ranges

Remove the old recent zone range using the sh.removeTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

  sh.removeTagRange(
    "photoshare.data",
    { "creation_date" : ISODate("2016-01-01") },
    { "creation_date" : MaxKey },
    "recent"
  )

Remove the old archive zone range using the sh.removeTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

  sh.removeTagRange(
    "photoshare.data",
    { "creation_date" : MinKey },
    { "creation_date" : ISODate("2016-01-01") },
    "archive"
  )

MinKey and MaxKey are reserved special values for comparisons.

Add the new zone range for each zone

Define a range for recent photos and associate it with the recent zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

  sh.addTagRange(
    "photoshare.data",
    { "creation_date" : ISODate("2016-06-01") },
    { "creation_date" : MaxKey },
    "recent"
  )

Define a range for older photos and associate it with the archive zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

  sh.addTagRange(
    "photoshare.data",
    { "creation_date" : MinKey },
    { "creation_date" : ISODate("2016-06-01") },
    "archive"
  )

MinKey and MaxKey are reserved special values for comparisons.

Enable the Balancer

Re-enable the balancer to rebalance the cluster.

Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer:

  sh.enableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running.

Review the changes

The next time the balancer runs, it splits chunks where necessary and migrates chunks across the shards respecting the configured zones.

Before balancing, the shards in the recent zone only contained documents with creation_date greater than or equal to ISODate("2016-01-01"), while shards in the archive zone only contained documents with creation_date less than ISODate("2016-01-01").

Once balancing finishes, the shards in the recent zone should only contain documents with creation_date greater than or equal to ISODate("2016-06-01"), while shards in the archive zone should only contain documents with creation_date less than ISODate("2016-06-01").

You can confirm the chunk distribution by running sh.status().
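For a scheduled job, the update procedure above can be expressed as an ordered plan that is built first and applied second. The sketch below is one possible arrangement under stated assumptions. The names buildRotationPlan and applyPlan are hypothetical, and the plan omits the wait for in-progress balancing rounds, which a real job would perform after disabling balancing. Only the sh methods already used in this tutorial are referenced.

```javascript
// Hypothetical helper: list the sh method calls for one zone range
// rotation, in the order used by the procedure above.
function buildRotationPlan(ns, oldCutoff, newCutoff, minKey, maxKey) {
  function bound(value) {
    return { creation_date: value };
  }
  return [
    { method: "disableBalancing", args: [ns] },
    { method: "removeTagRange", args: [ns, bound(oldCutoff), bound(maxKey), "recent"] },
    { method: "removeTagRange", args: [ns, bound(minKey), bound(oldCutoff), "archive"] },
    { method: "addTagRange", args: [ns, bound(newCutoff), bound(maxKey), "recent"] },
    { method: "addTagRange", args: [ns, bound(minKey), bound(newCutoff), "archive"] },
    { method: "enableBalancing", args: [ns] }
  ];
}

// Hypothetical helper: run each step against an object that exposes the
// named methods (the shell's `sh` object when run in the mongo shell).
function applyPlan(plan, helpers) {
  plan.forEach(function (step) {
    helpers[step.method].apply(helpers, step.args);
  });
}

// Possible use in the mongo shell:
// applyPlan(
//   buildRotationPlan("photoshare.data", ISODate("2016-01-01"),
//                     ISODate("2016-06-01"), MinKey, MaxKey),
//   sh
// )
```

Separating the plan from its execution makes it possible to log or review the exact range bounds before any zone configuration changes.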