Segmenting Data by Application or Customer

In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.

Tip

Changed in version 4.0.3: If you define the zones and the zone ranges before sharding an empty or a non-existing collection, the shard collection operation creates chunks for the defined zone ranges, as well as any additional chunks needed to cover the entire range of the shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.

See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.
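
The ordering described in the tip can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the documented procedure: preDefineZones and its parameter names are hypothetical, the shell's sh helper and the MinKey and MaxKey values are passed in as arguments so the function stays self-contained, and the database is assumed to be set up for sharding already. It relies only on sh.addShardTag(), sh.addTagRange(), and sh.shardCollection().

```javascript
// Hypothetical helper: define zones and zone ranges first, then shard the
// (empty or non-existing) collection so the initial chunks follow the zones.
// `sh`, `minKey`, and `maxKey` are passed in; in the shell these would be the
// built-in sh helper and the MinKey / MaxKey values.
function preDefineZones(sh, minKey, maxKey, namespace, zoneToShards) {
  for (const [zone, shards] of Object.entries(zoneToShards)) {
    // Associate each shard with its zone.
    for (const shard of shards) {
      sh.addShardTag(shard, zone);
    }
    // One range per client value, covering every userid for that client.
    sh.addTagRange(
      namespace,
      { client: zone, userid: minKey },
      { client: zone, userid: maxKey },
      zone
    );
  }
  // Shard the collection only after the zones and ranges exist.
  sh.shardCollection(namespace, { client: 1, userid: 1 });
}
```

For the scenario in this tutorial, a call might look like preDefineZones(sh, MinKey, MaxKey, "gamify.users", { robot: ["shard0000", "shard0001"], fruitos: ["shard0002", "shard0003"] }).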

This tutorial shows you how to segment data using Zones.

Consider the following scenarios where segmenting data by application or customer may be necessary:

  • A database serving multiple applications
  • A database serving multiple customers
  • A database that requires isolating ranges or subsets of application or customer data
  • A database that requires resource allocation for ranges or subsets of application or customer data

This diagram illustrates a sharded cluster using zones to segment data based on application or customer. This allows for data to be isolated to specific shards. Additionally, each shard can have specific hardware allocated to fit the performance requirement of the data stored on that shard.

Overview of zones used for supporting data segmentation

Scenario

An application tracks the score of a user along with a client field, storing scores in the gamify database under the users collection. Each possible value of client requires its own zone to allow for data segmentation. Zones also allow the administrator to optimize the hardware of each shard associated with a client for performance and cost.

The following documents represent a partial view of two users:

    {
      "_id" : ObjectId("56f08c447fe58b2e96f595fa"),
      "client" : "robot",
      "userid" : 123,
      "high_score" : 181,
      ...,
    }
    {
      "_id" : ObjectId("56f08c447fe58b2e96f595fb"),
      "client" : "fruitos",
      "userid" : 456,
      "high_score" : 210,
      ...,
    }

Shard Key

The users collection uses the { client : 1, userid : 1 } compound index as the shard key.

The client field in each document allows creating a zone for each distinct client value.

The userid field provides a high cardinality and low frequency component to the shard key relative to client.

See Choosing a Shard Key for more general instructions on selecting a shard key.

Architecture

The application requires adding shards to a zone associated with a specific client.

The sharded cluster deployment currently consists of four shards.

Diagram of Data Segmentation Architecture using zones

Zones

For this application, there are two client zones.

Diagram of zones used for supporting data segmentation

  • Robot client (“robot”)
      This zone represents all documents where client : robot.
  • FruitOS client (“fruitos”)
      This zone represents all documents where client : fruitos.

Write Operations

With zones, if an inserted or updated document matches a configured zone, it can only be written to a shard inside that zone.

MongoDB can write documents that do not match a configured zone to any shard in the cluster.

Note

The behavior described above requires the cluster to be in a steady state with no chunks violating a configured zone. See the following section on the balancer for more information.

Read Operations

MongoDB can route queries to a specific shard if the query includes at least the client field.

For example, MongoDB can attempt a targeted read operation on the following query:

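    // Switch to the gamify database and query on the full shard key.
    // userid is stored as a number, so match it as a number.
    gamifyDB = db.getSiblingDB("gamify")
    gamifyDB.users.find( { "client" : "robot" , "userid" : 123 } )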

Queries without the client field perform broadcast operations.
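
The routing rule above can be sketched as a prefix check on the shard key. This is an illustration only, assuming simple equality matches on the queried fields. The name canTargetShards is hypothetical and is not a MongoDB API, and the router's actual logic is more involved.

```javascript
// Hypothetical illustration: a query can be targeted when it constrains at
// least the leading field of the shard key (here, client).
const SHARD_KEY_FIELDS = ["client", "userid"];

function canTargetShards(query, shardKeyFields = SHARD_KEY_FIELDS) {
  // Only the first shard key field is required for targeting.
  return Object.prototype.hasOwnProperty.call(query, shardKeyFields[0]);
}

console.log(canTargetShards({ client: "robot", userid: 123 })); // true
console.log(canTargetShards({ client: "robot" }));              // true
console.log(canTargetShards({ userid: 123 }));                  // false
```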

Balancer

The balancer migrates chunks to the appropriate shard respecting any configured zones. Until the migration, shards may contain chunks that violate configured zones. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned zones.

Adding or removing zones or zone ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a zone or zone range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.

Security

For sharded clusters running with Role-Based Access Control, authenticate as a user with at least the clusterManager role on the admin database.

Procedure

You must be connected to a mongos associated with the target sharded cluster to proceed. You cannot create zones or zone ranges by connecting directly to a shard.

Disable the Balancer

The balancer must be disabled on the collection to ensure no migrations take place while configuring the new zones.

Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer.

    sh.disableBalancing("gamify.users")

Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.

Add each shard to the appropriate zone

Add shard0000 to the robot zone.

    sh.addShardTag("shard0000", "robot")

Add shard0001 to the robot zone.

    sh.addShardTag("shard0001", "robot")

Add shard0002 to the fruitos zone.

    sh.addShardTag("shard0002", "fruitos")

Add shard0003 to the fruitos zone.

    sh.addShardTag("shard0003", "fruitos")

Run sh.status() to review the zones configured for the sharded cluster.

Define ranges for each zone

Define the range for the robot client and associate it with the robot zone using the sh.addTagRange() method.

This method requires:

  • The full namespace of the target collection
  • The inclusive lower bound of the range
  • The exclusive upper bound of the range
  • The name of the zone

    sh.addTagRange(
      "gamify.users",
      { "client" : "robot", "userid" : MinKey },
      { "client" : "robot", "userid" : MaxKey },
      "robot"
    )

Define the range for the fruitos client and associate it with the fruitos zone using the sh.addTagRange() method.

This method requires:

  • The full namespace of the target collection
  • The inclusive lower bound of the range
  • The exclusive upper bound of the range
  • The name of the zone

    sh.addTagRange(
      "gamify.users",
      { "client" : "fruitos", "userid" : MinKey },
      { "client" : "fruitos", "userid" : MaxKey },
      "fruitos"
    )

The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as lower than every other possible value, while MaxKey always compares as higher than every other possible value. The configured ranges capture every user for each client.
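
To see why these two ranges capture every user of each client, the comparison can be modeled in plain JavaScript. This is a simplified sketch, not how the server is implemented: MIN_KEY and MAX_KEY are stand-in symbols for the shell's MinKey and MaxKey, the helper names are hypothetical, and the comparison assumes the values being compared have the same type.

```javascript
// Stand-ins for the shell's MinKey and MaxKey values (assumption: plain
// symbols are enough for this illustration).
const MIN_KEY = Symbol("MinKey");
const MAX_KEY = Symbol("MaxKey");

// Compare two single values, treating MIN_KEY as lowest and MAX_KEY as highest.
function compareValues(a, b) {
  if (a === b) return 0;
  if (a === MIN_KEY || b === MAX_KEY) return -1;
  if (a === MAX_KEY || b === MIN_KEY) return 1;
  return a < b ? -1 : 1;
}

// Compare compound shard keys { client, userid } field by field.
function compareKeys(a, b) {
  return compareValues(a.client, b.client) || compareValues(a.userid, b.userid);
}

// The two zone ranges defined in this tutorial.
const zoneRanges = [
  { zone: "robot",
    min: { client: "robot", userid: MIN_KEY },
    max: { client: "robot", userid: MAX_KEY } },
  { zone: "fruitos",
    min: { client: "fruitos", userid: MIN_KEY },
    max: { client: "fruitos", userid: MAX_KEY } },
];

// A key belongs to a zone when min <= key < max
// (inclusive lower bound, exclusive upper bound).
function zoneFor(key) {
  const match = zoneRanges.find(
    (r) => compareKeys(r.min, key) <= 0 && compareKeys(key, r.max) < 0
  );
  return match ? match.zone : null;
}

console.log(zoneFor({ client: "robot", userid: 123 }));   // robot
console.log(zoneFor({ client: "fruitos", userid: 456 })); // fruitos
console.log(zoneFor({ client: "other", userid: 1 }));     // null
```

A key whose client value matches neither range falls outside both zones, which corresponds to the documents that MongoDB can write to any shard in the cluster.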

Enable the Balancer

Re-enable the balancer to rebalance the cluster.

Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer.

    sh.enableBalancing("gamify.users")

Use sh.isBalancerRunning() to check if the balancer process is currently running.

Review the changes

The next time the balancer runs, it splits and migrates chunks across the shards respecting the configured zones.

Once balancing finishes, the shards in the robot zone only contain documents with client : robot, while shards in the fruitos zone only contain documents with client : fruitos.

You can confirm the chunk distribution by running sh.status().