Shard Keys

The shard key determines the distribution of the collection’s documents among the cluster’s shards. The shard key is either an indexed field or indexed compound fields that exist in every document in the collection.

MongoDB partitions data in the collection using ranges of shard key values. Each range defines a non-overlapping range of shard key values and is associated with a chunk.

MongoDB attempts to distribute chunks evenly among the shards in the cluster. The shard key has a direct relationship to the effectiveness of chunk distribution. See Choosing a Shard Key.

[Figure 1]

IMPORTANT

Once you shard a collection, the shard key and the shard key values are immutable; that is:

  • You cannot select a different shard key for that collection.
  • You cannot update the values of the shard key fields.

Shard Key Specification

To shard a collection, you must specify the target collection and the shard key to the sh.shardCollection() method:

    sh.shardCollection(namespace, key)

  • The namespace parameter consists of a string <database>.<collection> specifying the full namespace of the target collection.
  • The key parameter consists of a document containing a field and the index traversal direction for that field.
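
As a minimal sketch of the call described above, the following passes a namespace string and a key document. The database, collection, and field names (records, people, zipcode) are hypothetical and chosen only for illustration. The sketch assumes the database is already enabled for sharding where the server version requires it. The sh helper exists only in the mongo shell connected to a sharded cluster, so the call is guarded and the snippet does nothing elsewhere.

```javascript
// Hypothetical names: database "records", collection "people", field "zipcode".
const namespace = "records.people"; // "<database>.<collection>"
const key = { zipcode: 1 };         // field name and index traversal direction

// `sh` is defined only in the mongo shell; skip the call anywhere else.
if (typeof sh !== "undefined") {
  sh.shardCollection(namespace, key);
}
```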

For instructions specific to sharding a collection using the hashed sharding strategy, see Shard a Collection using Hashed Sharding.

For instructions specific to sharding a collection using the ranged sharding strategy, see Shard a Collection using Ranged Sharding.

Shard Key Indexes

All sharded collections must have an index that supports the shard key; that is, the index can be an index on the shard key or a compound index where the shard key is a prefix of the index.

  • If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exist.
  • If the collection is not empty, you must create the index before using sh.shardCollection().

If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.
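
The prefix rule above can be sketched as a small check. The helper name indexSupportsShardKey and the field names are hypothetical; this is not a MongoDB API. It compares field names only and ignores index direction or type. It also relies on JavaScript preserving the insertion order of non-numeric property names.

```javascript
// Returns true when `shardKey` is a prefix of `indexKey`: the same fields, in
// the same order, starting at the first position of the index.
// Hypothetical helper for illustration; compares field names only.
function indexSupportsShardKey(indexKey, shardKey) {
  const indexFields = Object.keys(indexKey);
  const shardFields = Object.keys(shardKey);
  if (shardFields.length === 0 || shardFields.length > indexFields.length) {
    return false;
  }
  // Compare field names position by position.
  return shardFields.every((field, i) => indexFields[i] === field);
}

const shardKey = { zipcode: 1 };
console.log(indexSupportsShardKey({ zipcode: 1 }, shardKey));          // true
console.log(indexSupportsShardKey({ zipcode: 1, name: 1 }, shardKey)); // true
console.log(indexSupportsShardKey({ name: 1, zipcode: 1 }, shardKey)); // false
```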

Unique Indexes

For a sharded collection, only the _id field index and the index on the shard key or a compound index where the shard key is a prefix can be unique:

  • You cannot shard a collection that has unique indexes on other fields.
  • You cannot create unique indexes on other fields for a sharded collection.

Through the use of the unique index on the shard key, MongoDB can enforce uniqueness on the shard key values. MongoDB enforces uniqueness on the entire key combination, and not on individual components of the shard key. To enforce uniqueness on the shard key values, pass the unique parameter as true to the sh.shardCollection() method:

  • If the collection is empty, sh.shardCollection() creates the unique index on the shard key if such an index does not already exist.
  • If the collection is not empty, you must create the index before using sh.shardCollection().

Although you can have a unique compound index where the shard key is a prefix, if you use the unique parameter, the collection must have a unique index that is on the shard key.

You cannot specify a unique constraint on a hashed index.
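
To illustrate what uniqueness on the entire key combination means, here is a rough sketch in plain JavaScript. The fields region and userId, the namespace records.users, and the helper makeUniquenessChecker are all hypothetical. The checker only imitates the rule; it is not how MongoDB implements unique indexes. The guarded call at the end shows the unique parameter passed as the third argument in the mongo shell.

```javascript
// Hypothetical compound shard key on fields "region" and "userId".
const uniqueKeyFields = ["region", "userId"];

// Builds a checker that rejects a document only when the full combination of
// shard key values has been seen before.
function makeUniquenessChecker(fields) {
  const seen = new Set();
  return function accept(doc) {
    const combo = JSON.stringify(fields.map((f) => doc[f]));
    if (seen.has(combo)) return false;
    seen.add(combo);
    return true;
  };
}

const accept = makeUniquenessChecker(uniqueKeyFields);
console.log(accept({ region: "eu", userId: 1 })); // true
console.log(accept({ region: "eu", userId: 2 })); // true: "eu" repeats, the combination does not
console.log(accept({ region: "us", userId: 1 })); // true: userId 1 repeats, the combination does not
console.log(accept({ region: "eu", userId: 1 })); // false: the whole combination repeats

// In the mongo shell, the third argument requests the unique constraint.
if (typeof sh !== "undefined") {
  sh.shardCollection("records.users", { region: 1, userId: 1 }, true);
}
```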

Choosing a Shard Key

The choice of shard key affects how the sharded cluster balancer creates and distributes chunks across the available shards. This affects the overall efficiency and performance of operations within the sharded cluster.

The shard key affects the performance and efficiency of the sharding strategy used by the sharded cluster.

The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster.

[Figure 2]

At minimum, consider the consequences of the cardinality, frequency, and rate of change of a potential shard key.

Restrictions

For restrictions on shard keys, see Shard Key Limitations.

Collection Size

When sharding a collection that is not empty, the shard key can constrain the maximum supported collection size for the initial sharding operation only. See Sharding Existing Collection Data Size.

IMPORTANT

A sharded collection can grow to any size after successful sharding.

Shard Key Cardinality

The cardinality of a shard key determines the maximum number of chunks the balancer can create. Low cardinality can reduce or remove the effectiveness of horizontal scaling in the cluster.

A unique shard key value can exist on no more than a single chunk at any given time. If a shard key has a cardinality of 4, then there can be no more than 4 chunks within the sharded cluster, each storing one unique shard key value. This constrains the number of effective shards in the cluster to 4 as well; adding more shards would not provide any benefit.
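
The arithmetic above can be sketched as follows. The helper names and the sample data are hypothetical. The sketch only counts distinct values, on the assumption stated above that each distinct shard key value lives in at most one chunk.

```javascript
// Cardinality: the number of distinct shard key values in the data.
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Each distinct value lives in at most one chunk, so the number of shards that
// can hold data is capped by the cardinality.
function maxUsefulShards(docs, field, shardCount) {
  return Math.min(shardCount, cardinality(docs, field));
}

// Hypothetical data: 1000 documents whose field X takes only 4 distinct values.
const lowCardDocs = Array.from({ length: 1000 }, (_, i) => ({ X: i % 4 }));

console.log(cardinality(lowCardDocs, "X"));         // 4
console.log(maxUsefulShards(lowCardDocs, "X", 10)); // 4
console.log(maxUsefulShards(lowCardDocs, "X", 2));  // 2
```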

The following image illustrates a sharded cluster using the field X as the shard key. If X has low cardinality, the distribution of inserts may look similar to the following:

[Figure 3]

The cluster in this example would not scale horizontally, as incoming writes would only route to a subset of shards.

A shard key with high cardinality does not guarantee even distribution of data across the sharded cluster, though it does better facilitate horizontal scaling. The frequency and rate of change of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.

If your data model requires sharding on a key that has low cardinality, consider using a compound index that includes a field with higher relative cardinality.
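
As a rough illustration of how a compound key raises cardinality, the sketch below counts distinct key combinations. The fields status and orderId and the sample data are hypothetical.

```javascript
// Counts distinct combinations of the given fields across the documents.
function distinctCombinations(docs, fields) {
  const combos = new Set(
    docs.map((doc) => JSON.stringify(fields.map((f) => doc[f])))
  );
  return combos.size;
}

// Hypothetical data: "status" has 3 values, "orderId" differs in every document.
const statuses = ["new", "paid", "shipped"];
const orders = Array.from({ length: 300 }, (_, i) => ({
  status: statuses[i % 3],
  orderId: i,
}));

console.log(distinctCombinations(orders, ["status"]));            // 3
console.log(distinctCombinations(orders, ["status", "orderId"])); // 300
```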

Shard Key Frequency

Consider a set representing the range of shard key values. The frequency of the shard key represents how often a given value occurs in the data. If the majority of documents contain only a subset of those values, then the chunks storing those documents become a bottleneck within the cluster. Furthermore, as those chunks grow, they may become indivisible chunks, as they cannot be split any further. This reduces or removes the effectiveness of horizontal scaling within the cluster.
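
One way to inspect frequency before choosing a key is to measure what share of documents carries each value. The sketch below assumes a hypothetical sample in which one value of X dominates. The helper valueShares is illustrative and not part of MongoDB.

```javascript
// Returns the fraction of documents that carry each shard key value.
function valueShares(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  const shares = {};
  for (const [value, count] of counts) {
    shares[value] = count / docs.length;
  }
  return shares;
}

// Hypothetical data: 10 distinct values of X, but 91 of 100 documents use X = 0.
const skewedDocs = Array.from({ length: 100 }, (_, i) => ({
  X: i < 91 ? 0 : i - 90,
}));

const shares = valueShares(skewedDocs, "X");
console.log(Object.keys(shares).length); // 10
console.log(shares[0]);                  // 0.91
console.log(shares[1]);                  // 0.01
```

In this sample the cardinality is 10, yet the chunk holding X = 0 would receive 91% of the documents.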

The following image illustrates a sharded cluster using the field X as the shard key. If a subset of values for X occur with high frequency, the distribution of inserts may look similar to the following:

[Figure 4]

A shard key with low frequency does not guarantee even distribution of data across the sharded cluster. The cardinality and rate of change of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.

If your data model requires sharding on a key that has high frequency values, consider using a compound index that includes a unique or low frequency value.

Monotonically Changing Shard Keys

A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single shard within the cluster.

This occurs because every cluster has a chunk that captures a range with an upper bound of maxKey. maxKey always compares as higher than all other values. Similarly, there is a chunk that captures a range with a lower bound of minKey. minKey always compares as lower than all other values.

If the shard key value is always increasing, all new inserts are routed to the chunk with maxKey as the upper bound. If the shard key value is always decreasing, all new inserts are routed to the chunk with minKey as the lower bound. The shard containing that chunk becomes the bottleneck for write operations.

The following image illustrates a sharded cluster using the field X as the shard key. If the values for X are monotonically increasing, the distribution of inserts may look similar to the following:

[Figure 5]

If the shard key value were monotonically decreasing, then all inserts would route to Chunk A instead.

A shard key that does not change monotonically does not guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.
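
The routing behavior described above can be sketched with a toy chunk table. The three-chunk layout, the chunk boundaries, and the helper names are hypothetical. -Infinity and Infinity stand in for minKey and maxKey. The sketch ignores chunk splits and migrations, which move the boundaries over time but do not change which end of the key range receives the inserts.

```javascript
// Hypothetical chunk layout over shard key X. Each chunk covers [min, max);
// -Infinity and Infinity stand in for minKey and maxKey.
const chunks = [
  { name: "Chunk A", min: -Infinity, max: 0 },
  { name: "Chunk B", min: 0, max: 100 },
  { name: "Chunk C", min: 100, max: Infinity },
];

// Finds the chunk whose range contains the shard key value.
function routeToChunk(value) {
  return chunks.find((c) => value >= c.min && value < c.max).name;
}

// Counts how many inserts each chunk receives.
function countInserts(values) {
  const counts = {};
  for (const v of values) {
    const name = routeToChunk(v);
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

// Existing data already spans 0 to 100, so new values fall outside that range.
const increasing = Array.from({ length: 50 }, (_, i) => 101 + i);
const decreasing = Array.from({ length: 50 }, (_, i) => -1 - i);

console.log(countInserts(increasing)); // all 50 inserts land in Chunk C
console.log(countInserts(decreasing)); // all 50 inserts land in Chunk A
```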

If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.
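
To show why hashing helps, the sketch below spreads monotonically increasing values across four hypothetical chunks by hashing them first. The hash is a stand-in (32-bit multiplicative hashing), not the function MongoDB uses for hashed indexes. The chunk layout, the namespace records.events, and the field createdAt are assumptions made for illustration.

```javascript
// Stand-in hash: 32-bit multiplicative hashing. MongoDB uses its own hash
// function for hashed indexes; this one only illustrates the idea.
function standInHash(x) {
  return Math.imul(x, 2654435761) >>> 0; // unsigned 32-bit result
}

// Four hypothetical chunks, each covering one quarter of the hash range.
function chunkForHashedKey(x) {
  return standInHash(x) >>> 30; // top two bits select chunk 0..3
}

// Monotonically increasing shard key values 1..1000.
const perChunk = [0, 0, 0, 0];
for (let x = 1; x <= 1000; x++) {
  perChunk[chunkForHashedKey(x)] += 1;
}
console.log(perChunk); // four counts that sum to 1000, spread across all chunks

// In the mongo shell, a hashed shard key is requested with the "hashed" index type.
if (typeof sh !== "undefined") {
  sh.shardCollection("records.events", { createdAt: "hashed" });
}
```

Adjacent key values no longer land in the same chunk, so inserts spread across shards. The trade-off is that queries over a range of the original values can no longer target a single chunk.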