Hashed Sharding

Hashed Sharding

Hashed sharding uses a hashed index topartition data across your shared cluster. Hashed indexes compute thehash value of a single field as the index value; this value is used asyour shard key. [1]

Hashed sharding provides more even data distribution across the shardedcluster at the cost of reducing Targeted Operations vs. Broadcast Operations. Post-hash,documents with “close” shard key values are unlikely to be on the samechunk or shard - the mongos is more likely to performBroadcast Operations to fulfill a given ranged query.mongos can target queries with equality matches to a single shard.

Tip

MongoDB automatically computes the hashes when resolving queries usinghashed indexes. Applications do not need to compute hashes.

Warning

MongoDB hashed indexes truncate floating point numbers to 64-bit integersbefore hashing. For example, a hashed index would store the samevalue for a field that held a value of 2.3, 2.2, and 2.9.To prevent collisions, do not use a hashed index for floatingpoint numbers that cannot be reliably converted to 64-bitintegers (and then back to floating point). MongoDB hashed indexes donot support floating point values larger than 2⁵³.

To see what the hashed value would be for a key, seeconvertShardKeyToHashed().

[1]	Starting in version 4.0, the `mongo` shell provides themethod `convertShardKeyToHashed()`. This method uses thesame hashing function as the hashed index and can be used to seewhat the hashed value would be for a key.

Hashed Sharding Shard Key

The field you choose as your hashed shard key should have a goodcardinality, or large number of different values.Hashed keys are ideal for shard keys with fields that changemonotonically like ObjectId values ortimestamps. A good example of this is the default _id field, assumingit only contains ObjectID values.

To shard a collection using a hashed shard key, seeDeploy Sharded Cluster using Hashed Sharding.

Hashed vs Ranged Sharding

Given a collection using a monotonically increasing value X as theshard key, using ranged sharding results in a distribution of incominginserts similar to the following:

Since the value of X is always increasing, the chunk with an upper boundof maxKey receives the majority incoming writes. Thisrestricts insert operations to the single shard containing this chunk, whichreduces or removes the advantage of distributed writes in a sharded cluster.

By using a hashed index on X, the distribution of inserts is similarto the following:

Since the data is now distributed more evenly, inserts are efficientlydistributed throughout the cluster.

Shard the Collection

Use the sh.shardCollection() method, specifying the full namespaceof the collection and the target hashed indexto use as the shard key.

sh.shardCollection( "database.collection", { <field> : "hashed" } )

Important

Once you shard a collection, the selection of the shard key isimmutable; i.e. you cannot select a different shard key for thatcollection.
Starting in MongoDB 4.2, you can update a document’s shard key valueunless the shard key field is the immutable _id field. For detailson updating the shard key, see Change a Document’s Shard Key Value.

Before MongoDB 4.2, a document’s shard key field value is immutable.

Shard a Populated Collection

If you shard a populated collection using a hashed shard key:

The sharding operation creates the initial chunk(s) to cover theentire range of the shard key values. The number of chunks createddepends on the configured chunk size.
After the initial chunk creation, the balancer migrates these initialchunks across the shards as appropriate as well as manages the chunkdistribution going forward.

Shard an Empty Collection

If you shard an empty collection using a hashed shard key:

With no zones and zone ranges specified forthe empty or non-existing collection:
- The sharding operation creates empty chunks to cover the entirerange of the shard key values and performs an initial chunkdistribution. By default, the operationcreates 2 chunks per shard and migrates across the cluster. You canuse numInitialChunks option to specify a different number ofinitial chunks. This initial creation and distribution ofchunks allows for faster setup of sharding.
- After the initial distribution, the balancer manages the chunkdistribution going forward.
With zones and zone ranges specified for theempty or a non-existing collection (Available starting in MongoDB4.0.3),
- The sharding operation creates empty chunks for the defined zoneranges as well as any additional chunks to cover the entire rangeof the shard key values and performs an initial chunk distributionbased on the zone ranges. This initial creation and distribution ofchunks allows for faster setup of zoned sharding.
- After the initial distribution, the balancer manages the chunkdistribution going forward.