Deploy Sharded Cluster using Hashed Sharding

Overview

Hashed shard keys use a hashed index of asingle field as the shard key to partition data across yoursharded cluster.

Hashed sharding provides a more even data distribution across the shardedcluster at the cost of reducing Targeted Operations vs. Broadcast Operations. Withhashed sharding, documents with “close” shard key values are unlikelyto be on the same chunk or shard, and the mongos is morelikely to perform Broadcast Operations to fulfill a givenquery.

If you already have a sharded cluster deployed, skip toShard a Collection using Hashed Sharding.

Atlas, CloudManager and OpsManager

If you are currently using or are planning to use Atlas, Cloud Manageror Ops Manager, refer to their respective manual for instructions ondeploying a sharded cluster:

Considerations

Hostnames and Configuration

Tip

When possible, use a logical DNS hostname instead of an ip address,particularly when configuring replica set members or sharded clustermembers. The use of logical DNS hostnames avoids configurationchanges due to ip address changes.

Operating System

This tutorial uses the mongod and mongosprograms. Windows users should use the mongod.exe andmongos.exe programs instead.

IP Binding

Use the bind_ip option to ensure that MongoDB listens forconnections from applications on configured addresses.

Starting in MongoDB 3.6, MongoDB binaries, mongod andmongos, bind to localhost by default. If thenet.ipv6 configuration file setting or the —ipv6command line option is set for the binary, the binary additionally bindsto the localhost IPv6 address.

Previously, starting from MongoDB 2.6, only the binaries from theofficial MongoDB RPM (Red Hat, CentOS, Fedora Linux, and derivatives)and DEB (Debian, Ubuntu, and derivatives) packages bind to localhost bydefault.

When bound only to the localhost, these MongoDB 3.6 binaries can onlyaccept connections from clients (including the mongo shell,other members in your deployment for replica sets and sharded clusters)that are running on the same machine. Remote clients cannot connect tothe binaries bound only to localhost.

To override and bind to other ip addresses, you can use thenet.bindIp configuration file setting or the—bind_ip command-line option to specify a list of hostnames or ipaddresses.

Warning

Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.

For example, the following mongod instance binds to boththe localhost and the hostname My-Example-Associated-Hostname, which isassociated with the ip address 198.51.100.1:

  1. mongod --bind_ip localhost,My-Example-Associated-Hostname

In order to connect to this instance, remote clients must specifythe hostname or its associated ip address 198.51.100.1:

  1. mongo --host My-Example-Associated-Hostname
  2.  
  3. mongo --host 198.51.100.1

Security

This tutorial does not include the required steps for configuringInternal/Membership Authentication or Role-Based Access Control.See Deploy Sharded Cluster with Keyfile Authentication for atutorial on deploying a sharded cluster with akeyfile.

In production environments, sharded clusters should employ atminimum x.509 security for internal authenticationand client access:

Note

Enabling internal authentication also enablesRole-Based Access Control.

Deploy Sharded Cluster with Hashed Sharding

Tip

When possible, use a logical DNS hostname instead of an ip address,particularly when configuring replica set members or sharded clustermembers. The use of logical DNS hostnames avoids configurationchanges due to ip address changes.

Create the Config Server Replica Set

The following steps deploys a config server replica set.

For a production deployment, deploy a config server replica set with atleast three members. For testing purposes, you can create asingle-member replica set.

For this tutorial, the config server replica set members are associatedwith the following hosts:

Config Server Replica Set MemberHostname
Member 0cfg1.example.net
Member 1cfg2.example.net
Member 2cfg3.example.net

Start each member of the config server replica set.

When starting eachmongod, specify themongod settings either via a configuration file or thecommand line.

Configuration File

If using a configuration file, set:

  1. sharding:
  2. clusterRole: configsvr
  3. replication:
  4. replSetName: <replica set name>
  5. net:
  6. bindIp: localhost,<hostname(s)|ip address(es)>
  • sharding.clusterRole to configsvr,

  • replication.replSetName to the desired name of theconfig server replica set,

  • net.bindIp option to the hostname/ip address orcomma-delimited list of hostnames or ip addresses that remoteclients (including the other members of the config serverreplica set as well as other members of the sharded cluster)can use to connect to the instance.

Warning

Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.

Start the mongod with the —config optionset to the configuration file path.

  1. mongod --config <path-to-config-file>

Command Line

If using the command line options, start the mongodwith the —configsvr, —replSet, —bind_ip,and other options as appropriate to your deployment. For example:

Warning

Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.

  1. mongod --configsvr --replSet <replica set name> --dbpath <path> --bind_ip localhost,<hostname(s)|ip address(es)>

For more information on startup parameters, see themongod reference page.

Connect to one of the config servers.

Connect a mongo shell to one of the config servermembers.

  1. mongo --host <hostname> --port <port>

Initiate the replica set.

From the mongo shell, run the rs.initiate() method.

rs.initiate() can take an optional replica setconfiguration document. In thereplica set configuration document, include:

  • The _id set to the replica set name specified in eitherthe replication.replSetName or the —replSet option.
  • The configsvr field set to true for the config server replica set.
  • The members array with a document per each member of the replica set.

Important

Run rs.initiate() on just one and only onemongod instance for the replica set.

  1. rs.initiate(
  2. {
  3. _id: "<replSetName>",
  4. configsvr: true,
  5. members: [
  6. { _id : 0, host : "cfg1.example.net:27019" },
  7. { _id : 1, host : "cfg2.example.net:27019" },
  8. { _id : 2, host : "cfg3.example.net:27019" }
  9. ]
  10. }
  11. )

See Replica Set Configuration for more information onreplica set configuration documents.

Once the config server replica set (CSRS) is initiated and up, proceedto creating the shard replica sets.

Create the Shard Replica Sets

For a production deployment, use a replica set with at least threemembers for each shard. For testing purposes, you can create asingle-member replica set.

For each shard, use the following steps to create the shard replica set.

Start each member of the shard replica set.

When starting eachmongod, specify themongod settings either via a configuration file or thecommand line.

Configuration File

If using a configuration file, set:

  1. sharding:
  2. clusterRole: shardsvr
  3. replication:
  4. replSetName: <replSetName>
  5. net:
  6. bindIp: localhost,<ip address>
  • replication.replSetName to the desired name of thereplica set,

  • sharding.clusterRole option to shardsvr,

  • net.bindIp option to the ip or a comma-delimitedlist of ips that remote clients (including the other members ofthe config server replica set as well as other members of thesharded cluster) can use to connect to the instance.

Warning

Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.

Start the mongod with the —config option set tothe configuration file path.

  1. mongod --config <path-to-config-file>

Command Line

If using the command line option, start the mongod withthe —replSet, and —shardsvr, —bind_ip options,and other options as appropriate to your deployment. For example:

  1. mongod --shardsvr --replSet <replSetname> --dbpath <path> --bind_ip localhost,<hostname(s)|ip address(es)>

For more information on startup parameters, see themongod reference page.

Connect to one member of the shard replica set.

Connect a mongo shell to one of the replica set members.

  1. mongo --host <hostname> --port <port>

Initiate the replica set.

From the mongo shell, run the rs.initiate() method.

rs.initiate() can take an optional replica setconfiguration document. In thereplica set configuration document, include:

  • The _id field set to the replica set name specified ineither the replication.replSetName or the —replSetoption.
  • The members array with a document per each member of thereplica set.

The following example initiates a three member replica set.

Important

Run rs.initiate() on just one and only onemongod instance for the replica set.

  1. rs.initiate(
  2. {
  3. _id : <replicaSetName>,
  4. members: [
  5. { _id : 0, host : "s1-mongo1.example.net:27018" },
  6. { _id : 1, host : "s1-mongo2.example.net:27018" },
  7. { _id : 2, host : "s1-mongo3.example.net:27018" }
  8. ]
  9. }
  10. )

Connect a mongos to the Sharded Cluster

Connect a mongos to the cluster

Start a mongos using either a configuration file or acommand line parameter to specify the config servers.

Configuration File

If using a configuration file, set the sharding.configDB tothe config server replica set name and at least one member of the replicaset in <replSetName>/<host:port> format.

Warning

Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.

  1. sharding:
  2. configDB: <configReplSetName>/cfg1.example.net:27019,cfg2.example.net:27019
  3. net:
  4. bindIp: localhost,<hostname(s)|ip address(es)>

Start the mongos specifying the —config option and thepath to the configuration file.

  1. mongos --config <path-to-config>

For more information on the configuration file, seeconfiguration options.

Command Line

If using command line parameters start the mongos and specifythe —configdb, —bind_ip,and other options as appropriate to your deployment. For example:

Warning

Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.

  1. mongos --configdb <configReplSetName>/cfg1.example.net:27019,cfg2.example.net:27019 --bind_ip localhost,<hostname(s)|ip address(es)>

Include any other options as appropriate for your deployment.

Connect to the mongos.

Connect a mongo shell to the mongos.

  1. mongo --host <hostname> --port <port>

Once you have connected the mongo shell to themongos, continue to the next procedure to add shards tothe cluster.

Add Shards to the Cluster

In the mongo shell connected to the mongos, usethe sh.addShard() method to add each shard to the cluster.

The following operation adds a single shard replica set to the cluster:

  1. sh.addShard( "<replSetName>/s1-mongo1.example.net:27018")

Repeat to add all shards.

Enable Sharding for a Database

From the mongo shell connected to the mongos, usethe sh.enableSharding() method to enable sharding on thetarget database. Enabling sharding on a database makes it possible toshard collections within a database.

  1. sh.enableSharding("<database>")

Shard a Collection using Hashed Sharding

From the mongo shell connected to the mongos, usethe sh.shardCollection() method to shard a collection.

Note

You must have enabled sharding for the databasewhere the collection resides. SeeEnable Sharding for a Database.

If the collection already contains data, you must create aHashed Indexes on the shard key using thedb.collection.createIndex() method before usingshardCollection(). [1]

If the collection is empty, MongoDB creates the index as part ofsh.shardCollection().

The following operation shards the target collection using thehashed sharding strategy.

  1. sh.shardCollection("<database>.<collection>", { <shard key> : "hashed" } )
  • You must specify the full namespace of the collection and the shardkey.
  • Your selection of shard key affects the efficiency of sharding, aswell as your ability to take advantage of certain sharding featuressuch as zones. See the selectionconsiderations listed in the Hashed Sharding Shard Key.
[1]Starting in version 4.0, the mongo shell provides themethod convertShardKeyToHashed(). This method uses thesame hashing function as the hashed index and can be used to seewhat the hashed value would be for a key.