Adding/Removing Monitors

When you have a cluster up and running, you may add or remove monitors from the cluster at runtime. To bootstrap a monitor, see Manual Deployment or Monitor Bootstrap.

Adding Monitors

Ceph monitors are lightweight processes that maintain a master copy of the cluster map. You can run a cluster with 1 monitor. We recommend at least 3 monitors for a production cluster. Ceph monitors use a variation of the Paxos protocol to establish consensus about maps and other critical information across the cluster. Due to the nature of Paxos, Ceph requires a majority of monitors running to establish a quorum (thus establishing consensus).

It is advisable, but not mandatory, to run an odd number of monitors. Relative to its size, an odd number of monitors is more resilient to failures than an even number. For instance, in a 2-monitor deployment, no failures can be tolerated if a quorum is to be maintained; with 3 monitors, one failure can be tolerated; in a 4-monitor deployment, one failure can be tolerated; with 5 monitors, two failures can be tolerated. This is why an odd number is advisable. In summary, Ceph needs a majority of monitors to be running (and able to communicate with each other), but that majority can be achieved using a single monitor, or 2 out of 2 monitors, 2 out of 3, 3 out of 4, etc.
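The arithmetic above can be checked with a small shell sketch. It assumes only that a quorum is a strict majority of the monitors in the monmap, that is floor(N/2) + 1; the function names are hypothetical and are not part of any Ceph tool.

```shell
#!/bin/sh
# Sketch: quorum arithmetic for a cluster with N monitors.
# quorum_size and tolerated_failures are hypothetical helpers.

# A quorum is a strict majority: floor(N/2) + 1 monitors.
quorum_size() {
    n=$1
    echo $(( n / 2 + 1 ))
}

# Monitors that can fail while a quorum is still possible: N - quorum_size(N).
tolerated_failures() {
    n=$1
    echo $(( n - (n / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "$n monitor(s): quorum needs $(quorum_size "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
```

For 1 through 5 monitors it reproduces the figures in the paragraph: 0, 0, 1, 1, and 2 tolerated failures. Going from 3 to 4 monitors raises the quorum size without raising the number of failures tolerated.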

For an initial deployment of a multi-node Ceph cluster, it is advisable to deploy three monitors, increasing the number two at a time if a valid need for more than three exists.

Since monitors are lightweight, it is possible to run them on the same host as an OSD; however, we recommend running them on separate hosts, because fsync issues with the kernel may impair performance.

Note

A majority of monitors in your cluster must be able to reach each other in order to establish a quorum.

Deploy your Hardware

If you are adding a new host when adding a new monitor, see Hardware Recommendations for details on minimum recommendations for monitor hardware. To add a monitor host to your cluster, first make sure you have an up-to-date version of Linux installed (typically Ubuntu 16.04 or RHEL 7).

Add your monitor host to a rack in your cluster, connect it to the network, and ensure that it has network connectivity.

Install the Required Software

For manually deployed clusters, you must install Ceph packages manually. See Installing Packages for details. You should configure SSH access for a user with password-less authentication and root permissions.

Adding a Monitor (Manual)

This procedure creates a ceph-mon data directory, retrieves the monitor map and monitor keyring, and adds a ceph-mon daemon to your cluster. If this results in only two monitor daemons, you may add more monitors by repeating this procedure until you have a sufficient number of ceph-mon daemons to achieve a quorum.

At this point you should define your monitor’s id. Traditionally, monitors have been named with single letters (a, b, c, …), but you are free to define the id as you see fit. For the purpose of this document, please take into account that {mon-id} should be the id you chose, without the mon. prefix (i.e., {mon-id} should be the a in mon.a).
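Because {mon-id} excludes the mon. prefix, the data directory created in the first step below is /var/lib/ceph/mon/ceph-a for mon.a (the ceph- part corresponds to the default cluster name). A tiny sketch of that mapping, using a hypothetical helper:

```shell
#!/bin/sh
# Sketch: {mon-id} is the daemon name without its "mon." prefix.
# mon_id_from_name is a hypothetical helper for illustration only.
mon_id_from_name() {
    # ${1#mon.} removes a leading "mon." if present; "." is literal in this pattern.
    echo "${1#mon.}"
}

mon_id=$(mon_id_from_name mon.a)
echo "mon-id: $mon_id"
echo "data directory: /var/lib/ceph/mon/ceph-$mon_id"
```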

  • Create the default directory on the machine that will host your new monitor.
      ssh {new-mon-host}
      sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}
  • Create a temporary directory {tmp} to keep the files needed during this process. This directory should be different from the monitor’s default directory created in the previous step, and can be removed after all the steps are executed.
      mkdir {tmp}
  • Retrieve the keyring for your monitors, where {tmp} is the path to the retrieved keyring, and {key-filename} is the name of the file containing the retrieved monitor key.
      ceph auth get mon. -o {tmp}/{key-filename}
  • Retrieve the monitor map, where {tmp} is the path to the retrieved monitor map, and {map-filename} is the name of the file containing the retrieved monitor map.
      ceph mon getmap -o {tmp}/{map-filename}
  • Prepare the monitor’s data directory created in the first step. You must specify the path to the monitor map so that you can retrieve the information about a quorum of monitors and their fsid. You must also specify a path to the monitor keyring:
      sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}
  • Start the new monitor and it will automatically join the cluster. The daemon needs to know which address to bind to, via either the --public-addr {ip} or --public-network {network} argument. For example:
      ceph-mon -i {mon-id} --public-addr {ip:port}
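The steps above could be collected into one script along these lines. This is a minimal sketch, assuming it runs on the new monitor host (so the ssh step is omitted) and that MON_ID, MON_ADDR, TMP_DIR, and the file names mon.keyring and monmap are placeholders for your own values; none of them come from Ceph itself. By default it only prints the commands (DRY_RUN=1), and it uses mkdir -p so that a rerun does not fail on an existing directory.

```shell
#!/bin/sh
# Sketch of "Adding a Monitor (Manual)" as one script. All defaults below are
# hypothetical example values (they match the mon.d example later on this page).
set -e

MON_ID="${MON_ID:-d}"                  # {mon-id}, without the "mon." prefix
MON_ADDR="${MON_ADDR:-10.0.0.4:6789}"  # {ip:port} the new daemon binds to
TMP_DIR="${TMP_DIR:-/tmp/add-mon}"     # {tmp}, separate from the data directory
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_monitor() {
    # Default data directory for the new monitor.
    run sudo mkdir -p "/var/lib/ceph/mon/ceph-${MON_ID}"
    # Scratch directory for the keyring and monmap.
    run mkdir -p "$TMP_DIR"
    # Monitor keyring ({key-filename} is mon.keyring here).
    run ceph auth get mon. -o "$TMP_DIR/mon.keyring"
    # Current monitor map ({map-filename} is monmap here).
    run ceph mon getmap -o "$TMP_DIR/monmap"
    # Populate the data directory from the map and keyring.
    run sudo ceph-mon -i "$MON_ID" --mkfs --monmap "$TMP_DIR/monmap" --keyring "$TMP_DIR/mon.keyring"
    # Start the daemon on the chosen address; it then joins the cluster.
    run ceph-mon -i "$MON_ID" --public-addr "$MON_ADDR"
}

add_monitor
```

Once the printed commands look right, the same file could be run with, for example, DRY_RUN=0 MON_ID=d MON_ADDR=10.0.0.4:6789 sh add-mon.sh (the file name is hypothetical).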

Removing Monitors

When you remove monitors from a cluster, consider that Ceph monitors use Paxos to establish consensus about the master cluster map. You must have a sufficient number of monitors to establish a quorum for consensus about the cluster map.

Removing a Monitor (Manual)

This procedure removes a ceph-mon daemon from your cluster. If this procedure results in only two monitor daemons, you may add or remove another monitor until you have a number of ceph-mon daemons that can achieve a quorum.

  • Stop the monitor.
      service ceph -a stop mon.{mon-id}
  • Remove the monitor from the cluster.
      ceph mon remove {mon-id}
  • Remove the monitor entry from ceph.conf.
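The last step, removing the monitor’s entry from ceph.conf, can be done by hand or with a small filter such as the one below. It is a sketch that assumes the [mon.{mon-id}] section layout used elsewhere on this page and that a section ends at the next line starting with [. strip_mon_section is a hypothetical helper that writes the result to stdout instead of editing the file in place.

```shell
#!/bin/sh
# Sketch: print a ceph.conf-style file without its [mon.<mon-id>] section.
# strip_mon_section is a hypothetical helper, not a Ceph tool.
# Assumes section headers start a line with "[" and a section runs
# until the next header.
strip_mon_section() {
    mon_id=$1
    conf=$2
    awk -v sect="[mon.${mon_id}]" '
        # On every section header, decide whether to skip the lines that follow.
        /^[ \t]*\[/ {
            line = $0
            gsub(/[ \t]/, "", line)   # ignore indentation and spacing in the header
            skip = (line == sect)
        }
        !skip { print }
    ' "$conf"
}

# Usage (writes a new file; review it before replacing the original):
#   strip_mon_section c /etc/ceph/ceph.conf > /tmp/ceph.conf.new
```

Remember to distribute the updated ceph.conf to clients and other daemons afterwards, since the file is not synchronized automatically.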

Removing Monitors from an Unhealthy Cluster

This procedure removes a ceph-mon daemon from an unhealthy cluster, for example a cluster where the monitors cannot form a quorum.

  • Stop all ceph-mon daemons on all monitor hosts.
      ssh {mon-host}
      service ceph stop mon || stop ceph-mon-all
      # and repeat for all mons
  • Identify a surviving monitor and log in to that host.
      ssh {mon-host}
  • Extract a copy of the monmap file.
      ceph-mon -i {mon-id} --extract-monmap {map-path}
      # in most cases, that's
      ceph-mon -i `hostname` --extract-monmap /tmp/monmap
  • Remove the non-surviving or problematic monitors. For example, if you have three monitors, mon.a, mon.b, and mon.c, where only mon.a will survive, follow the example below:
      monmaptool {map-path} --rm {mon-id}
      # for example,
      monmaptool /tmp/monmap --rm b
      monmaptool /tmp/monmap --rm c
  • Inject the modified map, from which the removed monitors have been deleted, into the surviving monitor(s). For example, to inject a map into monitor mon.a, follow the example below:
      ceph-mon -i {mon-id} --inject-monmap {map-path}
      # for example,
      ceph-mon -i a --inject-monmap /tmp/monmap
  • Start only the surviving monitors.

  • Verify the monitors form a quorum (ceph -s).

  • You may wish to archive the removed monitors’ data directory in /var/lib/ceph/mon in a safe location, or delete it if you are confident the remaining monitors are healthy and are sufficiently redundant.
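The extract, edit, and inject steps above can be sketched as a single shell function. It assumes every ceph-mon daemon has already been stopped and that it runs on the surviving monitor’s host; rescue_monmap is a hypothetical name, and with the default DRY_RUN=1 it only prints the commands. The extra monmaptool --print step lets you check the edited map before injecting it.

```shell
#!/bin/sh
# Sketch: monmap surgery for "Removing Monitors from an Unhealthy Cluster".
# Run on the surviving monitor host with all ceph-mon daemons stopped.
# rescue_monmap is hypothetical; DRY_RUN=1 (default) only prints commands.
set -e
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Arguments: surviving mon-id, map path, then the mon-ids to remove.
rescue_monmap() {
    survivor=$1
    map=$2
    shift 2
    run ceph-mon -i "$survivor" --extract-monmap "$map"
    for dead in "$@"; do
        run monmaptool "$map" --rm "$dead"
    done
    # Inspect the edited map before it is injected.
    run monmaptool --print "$map"
    run ceph-mon -i "$survivor" --inject-monmap "$map"
}

# Example from the text: only mon.a survives.
rescue_monmap a /tmp/monmap b c
```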

Changing a Monitor’s IP Address

Important

Existing monitors are not supposed to change their IP addresses.

Monitors are critical components of a Ceph cluster, and they need to maintain a quorum for the whole system to work properly. To establish a quorum, the monitors need to discover each other. Ceph has strict requirements for discovering monitors.

Ceph clients and other Ceph daemons use ceph.conf to discover monitors. However, monitors discover each other using the monitor map, not ceph.conf. For example, if you refer to Adding a Monitor (Manual) you will see that you need to obtain the current monmap for the cluster when creating a new monitor, as it is one of the required arguments of ceph-mon -i {mon-id} --mkfs. The following sections explain the consistency requirements for Ceph monitors, and a few safe ways to change a monitor’s IP address.

Consistency Requirements

A monitor always refers to the local copy of the monmap when discovering other monitors in the cluster. Using the monmap instead of ceph.conf avoids errors that could break the cluster (e.g., typos in ceph.conf when specifying a monitor address or port). Since monitors use monmaps for discovery and they share monmaps with clients and other Ceph daemons, the monmap provides monitors with a strict guarantee that their consensus is valid.

Strict consistency also applies to updates to the monmap. As with any other updates on the monitor, changes to the monmap always run through a distributed consensus algorithm called Paxos. The monitors must agree on each update to the monmap, such as adding or removing a monitor, to ensure that each monitor in the quorum has the same version of the monmap. Updates to the monmap are incremental so that monitors have the latest agreed-upon version and a set of previous versions, allowing a monitor that has an older version of the monmap to catch up with the current state of the cluster.

If monitors discovered each other through the Ceph configuration file instead of through the monmap, it would introduce additional risks because the Ceph configuration files are not updated and distributed automatically. Monitors might inadvertently use an older ceph.conf file, fail to recognize a monitor, fall out of a quorum, or develop a situation where Paxos is not able to determine the current state of the system accurately. Consequently, making changes to an existing monitor’s IP address must be done with great care.

Changing a Monitor’s IP address (The Right Way)

Changing a monitor’s IP address only in ceph.conf is not sufficient to ensure that other monitors in the cluster will receive the update. To change a monitor’s IP address, you must add a new monitor with the IP address you want to use (as described in Adding a Monitor (Manual)), ensure that the new monitor successfully joins the quorum, and then remove the monitor that uses the old IP address. Finally, update the ceph.conf file to ensure that clients and other daemons know the IP address of the new monitor.

For example, assume there are three monitors in place:

    [mon.a]
        host = host01
        addr = 10.0.0.1:6789
    [mon.b]
        host = host02
        addr = 10.0.0.2:6789
    [mon.c]
        host = host03
        addr = 10.0.0.3:6789

To change mon.c to host04 with the IP address 10.0.0.4, follow the steps in Adding a Monitor (Manual) to add a new monitor, mon.d. Ensure that mon.d is running and has joined the quorum before removing mon.c, or it will break the quorum. Then remove mon.c as described in Removing a Monitor (Manual). Moving all three monitors would therefore require repeating this process once for each monitor.
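While mon.c and mon.d are both in the monmap there are four monitors, so a quorum needs three of them; this is why mon.d must be up before mon.c is removed. Once the swap is complete, the [mon.c] section in ceph.conf is replaced by one for mon.d. The helper below is a hypothetical sketch that prints such a section in the layout of the example above.

```shell
#!/bin/sh
# Sketch: print a ceph.conf section for one monitor.
# mon_conf_section is hypothetical; arguments are mon-id, host, and addr.
mon_conf_section() {
    printf '[mon.%s]\n    host = %s\n    addr = %s\n' "$1" "$2" "$3"
}

# New entry for mon.d, added to ceph.conf after mon.d has joined the quorum
# and mon.c has been removed (its [mon.c] section is deleted).
mon_conf_section d host04 10.0.0.4:6789
```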

Changing a Monitor’s IP address (The Messy Way)

There may come a time when the monitors must be moved to a different network, a different part of the datacenter, or a different datacenter altogether. While it is possible to do so, the process becomes a bit more hazardous.

In such a case, the solution is to generate a new monmap with updated IP addresses for all the monitors in the cluster, and inject the new map on each individual monitor. This is not the most user-friendly approach, but we do not expect this to be something that needs to be done every other week. As is clearly stated at the top of this section, monitors are not supposed to change IP addresses.

Using the previous monitor configuration as an example, assume you want to move all the monitors from the 10.0.0.x range to 10.1.0.x, and these networks are unable to communicate. Use the following procedure:

  • Retrieve the monitor map, where {tmp} is the path to the retrieved monitor map, and {filename} is the name of the file containing the retrieved monitor map.
      ceph mon getmap -o {tmp}/{filename}
  • The following example demonstrates the contents of the monmap.
      $ monmaptool --print {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      epoch 1
      fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
      last_changed 2012-12-17 02:46:41.591248
      created 2012-12-17 02:46:41.591248
      0: 10.0.0.1:6789/0 mon.a
      1: 10.0.0.2:6789/0 mon.b
      2: 10.0.0.3:6789/0 mon.c
  • Remove the existing monitors.
      $ monmaptool --rm a --rm b --rm c {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      monmaptool: removing a
      monmaptool: removing b
      monmaptool: removing c
      monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)
  • Add the new monitor locations.
      $ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)
  • Check the new contents.
      $ monmaptool --print {tmp}/{filename}

      monmaptool: monmap file {tmp}/{filename}
      epoch 1
      fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
      last_changed 2012-12-17 02:46:41.591248
      created 2012-12-17 02:46:41.591248
      0: 10.1.0.1:6789/0 mon.a
      1: 10.1.0.2:6789/0 mon.b
      2: 10.1.0.3:6789/0 mon.c
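With many monitors, the monmaptool arguments above can be generated from a list of id=address pairs. This sketch, built around the hypothetical function remap_monitors, only prints the commands (it builds them by string concatenation, so it assumes monitor ids and addresses contain no spaces).

```shell
#!/bin/sh
# Sketch: print the monmaptool commands that move every monitor to a new address.
# remap_monitors is hypothetical. Arguments: map path, then id=address pairs.
remap_monitors() {
    map=$1
    shift
    rm_args=""
    add_args=""
    for pair in "$@"; do
        id=${pair%%=*}      # text before the first "="
        addr=${pair#*=}     # text after the first "="
        rm_args="$rm_args --rm $id"
        add_args="$add_args --add $id $addr"
    done
    echo "monmaptool$rm_args $map"
    echo "monmaptool$add_args $map"
    echo "monmaptool --print $map"
}

# The 10.0.0.x to 10.1.0.x move from the example above.
remap_monitors /tmp/monmap a=10.1.0.1:6789 b=10.1.0.2:6789 c=10.1.0.3:6789
```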

At this point, we assume the monitors (and stores) are installed at the new location. The next step is to propagate the modified monmap to the new monitors, and inject the modified monmap into each new monitor.

  • First, make sure to stop all your monitors. Injection must be done while the daemon is not running.

  • Inject the monmap on each monitor.
      ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}
  • Restart the monitors.

After this step, migration to the new location is complete and the monitors should operate successfully.
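Copying the edited monmap to each relocated monitor and injecting it can be scripted over SSH, using the password-less, root-capable SSH user described under Install the Required Software. This is a sketch under assumptions: inject_everywhere and its id@host argument format are hypothetical, the daemons must already be stopped, /tmp/monmap on the remote host is an arbitrary location, and restarting the monitors is left to your init system. With the default DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# Sketch: distribute an edited monmap and inject it on every monitor.
# inject_everywhere is hypothetical. Arguments: local map path, then
# one "mon-id@host" per monitor. DRY_RUN=1 (default) only prints commands.
set -e
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

inject_everywhere() {
    map=$1
    shift
    for target in "$@"; do
        id=${target%%@*}     # mon-id before the "@"
        host=${target#*@}    # host after the "@"
        run scp "$map" "$host:/tmp/monmap"
        run ssh "$host" sudo ceph-mon -i "$id" --inject-monmap /tmp/monmap
    done
}

# The relocated monitors from the example, addressed by their new IPs.
inject_everywhere /tmp/monmap a@10.1.0.1 b@10.1.0.2 c@10.1.0.3
```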