Adding/Removing OSDs

When you have a cluster up and running, you may add OSDs or remove OSDs from the cluster at runtime.

Adding OSDs

When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an OSD is generally one Ceph ceph-osd daemon for one storage drive within a host machine. If your host has multiple storage drives, you may map one ceph-osd daemon for each drive.

Generally, it’s a good idea to check the capacity of your cluster to see if you are reaching the upper end of its capacity. As your cluster reaches its near full ratio, you should add one or more OSDs to expand your cluster’s capacity.

Warning

Do not let your cluster reach its full ratio before adding an OSD. OSD failures that occur after the cluster reaches its near full ratio may cause the cluster to exceed its full ratio.
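As a quick check, the following commands report utilization; on Luminous and later the configured ratios also appear in the OSD map. Output formats vary by release, so treat this as an illustrative sketch:

    ceph df                             # overall and per-pool utilization
    ceph osd df                         # per-OSD utilization
    ceph osd dump | grep full_ratio     # configured nearfull/backfillfull/full ratios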

Deploy your Hardware

If you are adding a new host when adding a new OSD, see Hardware Recommendations for details on minimum recommendations for OSD hardware. To add an OSD host to your cluster, first make sure that you have an up-to-date version of Linux installed and that you have made some initial preparations for your storage drives. See Filesystem Recommendations for details.

Add your OSD host to a rack in your cluster, connect it to the network and ensure that it has network connectivity. See the Network Configuration Reference for details.

Install the Required Software

For manually deployed clusters, you must install Ceph packages manually. See Installing Ceph (Manual) for details. You should configure SSH to a user with password-less authentication and root permissions.
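A minimal sketch of such an SSH setup, assuming a hypothetical deployment user cephuser and host node1 (adjust the names to your environment):

    ssh-keygen                                  # generate a key pair on the admin host
    ssh-copy-id cephuser@node1                  # install the public key on the new OSD host
    # on node1, grant cephuser password-less sudo:
    echo "cephuser ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephuser
    sudo chmod 0440 /etc/sudoers.d/cephuser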

Adding an OSD (Manual)

This procedure sets up a ceph-osd daemon, configures it to use one drive, and configures the cluster to distribute data to the OSD. If your host has multiple drives, you may add an OSD for each drive by repeating this procedure.

To add an OSD, create a data directory for it, mount a drive to that directory, add the OSD to the cluster, and then add it to the CRUSH map.

When you add the OSD to the CRUSH map, consider the weight you give to the new OSD. Hard drive capacity grows 40% per year, so newer OSD hosts may have larger hard drives than older hosts in the cluster (i.e., they may have greater weight).

Tip

Ceph prefers uniform hardware across pools. If you are adding drives of dissimilar size, you can adjust their weights. However, for best performance, consider a CRUSH hierarchy with drives of the same type/size.
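By convention, CRUSH weights often reflect a drive’s capacity in TiB. As an illustrative example, assuming a hypothetical OSD number 12 backed by a 3.7 TiB drive, its weight could later be adjusted with:

    ceph osd crush reweight osd.12 3.7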

  • Create the OSD. If no UUID is given, it will be set automatically when the OSD starts up. The following command will output the OSD number, which you will need for subsequent steps:
    ceph osd create [{uuid} [{id}]]

If the optional parameter {id} is given, it will be used as the OSD ID. Note that in this case the command may fail if the number is already in use.

Warning

In general, explicitly specifying {id} is not recommended. IDs are allocated as an array, and skipping entries consumes some extra memory. This can become significant if there are large gaps and/or clusters are large. If {id} is not specified, the smallest available is used.
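For illustration, a typical invocation without {id} might look like the following; the returned number (12 here) is hypothetical and will differ in your cluster:

    ceph osd create $(uuidgen)
    # prints the newly allocated OSD number, e.g.:
    # 12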

  • Create the default directory on your new OSD.
    ssh {new-osd-host}
    sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
  • If the OSD is for a drive other than the OS drive, prepare it for use with Ceph, and mount it to the directory you just created:
    ssh {new-osd-host}
    sudo mkfs -t {fstype} /dev/{drive}
    sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
  • Initialize the OSD data directory.
    ssh {new-osd-host}
    ceph-osd -i {osd-num} --mkfs --mkkey

The directory must be empty before you can run ceph-osd.
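Put together, the three preceding steps might look like this for a hypothetical host node1, data drive /dev/sdb, and OSD number 12 (values for illustration only; the user_xattr mount option shown above applies to ext filesystems, while XFS enables extended attributes by default):

    ssh node1
    sudo mkfs -t xfs /dev/sdb
    sudo mkdir /var/lib/ceph/osd/ceph-12
    sudo mount /dev/sdb /var/lib/ceph/osd/ceph-12
    sudo ceph-osd -i 12 --mkfs --mkkey      # typically run as root; the directory must be empty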

  • Register the OSD authentication key. The value of ceph for ceph-{osd-num} in the path is the $cluster-$id. If your cluster name differs from ceph, use your cluster name instead:
    ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring
  • Add the OSD to the CRUSH map so that the OSD can begin receiving data. The ceph osd crush add command allows you to add OSDs to the CRUSH hierarchy wherever you wish. If you specify at least one bucket, the command will place the OSD into the most specific bucket you specify, and it will move that bucket underneath any other buckets you specify. Important: If you specify only the root bucket, the command will attach the OSD directly to the root, but CRUSH rules expect OSDs to be inside of hosts.

Execute the following:

    ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

You may also decompile the CRUSH map, add the OSD to the device list, add the host as a bucket (if it’s not already in the CRUSH map), add the device as an item in the host, assign it a weight, recompile it and set it. See Add/Move an OSD for details.
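For example, placing a hypothetical osd.12 with weight 1.0 under a host bucket named node1 could look like the first command below; the remaining commands sketch the decompile/edit/recompile workflow just described (file names are arbitrary):

    ceph osd crush add osd.12 1.0 host=node1

    # manual CRUSH map editing:
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # ... edit crushmap.txt: add the device, host bucket, and weight ...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new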

Replacing an OSD

When disks fail, or if an administrator wants to reprovision OSDs with a new backend, for instance when switching from FileStore to BlueStore, OSDs need to be replaced. Unlike Removing the OSD, a replaced OSD’s ID and CRUSH map entry need to be kept intact after the OSD is destroyed for replacement.

  • Make sure it is safe to destroy the OSD:
    while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done
  • Destroy the OSD first:
    ceph osd destroy {id} --yes-i-really-mean-it
  • Zap a disk for the new OSD, if the disk was used before for other purposes. It’s not necessary for a new disk:
    ceph-volume lvm zap /dev/sdX
  • Prepare the disk for replacement by using the previously destroyed OSD id:
    ceph-volume lvm prepare --osd-id {id} --data /dev/sdX
  • And activate the OSD:
    ceph-volume lvm activate {id} {fsid}

Alternatively, instead of preparing and activating, the device can be recreated in one call, like:

    ceph-volume lvm create --osd-id {id} --data /dev/sdX
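End to end, replacing a hypothetical osd.7 whose data lives on /dev/sdc might look like the sketch below (the ID and device name are assumptions; substitute your own):

    while ! ceph osd safe-to-destroy osd.7 ; do sleep 10 ; done
    ceph osd destroy 7 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdc                 # only if the disk was used before
    ceph-volume lvm create --osd-id 7 --data /dev/sdc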

Starting the OSD

After you add an OSD to Ceph, the OSD is in your configuration. However, it is not yet running. The OSD is down and in. You must start your new OSD before it can begin receiving data. You may use service ceph from your admin host or start the OSD from its host machine.

For Ubuntu Trusty use Upstart.

    sudo start ceph-osd id={osd-num}

For all other distros use systemd.

    sudo systemctl start ceph-osd@{osd-num}

Once you start your OSD, it is up and in.
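You can confirm the state with the commands below; in the ceph osd tree output, the new OSD should show STATUS up and a non-zero REWEIGHT (exact columns vary slightly by release):

    ceph osd stat      # e.g. "N osds: N up, N in"
    ceph osd tree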

Observe the Data Migration

Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing the cluster by migrating placement groups to your new OSD. You can observe this process with the ceph tool.

    ceph -w

You should see the placement group states change from active+clean to active, some degraded objects, and finally active+clean when migration completes. (Control-c to exit.)

Removing OSDs (Manual)

When you want to reduce the size of a cluster or replace hardware, you may remove an OSD at runtime. With Ceph, an OSD is generally one Ceph ceph-osd daemon for one storage drive within a host machine. If your host has multiple storage drives, you may need to remove one ceph-osd daemon for each drive. Generally, it’s a good idea to check the capacity of your cluster to see if you are reaching the upper end of its capacity. Ensure that when you remove an OSD, your cluster is not at its near full ratio.

Warning

Do not let your cluster reach its full ratio when removing an OSD. Removing OSDs could cause the cluster to reach or exceed its full ratio.

Take the OSD out of the Cluster

Before you remove an OSD, it is usually up and in. You need to take it out of the cluster so that Ceph can begin rebalancing and copying its data to other OSDs.

    ceph osd out {osd-num}

Observe the Data Migration

Once you have taken your OSD out of the cluster, Ceph will begin rebalancing the cluster by migrating placement groups out of the OSD you removed. You can observe this process with the ceph tool.

    ceph -w

You should see the placement group states change from active+clean to active, some degraded objects, and finally active+clean when migration completes. (Control-c to exit.)

Note

Sometimes, typically in a “small” cluster with few hosts (for instance with a small testing cluster), taking out the OSD can trigger a CRUSH corner case where some PGs remain stuck in the active+remapped state. If you are in this situation, you should mark the OSD in with:

    ceph osd in {osd-num}

to return to the initial state and then, instead of marking out the OSD, set its weight to 0 with:

    ceph osd crush reweight osd.{osd-num} 0

After that, you can observe the data migration, which should come to its end. The difference between marking out the OSD and reweighting it to 0 is that in the first case the weight of the bucket which contains the OSD is not changed, whereas in the second case the weight of the bucket is updated (and decreased by the OSD’s weight). The reweight command may sometimes be favoured in the case of a “small” cluster.
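For instance, with a hypothetical osd.7 that recovery sequence would be (values for illustration only):

    ceph osd in 7
    ceph osd crush reweight osd.7 0
    ceph -w     # watch until the PGs return to active+clean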

Stopping the OSD

After you take an OSD out of the cluster, it may still be running. That is, the OSD may be up and out. You must stop your OSD before you remove it from the configuration.

    ssh {osd-host}
    sudo systemctl stop ceph-osd@{osd-num}

Once you stop your OSD, it is down.
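To confirm, ceph osd tree should now show the stopped OSD with STATUS down (osd.7 here is a hypothetical ID):

    ceph osd tree | grep -w osd.7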

Removing the OSD

This procedure removes an OSD from a cluster map, removes its authentication key, removes the OSD from the OSD map, and removes the OSD from the ceph.conf file. If your host has multiple drives, you may need to remove an OSD for each drive by repeating this procedure.

  • Let the cluster forget the OSD first. This step removes the OSD from the CRUSH map, removes its authentication key, and removes it from the OSD map as well. Please note that the purge subcommand was introduced in Luminous; for older versions, see below.
    ceph osd purge {id} --yes-i-really-mean-it
  • Navigate to the host where you keep the master copy of the cluster’s ceph.conf file.
    ssh {admin-host}
    cd /etc/ceph
    vim ceph.conf
  • Remove the OSD entry from your ceph.conf file (if it exists).
    [osd.1]
    host = {hostname}
  • From the host where you keep the master copy of the cluster’s ceph.conf file, copy the updated ceph.conf file to the /etc/ceph directory of other hosts in your cluster.
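For example, from the admin host you might push the file with scp to two hypothetical hosts node2 and node3 (any file-distribution tool you already use works equally well):

    scp /etc/ceph/ceph.conf node2:/etc/ceph/ceph.conf
    scp /etc/ceph/ceph.conf node3:/etc/ceph/ceph.conf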

If your Ceph cluster is older than Luminous, instead of using ceph osd purge, you need to perform these steps manually:

  • Remove the OSD from the CRUSH map so that it no longer receives data. You may also decompile the CRUSH map, remove the OSD from the device list, remove the device as an item in the host bucket or remove the host bucket (if it’s in the CRUSH map and you intend to remove the host), recompile the map and set it. See Remove an OSD for details.
    ceph osd crush remove {name}
  • Remove the OSD authentication key.
    ceph auth del osd.{osd-num}

The value of ceph for ceph-{osd-num} in the path is the $cluster-$id. If your cluster name differs from ceph, use your cluster name instead.

  • Remove the OSD.
    ceph osd rm {osd-num}
    # for example:
    ceph osd rm 1