Deploying Metadata Servers

Each CephFS file system requires at least one MDS. The cluster operator will generally use their automated deployment tool to launch required MDS servers as needed. Rook and ansible (via the ceph-ansible playbooks) are recommended tools for doing this. For clarity, we also show the systemd commands here which may be run by the deployment technology if executed on bare-metal.

See MDS Config Reference for details on configuring metadata servers.

Provisioning Hardware for an MDS

The present version of the MDS is single-threaded and CPU-bound for most activities, including responding to client requests. An MDS under the most aggressive client loads uses about 2 to 3 CPU cores. This is due to the other miscellaneous upkeep threads working in tandem.

Even so, it is recommended that an MDS server be well provisioned with an advanced CPU with sufficient cores. Development is on-going to make better use of available CPU cores in the MDS; it is expected in future versions of Ceph that the MDS server will improve performance by taking advantage of more cores.

The other dimension to MDS performance is the available RAM for caching. The MDS necessarily manages a distributed and cooperative metadata cache among all clients and other active MDSs. Therefore it is essential to provide the MDS with sufficient RAM to enable faster metadata access and mutation. The default MDS cache size (see also Understanding MDS Cache Size Limits) is 4GB. It is recommended to provision at least 8GB of RAM for the MDS to support this cache size.
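
The cache limit is controlled by the mds_cache_memory_limit setting. As a minimal sketch of how it might be inspected or adjusted on recent releases (the 8 GiB value below is purely illustrative; a larger cache needs correspondingly more RAM on the host):

    $ sudo ceph config get mds mds_cache_memory_limit
    $ sudo ceph config set mds mds_cache_memory_limit 8589934592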

Generally, an MDS serving a large cluster of clients (1000 or more) will use at least 64GB of cache. An MDS with a larger cache is not well explored in the largest known community clusters; there may be diminishing returns where management of such a large cache negatively impacts performance in surprising ways. It would be best to do analysis with expected workloads to determine if provisioning more RAM is worthwhile.

In a bare-metal cluster, the best practice is to over-provision hardware for the MDS server. Even if a single MDS daemon is unable to fully utilize the hardware, it may be desirable later on to start more active MDS daemons on the same node to fully utilize the available cores and memory. Additionally, it may become clear with workloads on the cluster that performance improves with multiple active MDS on the same node rather than over-provisioning a single MDS.

Finally, be aware that CephFS is a highly-available file system: it supports standby MDS daemons (see also Terminology) for rapid failover. To get a real benefit from deploying standbys, it is usually necessary to distribute MDS daemons across at least two nodes in the cluster. Otherwise, a hardware failure on a single node may result in the file system becoming unavailable.
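
If you would like the cluster to raise a health warning when fewer standby daemons are available than you expect, the standby_count_wanted file system setting can express that; a minimal sketch, assuming a file system named cephfs:

    $ sudo ceph fs set cephfs standby_count_wanted 1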

Co-locating the MDS with other Ceph daemons (hyperconverged) is an effective and recommended way to accomplish this distribution, so long as all daemons are configured to use available hardware within certain limits. For the MDS, this generally means limiting its cache size.

Adding an MDS

  • Create an mds data directory /var/lib/ceph/mds/ceph-${id}. The daemon only uses this directory to store its keyring.
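
    For example, on a bare-metal node the directory can simply be created with mkdir (the path above assumes the default cluster name of ceph):

    $ sudo mkdir -p /var/lib/ceph/mds/ceph-${id}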

  • Create the authentication key, if you use CephX.

    $ sudo ceph auth get-or-create mds.${id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-${id}/keyring

  • Start the service.

    $ sudo systemctl start ceph-mds@${id}

  • The status of the cluster should show:

    mds: ${id}:1 {0=${id}=up:active} 2 up:standby
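
    This line is taken from the output of ceph status; the exact formatting varies between Ceph releases, and ceph fs status gives a similar per-file-system view:

    $ sudo ceph -s
    $ sudo ceph fs status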

Removing an MDS

If you have a metadata server in your cluster that you’d like to remove, you may use the following method.

  • (Optionally:) Create a new replacement Metadata Server. If there are no replacement MDS to take over once the MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding a metadata server before tearing down the metadata server you would like to take offline.

  • Stop the MDS to be removed.

    $ sudo systemctl stop ceph-mds@${id}

The MDS will automatically notify the Ceph monitors that it is going down. This enables the monitors to perform instantaneous failover to an available standby, if one exists. It is unnecessary to use administrative commands to effect this failover, e.g. through the use of ceph mds fail mds.${id}.
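
Should you want to confirm that a standby (if any) has taken over after stopping the daemon, the file system status can be inspected, for example:

    $ sudo ceph fs status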

  • Remove the /var/lib/ceph/mds/ceph-${id} directory on the MDS.

    $ sudo rm -rf /var/lib/ceph/mds/ceph-${id}
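
    If you created a cephx key for the daemon when adding it, you may also want to remove that key from the cluster once the daemon is gone; this is optional cleanup rather than part of the removal itself:

    $ sudo ceph auth rm mds.${id}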