MDS States

The Metadata Server (MDS) goes through several states during normal operation in CephFS. For example, some states indicate that the MDS is recovering from a failover by a previous instance of the MDS. Here we’ll document all of these states and include a state diagram to visualize the transitions.

State Descriptions

Common states

up:active

This is the normal operating state of the MDS. It indicates that the MDS and its rank in the file system are available.
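
For example, the state of each MDS daemon and its rank can be inspected with the standard status commands. The file system name cephfs below is only illustrative:

  $ ceph fs status cephfs   # per-rank view, including states such as active
  $ ceph mds stat           # compact summary of MDS states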

up:standby

The MDS is available to take over for a failed rank (see also Terminology). The monitor will automatically assign an MDS in this state to a failed rank once available.

up:standby_replay

The MDS is following the journal of another up:active MDS. Should the active MDS fail, having a standby MDS in replay mode is desirable as the MDS is replaying the live journal and will take over more quickly. A downside to having standby-replay MDSs is that they are not available to take over for any other MDS that fails, only the MDS they follow.
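
Standby-replay is enabled per file system. A minimal sketch, assuming a file system named cephfs:

  $ ceph fs set cephfs allow_standby_replay true

Once enabled, the monitors may assign available standby daemons to follow the active ranks.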

Less common or transitory states

up:boot

This state is broadcast to the Ceph monitors during startup. This state is never visible as the monitor immediately assigns the MDS to an available rank or commands the MDS to operate as a standby. The state is documented here for completeness.

up:creating

The MDS is creating a new rank (perhaps rank 0) by constructing some per-rank metadata (like the journal) and entering the MDS cluster.
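
A rank typically enters this state when a file system is first created or when max_mds is raised above the current number of ranks. A sketch, with illustrative pool and file system names:

  $ ceph fs new cephfs cephfs_metadata cephfs_data   # creates rank 0
  $ ceph fs set cephfs max_mds 2                     # adds rank 1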

up:starting

The MDS is restarting a stopped rank. It opens associated per-rank metadata and enters the MDS cluster.

up:stopping

When a rank is stopped, the monitors command an active MDS to enter the up:stopping state. In this state, the MDS accepts no new client connections, migrates all subtrees to other ranks in the file system, flushes its metadata journal, and, if it is the last rank (0), evicts all clients and shuts down (see also CephFS Administrative commands).
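
Ranks are stopped by reducing max_mds. A minimal sketch, assuming a file system named cephfs that currently has two active ranks:

  $ ceph fs set cephfs max_mds 1   # rank 1 enters up:stopping and is then stopped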

up:replay

The MDS is taking over a failed rank. This state represents that the MDS is recovering its journal and other metadata.

up:resolve

The MDS enters this state from up:replay if the Ceph file system has multiple ranks (including this one), i.e. it’s not a single active MDS cluster. The MDS is resolving any uncommitted inter-MDS operations. All ranks in the file system must be in this state or later for progress to be made, i.e. no rank can be failed/damaged or up:replay.

up:reconnect

An MDS enters this state from up:replay or up:resolve. In this state, the MDS solicits reconnections from its clients. Any client which had a session with this rank must reconnect during this window, which is configurable via mds_reconnect_timeout.
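
As a sketch of tuning the reconnect window through the usual configuration interface (the value shown is arbitrary):

  $ ceph config set mds mds_reconnect_timeout 60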

up:rejoin

The MDS enters this state from up:reconnect. In this state, the MDS is rejoining the MDS cluster cache. In particular, all inter-MDS locks on metadata are reestablished.

If there are no known client requests to be replayed, the MDS directly becomes up:active from this state.

up:clientreplay

The MDS may enter this state from up:rejoin. The MDS is replaying any client requests which were replied to but not yet durable (not journaled). Clients resend these requests during up:reconnect and the requests are replayed once again. The MDS enters up:active after completing replay.

Failed states

down:failed

No MDS actually holds this state. Instead, it is applied to the rank in the file system. For example:

  $ ceph fs dump
  ...
  max_mds 1
  in 0
  up {}
  failed 0
  ...

Rank 0 is part of the failed set.

down:damaged

No MDS actually holds this state. Instead, it is applied to the rank in the file system. For example:

  $ ceph fs dump
  ...
  max_mds 1
  in 0
  up {}
  failed
  damaged 0
  ...

Rank 0 has become damaged (see also Disaster recovery) and placed in the damaged set. An MDS which was running as rank 0 found metadata damage that could not be automatically recovered. Operator intervention is required.
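
Once the underlying damage has been repaired (see Disaster recovery), the operator clears the damaged flag so the rank can be taken over again. A sketch, assuming the file system is named cephfs:

  $ ceph mds repaired cephfs:0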

down:stopped

No MDS actually holds this state. Instead, it is applied to the rank in the file system. For example:

  $ ceph fs dump
  ...
  max_mds 1
  in 0
  up {}
  failed
  damaged
  stopped 1
  ...

The rank has been stopped by reducing max_mds (see also Configuring multiple active MDS daemons).
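
Raising max_mds again brings the stopped rank back (via up:starting). A sketch with an illustrative file system name:

  $ ceph fs set cephfs max_mds 2   # rank 1 leaves the stopped set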

State Diagram

This state diagram shows the possible state transitions for the MDS/rank. The legend is as follows:

Color

  • Green: MDS is active.

  • Orange: MDS is in a transient state, trying to become active.

  • Red: MDS is indicating a state that causes the rank to be marked failed.

  • Purple: MDS and rank are stopping.

  • Black: MDS is indicating a state that causes the rank to be marked damaged.

Shape

  • Circle: an MDS holds this state.

  • Hexagon: no MDS holds this state (it is applied to the rank).

Lines

  • A double-lined shape indicates the rank is “in”.

[State diagram image: mds-state-diagram.svg]