CephFS health messages

Cluster health checks

The Ceph monitor daemons will generate health messages in response to certain states of the file system map structure (and the enclosed MDS maps).

Message: mds rank(s) ranks have failed
Description: One or more MDS ranks are not currently assigned to an MDS daemon; the cluster will not recover until a suitable replacement daemon starts.

Message: mds rank(s) ranks are damaged
Description: One or more MDS ranks have encountered severe damage to their stored metadata, and cannot start again until it is repaired.

Message: mds cluster is degraded
Description: One or more MDS ranks are not currently up and running; clients may pause metadata IO until this situation is resolved. This includes ranks being failed or damaged, and additionally includes ranks which are running on an MDS but have not yet made it to the active state (e.g. ranks currently in replay state).

Message: mds names are laggy
Description: The named MDS daemons have failed to send beacon messages to the monitor for at least mds_beacon_grace (default 15s), while they are supposed to send beacon messages every mds_beacon_interval (default 4s). The daemons may have crashed. The Ceph monitor will automatically replace laggy daemons with standbys if any are available.
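The timing rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the monitor's actual code: the function names and the use of plain floating-point timestamps are assumptions, and only the two default values come from the description.

```python
MDS_BEACON_INTERVAL = 4.0   # seconds, default quoted in the description
MDS_BEACON_GRACE = 15.0     # seconds, default quoted in the description


def is_laggy(last_beacon, now, grace=MDS_BEACON_GRACE):
    """Return True when no beacon has arrived for at least `grace` seconds.

    Hypothetical helper: timestamps are plain seconds on a shared clock.
    """
    return (now - last_beacon) >= grace


def missed_beacons(last_beacon, now, interval=MDS_BEACON_INTERVAL):
    """Count the beacons that were due since `last_beacon` but never arrived."""
    return max(0, int((now - last_beacon) // interval))
```

With the defaults, a daemon is flagged after roughly three consecutive beacons have gone missing, which leaves some room for a single delayed message.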

Message: insufficient standby daemons available
Description: One or more file systems are configured to have a certain number of standby daemons available (including daemons in standby-replay) but the cluster does not have enough standby daemons. The standby daemons not in replay count towards any file system (i.e. they may overlap). This warning can be configured by setting ceph fs set <fs> standby_count_wanted <count>. Use zero for count to disable.

Daemon-reported health checks

MDS daemons can identify a variety of unwanted conditions, and indicate these to the operator in the output of ceph status. These conditions have human readable messages, and additionally a unique code starting with MDS_HEALTH which appears in JSON output.
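Because the codes share a common prefix, they are easy to pick out of JSON output by script. The sketch below makes no claim about the real layout of that output: it walks the whole parsed document and collects any key or string value beginning with MDS_HEALTH. The function name and the sample document are hypothetical.

```python
import json


def find_mds_health_codes(node):
    """Collect every key or string value beginning with MDS_HEALTH."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and key.startswith("MDS_HEALTH"):
                found.add(key)
            found |= find_mds_health_codes(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_mds_health_codes(item)
    elif isinstance(node, str) and node.startswith("MDS_HEALTH"):
        found.add(node)
    return found


# Hypothetical sample; the real layout of the JSON output may differ.
SAMPLE = (
    '{"health": {"checks": ['
    '{"code": "MDS_HEALTH_TRIM"}, {"code": "MDS_HEALTH_DAMAGE"}]}}'
)
codes = sorted(find_mds_health_codes(json.loads(SAMPLE)))
print(codes)
```

Searching by prefix rather than by a fixed path keeps the script independent of where in the document the codes happen to appear.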

Message: “Behind on trimming…”
Code: MDS_HEALTH_TRIM
Description: CephFS maintains a metadata journal that is divided into log segments. The length of the journal (in number of segments) is controlled by the setting mds_log_max_segments, and when the number of segments exceeds that setting the MDS starts writing back metadata so that it can remove (trim) the oldest segments. If this writeback is happening too slowly, or a software bug is preventing trimming, then this health message may appear. The threshold for this message to appear is for the number of segments to be double mds_log_max_segments.
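The threshold can be written as a one-line check. This is a sketch of the rule as stated, not the MDS implementation; the function name is hypothetical, and treating exactly double as already over the threshold is an assumption, since the description does not say which side of the boundary triggers the message.

```python
def behind_on_trimming(num_segments, mds_log_max_segments):
    """Return True once the journal holds double the configured segments.

    Assumption: the boundary value itself (exactly 2x) counts as behind.
    """
    return num_segments >= 2 * mds_log_max_segments
```

For example, with a hypothetical mds_log_max_segments of 100, trimming begins above 100 segments but the warning is not expected until the journal reaches 200.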

Message: “Client name failing to respond to capability release”
Code: MDS_HEALTH_CLIENT_LATE_RELEASE, MDS_HEALTH_CLIENT_LATE_RELEASE_MANY
Description: CephFS clients are issued capabilities by the MDS, which are like locks. Sometimes, for example when another client needs access, the MDS will request clients release their capabilities. If the client is unresponsive or buggy, it might fail to do so promptly or fail to do so at all. This message appears if a client has taken longer than session_timeout (default 60s) to comply.

Message: “Client name failing to respond to cache pressure”
Code: MDS_HEALTH_CLIENT_RECALL, MDS_HEALTH_CLIENT_RECALL_MANY
Description: Clients maintain a metadata cache. Items (such as inodes) in the client cache are also pinned in the MDS cache, so when the MDS needs to shrink its cache (to stay within mds_cache_memory_limit), it sends messages to clients to shrink their caches too. If the client is unresponsive or buggy, this can prevent the MDS from properly staying within its cache limits and it may eventually run out of memory and crash. This message appears if a client has failed to release more than mds_recall_warning_threshold capabilities (decaying with a half-life of mds_recall_max_decay_rate) within the last mds_recall_warning_decay_rate seconds.
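To make "decaying with a half-life" concrete, the sketch below shows a generic exponentially decaying counter whose value halves every half_life seconds. It is not the MDS's own counter; the class name, its methods, and the explicit timestamps are assumptions chosen for illustration.

```python
class DecayCounter:
    """Counter whose value halves every `half_life` seconds (illustrative)."""

    def __init__(self, half_life):
        self.half_life = float(half_life)
        self.value = 0.0
        self.last_update = 0.0

    def _decay(self, now):
        # Apply the decay accumulated since the last update.
        elapsed = now - self.last_update
        if elapsed > 0:
            self.value *= 0.5 ** (elapsed / self.half_life)
            self.last_update = now

    def hit(self, now, amount=1.0):
        """Record `amount` new events at time `now`."""
        self._decay(now)
        self.value += amount

    def get(self, now):
        """Return the decayed value at time `now`."""
        self._decay(now)
        return self.value
```

A counter of this kind lets a warning reflect recent behaviour: a client that stops accumulating unreleased capabilities sees its count shrink on its own, while one that keeps failing stays above the threshold.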

Message: “Client name failing to advance its oldest client/flush tid”
Code: MDS_HEALTH_CLIENT_OLDEST_TID, MDS_HEALTH_CLIENT_OLDEST_TID_MANY
Description: The CephFS client-MDS protocol uses a field called the oldest tid to inform the MDS of which client requests are fully complete and may therefore be forgotten about by the MDS. If a buggy client is failing to advance this field, then the MDS may be prevented from properly cleaning up resources used by client requests. This message appears if a client appears to have more than max_completed_requests (default 100000) requests that are complete on the MDS side but haven’t yet been accounted for in the client’s oldest tid value.

Message: “Metadata damage detected”
Code: MDS_HEALTH_DAMAGE
Description: Corrupt or missing metadata was encountered when reading from the metadata pool. This message indicates that the damage was sufficiently isolated for the MDS to continue operating, although client accesses to the damaged subtree will return IO errors. Use the damage ls admin socket command to get more detail on the damage. This message appears as soon as any damage is encountered.

Message: “MDS in read-only mode”
Code: MDS_HEALTH_READ_ONLY
Description: The MDS has gone into read-only mode and will return EROFS error codes to client operations that attempt to modify any metadata. The MDS will go into read-only mode if it encounters a write error while writing to the metadata pool, or if forced to by an administrator using the force_readonly admin socket command.

Message: “N slow requests are blocked”
Code: MDS_HEALTH_SLOW_REQUEST
Description: One or more client requests have not been completed promptly, indicating that the MDS is either running very slowly, or that the RADOS cluster is not acknowledging journal writes promptly, or that there is a bug. Use the ops admin socket command to list outstanding metadata operations. This message appears if any client requests have taken longer than mds_op_complaint_time (default 30s).
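As a rough illustration of how N in the message relates to the complaint time, the sketch below filters a list of request ages. The function name and the idea of feeding it plain ages in seconds are assumptions; it does not parse the output of the ops command.

```python
MDS_OP_COMPLAINT_TIME = 30.0  # seconds, default quoted in the description


def slow_requests(op_ages, complaint_time=MDS_OP_COMPLAINT_TIME):
    """Return the ages (in seconds) that exceed the complaint time."""
    return [age for age in op_ages if age > complaint_time]
```

The length of the returned list corresponds to the N reported in the message for that moment.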

Message: “Too many inodes in cache”
Code: MDS_HEALTH_CACHE_OVERSIZED
Description: The MDS is not succeeding in trimming its cache to comply with the limit set by the administrator. If the MDS cache becomes too large, the daemon may exhaust available memory and crash. By default, this message appears if the actual cache size (in memory) is at least 50% greater than mds_cache_memory_limit (default 1GB). Modify mds_health_cache_threshold to set the warning ratio.
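The warning ratio can be expressed as a simple comparison. This is a sketch of the rule as described, not the MDS code: the function name is hypothetical, 1GB is taken here as 1024**3 bytes, and the default ratio of 1.5 is inferred from the phrase "at least 50% greater".

```python
DEFAULT_CACHE_LIMIT = 1024 ** 3   # assumed byte value for the 1GB default
DEFAULT_HEALTH_THRESHOLD = 1.5    # inferred from "at least 50% greater"


def cache_oversized(cache_bytes,
                    limit=DEFAULT_CACHE_LIMIT,
                    threshold=DEFAULT_HEALTH_THRESHOLD):
    """Return True when the cache is at least `threshold` times the limit."""
    return cache_bytes >= limit * threshold
```

Raising the ratio makes the warning appear later; for instance, a ratio of 2.0 would tolerate a cache up to twice the configured limit before reporting.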