Disaster recovery

Metadata damage and repair

If a file system has inconsistent or missing metadata, it is considered damaged. You may find out about damage from a health message, or in some unfortunate cases from an assertion in a running MDS daemon.

Metadata damage can result either from data loss in the underlying RADOS layer (e.g. multiple disk failures that lose all copies of a PG), or from software bugs.

CephFS includes some tools that may be able to recover a damaged file system, but using them safely requires a solid understanding of CephFS internals. The documentation for these potentially dangerous operations is on a separate page: Advanced: Metadata repair tools.

Data pool damage (files affected by lost data PGs)

If a PG is lost in a data pool, then the file system will continue to operate normally, but some parts of some files will simply be missing (reads will return zeros).

Losing a data PG may affect many files. Files are split into many objects, so identifying which files are affected by loss of particular PGs requires a full scan over all object IDs that may exist within the size of a file. This type of scan may be useful for identifying which files require restoring from a backup.
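
To illustrate why the scan has to walk every candidate object, the sketch below lists the object names that could back a single file. It is a minimal sketch under stated assumptions, not the tool's implementation: it assumes data objects are named with the inode number in hex, a dot, and the object index as eight hex digits; it assumes a simple layout with one stripe per object; and it assumes a 4 MiB object size unless another is given. The function name candidate_objects is hypothetical. Mapping each name to a PG additionally requires the pool's hashing, which is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch: list the RADOS object names that could back one file.
# Assumptions (not stated in the text above): objects are named
# "<inode in hex>.<object index as 8 hex digits>", and the file uses a
# simple layout with one stripe per object (stripe_count = 1).

# candidate_objects INODE SIZE_BYTES [OBJECT_SIZE_BYTES]
candidate_objects() {
    local ino=$1 size=$2
    local object_size=${3:-4194304}   # assumed default: 4 MiB
    # Round up: a file of SIZE_BYTES spans this many objects.
    local count=$(( (size + object_size - 1) / object_size ))
    local i
    for (( i = 0; i < count; i++ )); do
        printf '%x.%08x\n' "$ino" "$i"
    done
}

# Example: a 10 MiB file with inode 0x10000000000 spans three objects.
# candidate_objects 1099511627776 10485760
```

Every name in such a list has to be checked against the lost PGs, for every file under the scanned path, which is why the scan can take a long time on large trees.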

Danger

The pg_files scan described below does not repair any metadata, so when restoring files in this case you must remove the damaged file and then replace it, so that the restored file gets a fresh inode. Do not overwrite damaged files in place.

If you know that objects have been lost from PGs, use the pg_files subcommand to scan for files that may have been damaged as a result:

  cephfs-data-scan pg_files <path> <pg id> [<pg id>...]

For example, if you have lost data from PGs 1.4 and 4.5, and you would like to know which files under /home/bob might have been damaged:

  cephfs-data-scan pg_files /home/bob 1.4 4.5

The output will be a list of paths to potentially damaged files, one per line.

Note that this command acts as a normal CephFS client to find all the files in the file system and read their layouts, so the MDS must be up and running.
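
The one-path-per-line output lends itself to scripted restores. The following is a minimal sketch, not an official procedure: it assumes the listed paths are absolute paths inside the file system, that the file system is mounted at a known mount point, and that a backup tree mirrors the same layout. The function name restore_damaged and both directory arguments are hypothetical. Each damaged file is removed first and then copied back, so the restored file gets a fresh inode rather than being overwritten in place.

```shell
#!/usr/bin/env bash
# Sketch: restore files listed by pg_files from a backup tree.
# Assumptions: LIST_FILE holds one absolute file-system path per line,
# MOUNT_POINT is where CephFS is mounted, and BACKUP_ROOT mirrors the
# same directory layout. All three are illustrative.

# restore_damaged LIST_FILE MOUNT_POINT BACKUP_ROOT
restore_damaged() {
    local list=$1 mount=$2 backup=$3 path
    while IFS= read -r path; do
        [ -n "$path" ] || continue             # skip blank lines
        if [ ! -e "$backup$path" ]; then
            # Leave the damaged file alone when there is nothing to restore.
            printf 'no backup for %s, skipping\n' "$path" >&2
            continue
        fi
        rm -f -- "$mount$path"                 # remove the damaged file first
        cp -p -- "$backup$path" "$mount$path"  # recreate it as a new file
    done < "$list"
}

# Example (paths are placeholders):
#   cephfs-data-scan pg_files /home/bob 1.4 4.5 > damaged.txt
#   restore_damaged damaged.txt /mnt/cephfs /backups/cephfs
```

Files with no backup copy are reported on standard error and left untouched, so they can be reviewed by hand afterwards.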