Control Commands

Monitor Commands

Monitor commands are issued using the ceph utility:

  ceph [-m monhost] {command}

The command is usually (though not always) of the form:

  ceph {subsystem} {command}
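
For example, the following queries the status of the OSD subsystem through a monitor at a hypothetical address:

  # 192.168.0.1 is an illustrative monitor address
  ceph -m 192.168.0.1 osd stat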

System Commands

Execute the following to display the current status of the cluster.

  ceph -s
  ceph status

Execute the following to display a running summary of the status of the cluster and of major events.

  ceph -w

Execute the following to show the monitor quorum, including which monitors are participating and which one is the leader.

  ceph quorum_status

Execute the following to query the status of a single monitor, including whether or not it is in the quorum.

  ceph [-m monhost] mon_status

Authentication Subsystem

To add a keyring for an OSD, execute the following:

  ceph auth add {osd} {--in-file|-i} {path-to-osd-keyring}
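
For example, to add the keyring of a hypothetical osd.0 stored at its default path:

  # the OSD id and keyring path are illustrative
  ceph auth add osd.0 -i /var/lib/ceph/osd/ceph-0/keyring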

To list the cluster’s keys and their capabilities, execute the following:

  ceph auth ls

Placement Group Subsystem

To display the statistics for all placement groups, execute the following:

  ceph pg dump [--format {format}]

The valid formats are plain (default) and json.

To display the statistics for all placement groups stuck in a specified state, execute the following:

  ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format {format}] [-t|--threshold {seconds}]

--format may be plain (default) or json.

--threshold defines how many seconds “stuck” is (default: 300).
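
For example, to list placement groups that have been stuck in the stale state for at least ten minutes, formatted as JSON (the values are illustrative):

  ceph pg dump_stuck stale --format json --threshold 600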

Inactive placement groups cannot process reads or writes because they are waiting for an OSD with the most up-to-date data to come back.

Unclean placement groups contain objects that are not replicated the desired number of times. They should be recovering.

Stale placement groups are in an unknown state - the OSDs that host them have not reported to the monitor cluster in a while (configured by mon_osd_report_timeout).

Revert “lost” objects to their prior state (a previous version), or delete them if they were just created:

  ceph pg {pgid} mark_unfound_lost revert|delete
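
For example, to revert the unfound objects in a hypothetical placement group 2.5 to their most recent prior version:

  # the pgid 2.5 is illustrative
  ceph pg 2.5 mark_unfound_lost revert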

OSD Subsystem

Query OSD subsystem status.

  ceph osd stat

Write a copy of the most recent OSD map to a file. See osdmaptool.

  ceph osd getmap -o file

Write a copy of the crush map from the most recent OSD map to file.

  ceph osd getcrushmap -o file

The foregoing is functionally equivalent to

  ceph osd getmap -o /tmp/osdmap
  osdmaptool /tmp/osdmap --export-crush file

Dump the OSD map. Valid formats for -f are plain and json. If no --format option is given, the OSD map is dumped as plain text.

  ceph osd dump [--format {format}]

Dump the OSD map as a tree with one line per OSD containing weight and state.

  ceph osd tree [--format {format}]

Find out where a specific object is or would be stored in the system:

  ceph osd map <pool-name> <object-name>
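
For example, to find where an object named myobject in a hypothetical pool mypool is (or would be) stored:

  # the pool and object names are illustrative
  ceph osd map mypool myobject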

Add or move a new item (OSD) with the given id/name/weight at the specified location.

  ceph osd crush set {id} {weight} [{loc1} [{loc2} ...]]
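
For example, to place a hypothetical osd.0 with CRUSH weight 1.0 under host node1 in the default root (the bucket names are illustrative):

  # root=default and host=node1 are illustrative location arguments
  ceph osd crush set osd.0 1.0 root=default host=node1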

Remove an existing item (OSD) from the CRUSH map.

  ceph osd crush remove {name}

Remove an existing bucket from the CRUSH map.

  ceph osd crush remove {bucket-name}

Move an existing bucket from one position in the hierarchy to another.

  ceph osd crush move {id} {loc1} [{loc2} ...]

Set the weight of the item given by {name} to {weight}.

  ceph osd crush reweight {name} {weight}

Mark an OSD as lost. This may result in permanent data loss. Use with caution.

  ceph osd lost {id} [--yes-i-really-mean-it]

Create a new OSD. If no UUID is given, it will be set automatically when the OSD starts up.

  ceph osd create [{uuid}]

Remove the given OSD(s).

  ceph osd rm [{id}...]

Query the current max_osd parameter in the OSD map.

  ceph osd getmaxosd

Import the given crush map.

  ceph osd setcrushmap -i file

Set the max_osd parameter in the OSD map. This is necessary when expanding the storage cluster.

  ceph osd setmaxosd {count}

Mark OSD {osd-num} down.

  ceph osd down {osd-num}

Mark OSD {osd-num} out of the distribution (i.e. allocated no data).

  ceph osd out {osd-num}

Mark {osd-num} in the distribution (i.e. allocated data).

  ceph osd in {osd-num}

Set or clear the pause flags in the OSD map. If set, no IO requests will be sent to any OSD. Clearing the flags via unpause results in resending pending requests.

  ceph osd pause
  ceph osd unpause

Set the override weight (reweight) of {osd-num} to {weight}. Two OSDs with the same weight will receive roughly the same number of I/O requests and store approximately the same amount of data. ceph osd reweight sets an override weight on the OSD. This value is in the range 0 to 1, and forces CRUSH to re-place (1 - weight) of the data that would otherwise live on this drive. It does not change the weights assigned to the buckets above the OSD in the crush map, and is a corrective measure in case the normal CRUSH distribution is not working out quite right. For instance, if one of your OSDs is at 90% and the others are at 50%, you could reduce this weight to compensate.

  ceph osd reweight {osd-num} {weight}
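
For example, to force roughly 30% of the data that would otherwise live on a hypothetical OSD 123 to be re-placed elsewhere:

  # the OSD id is illustrative; a weight of 0.7 re-places about (1 - 0.7) = 30% of its data
  ceph osd reweight 123 0.7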

Balance OSD fullness by reducing the override weight of OSDs which are overly utilized. Note that these override (aka reweight) values default to 1.00000 and are relative only to each other; they are not absolute. It is crucial to distinguish them from CRUSH weights, which reflect the absolute capacity of a bucket in TiB. By default this command adjusts the override weight of OSDs that are + or - 20% from the average utilization, but if you include a threshold, that percentage will be used instead.

  ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]

To limit the step by which any OSD’s reweight will be changed, specify max_change, which defaults to 0.05. To limit the number of OSDs that will be adjusted, specify max_osds as well; the default is 4. Increasing these parameters can speed leveling of OSD utilization, at the potential cost of greater impact on client operations due to more data moving at once.

To determine which and how many PGs and OSDs will be affected by a given invocation, you can test before executing:

  ceph osd test-reweight-by-utilization [threshold [max_change max_osds]] [--no-increasing]

Adding --no-increasing to either command prevents increasing any override weights that are currently < 1.00000. This can be useful when you are balancing in a hurry to remedy full or nearfull OSDs, or when some OSDs are being evacuated or slowly brought into service.
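
For example, the following previews and then applies an adjustment of at most 0.05 per OSD, on at most 10 OSDs, using a 120% utilization threshold (all values are illustrative):

  # dry run: report which OSDs would be adjusted and by how much
  ceph osd test-reweight-by-utilization 120 0.05 10
  # apply the same adjustment
  ceph osd reweight-by-utilization 120 0.05 10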

Deployments utilizing Nautilus (or later revisions of Luminous and Mimic) that have no pre-Luminous clients may instead wish to enable the balancer module for ceph-mgr.

Add/remove an IP address to/from the blacklist. When adding an address, you can specify how long it should be blacklisted in seconds; otherwise, it will default to 1 hour. A blacklisted address is prevented from connecting to any OSD. Blacklisting is most often used to prevent a lagging metadata server from making bad changes to data on the OSDs.

These commands are mostly only useful for failure testing, as blacklists are normally maintained automatically and shouldn’t need manual intervention.

  ceph osd blacklist add ADDRESS[:source_port] [TIME]
  ceph osd blacklist rm ADDRESS[:source_port]
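
For example, to blacklist a hypothetical client address for one hour and then remove the entry:

  # the address and the 3600-second duration are illustrative
  ceph osd blacklist add 192.168.0.100 3600
  ceph osd blacklist rm 192.168.0.100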

Creates/deletes a snapshot of a pool.

  ceph osd pool mksnap {pool-name} {snap-name}
  ceph osd pool rmsnap {pool-name} {snap-name}

Creates/deletes/renames a storage pool.

  ceph osd pool create {pool-name} [pg_num [pgp_num]]
  ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
  ceph osd pool rename {old-name} {new-name}
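
For example, to create a hypothetical pool named mypool with 128 placement groups and later rename it:

  # the pool names and a pg_num/pgp_num of 128 are illustrative
  ceph osd pool create mypool 128 128
  ceph osd pool rename mypool mynewpool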

Changes a pool setting.

  ceph osd pool set {pool-name} {field} {value}

Valid fields are:

  • size: Sets the number of copies of data in the pool.

  • pg_num: The placement group number.

  • pgp_num: Effective number when calculating pg placement.

  • crush_rule: rule number for mapping placement.
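
For example, to keep three copies of each object in a hypothetical pool named mypool:

  # the pool name is illustrative
  ceph osd pool set mypool size 3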

Get the value of a pool setting.

  ceph osd pool get {pool-name} {field}

Valid fields are:

  • pg_num: The placement group number.

  • pgp_num: Effective number of placement groups when calculating placement.

Sends a scrub command to OSD {osd-num}. To send the command to all OSDs, use *.

  ceph osd scrub {osd-num}

Sends a repair command to OSD.N. To send the command to all OSDs, use *.

  ceph osd repair N

Runs a simple throughput benchmark against OSD.N, writing TOTAL_DATA_BYTES in write requests of BYTES_PER_WRITE each. By default, the test writes 1 GB in total in 4-MB increments. The benchmark is non-destructive and will not overwrite existing live OSD data, but might temporarily affect the performance of clients concurrently accessing the OSD.

  ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE]
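
For example, to write 100 MB to a hypothetical osd.0 in 4 MB write requests (both sizes given in bytes):

  # the total size (100 MB) and write size (4 MB) are illustrative
  ceph tell osd.0 bench 104857600 4194304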

To clear an OSD’s caches between benchmark runs, use the ‘cache drop’ command

  ceph tell osd.N cache drop

To get the cache statistics of an OSD, use the ‘cache status’ command

  ceph tell osd.N cache status

MDS Subsystem

Change configuration parameters on a running mds.

  ceph tell mds.{mds-id} config set {setting} {value}

Example:

  ceph tell mds.0 config set debug_ms 1

Enables debug messages.

  ceph mds stat

Displays the status of all metadata servers.

  ceph mds fail 0

Marks the active MDS as failed, triggering failover to a standby if present.

Todo

ceph mds subcommands missing docs: set, dump, getmap, stop, setmap

Mon Subsystem

Show monitor stats:

  ceph mon stat

  e2: 3 mons at {a=127.0.0.1:40000/0,b=127.0.0.1:40001/0,c=127.0.0.1:40002/0}, election epoch 6, quorum 0,1,2 a,b,c

The quorum list at the end lists monitor nodes that are part of the current quorum.

This is also available more directly:

  ceph quorum_status -f json-pretty

  {
      "election_epoch": 6,
      "quorum": [
          0,
          1,
          2
      ],
      "quorum_names": [
          "a",
          "b",
          "c"
      ],
      "quorum_leader_name": "a",
      "monmap": {
          "epoch": 2,
          "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
          "modified": "2016-12-26 14:42:09.288066",
          "created": "2016-12-26 14:42:03.573585",
          "features": {
              "persistent": [
                  "kraken"
              ],
              "optional": []
          },
          "mons": [
              {
                  "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:40000\/0",
                  "public_addr": "127.0.0.1:40000\/0"
              },
              {
                  "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:40001\/0",
                  "public_addr": "127.0.0.1:40001\/0"
              },
              {
                  "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:40002\/0",
                  "public_addr": "127.0.0.1:40002\/0"
              }
          ]
      }
  }

The above will block until a quorum is reached.

For a status of just the monitor you connect to (use -m HOST:PORT to select):

  ceph mon_status -f json-pretty

  {
      "name": "b",
      "rank": 1,
      "state": "peon",
      "election_epoch": 6,
      "quorum": [
          0,
          1,
          2
      ],
      "features": {
          "required_con": "9025616074522624",
          "required_mon": [
              "kraken"
          ],
          "quorum_con": "1152921504336314367",
          "quorum_mon": [
              "kraken"
          ]
      },
      "outside_quorum": [],
      "extra_probe_peers": [],
      "sync_provider": [],
      "monmap": {
          "epoch": 2,
          "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
          "modified": "2016-12-26 14:42:09.288066",
          "created": "2016-12-26 14:42:03.573585",
          "features": {
              "persistent": [
                  "kraken"
              ],
              "optional": []
          },
          "mons": [
              {
                  "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:40000\/0",
                  "public_addr": "127.0.0.1:40000\/0"
              },
              {
                  "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:40001\/0",
                  "public_addr": "127.0.0.1:40001\/0"
              },
              {
                  "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:40002\/0",
                  "public_addr": "127.0.0.1:40002\/0"
              }
          ]
      }
  }

A dump of the monitor state:

  ceph mon dump

  dumped monmap epoch 2
  epoch 2
  fsid ba807e74-b64f-4b72-b43f-597dfe60ddbc
  last_changed 2016-12-26 14:42:09.288066
  created 2016-12-26 14:42:03.573585
  0: 127.0.0.1:40000/0 mon.a
  1: 127.0.0.1:40001/0 mon.b
  2: 127.0.0.1:40002/0 mon.c