Balancer

The balancer can optimize the placement of PGs across OSDs in order to achieve a balanced distribution, either automatically or in a supervised fashion.

Status

The current status of the balancer can be checked at any time with:

  ceph balancer status

Automatic balancing

The automatic balancing can be enabled, using the default settings, with:

  ceph balancer on

The balancer can be turned back off again with:

  ceph balancer off

This will use the crush-compat mode, which is backward compatible with older clients, and will make small changes to the data distribution over time to ensure that OSDs are equally utilized.

Throttling

No adjustments will be made to the PG distribution if the cluster is degraded (e.g., because an OSD has failed and the system has not yet healed itself).

When the cluster is healthy, the balancer will throttle its changes such that the percentage of PGs that are misplaced (i.e., that need to be moved) is below a threshold of (by default) 5%. The max_misplaced threshold can be adjusted with:

  ceph config set mgr mgr/balancer/max_misplaced .07   # 7%
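The throttling rule above can be sketched in a few lines. This is an illustrative model only, not Ceph source code; the function name and signature are hypothetical.

```python
# Hypothetical sketch of a max_misplaced-style throttle: the balancer
# only makes further changes while the fraction of misplaced PGs stays
# below the configured threshold (0.05, i.e. 5%, by default).

def may_balance(num_pgs: int, num_misplaced: int,
                max_misplaced: float = 0.05) -> bool:
    """Return True if the misplaced fraction is below the threshold."""
    return (num_misplaced / num_pgs) < max_misplaced

# 30 of 1000 PGs misplaced (3%): under the 5% default, balancing proceeds.
print(may_balance(1000, 30))   # True
# 70 of 1000 PGs misplaced (7%): the default threshold blocks further moves.
print(may_balance(1000, 70))   # False
```

Raising max_misplaced (as in the 7% example above) lets the balancer move more data concurrently, at the cost of more rebalancing I/O at once.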

Modes

There are currently two supported balancer modes:

  • crush-compat. The CRUSH compat mode uses the compat weight-set feature (introduced in Luminous) to manage an alternative set of weights for devices in the CRUSH hierarchy. The normal weights should remain set to the size of the device to reflect the target amount of data that we want to store on the device. The balancer then optimizes the weight-set values, adjusting them up or down in small increments, in order to achieve a distribution that matches the target distribution as closely as possible. (Because PG placement is a pseudorandom process, there is a natural amount of variation in the placement; by optimizing the weights we counteract that natural variation.)

Notably, this mode is fully backwards compatible with older clients: when an OSDMap and CRUSH map are shared with older clients, we present the optimized weights as the “real” weights.

The primary restriction of this mode is that the balancer cannot handle multiple CRUSH hierarchies with different placement rules if the subtrees of the hierarchy share any OSDs. (This is normally not the case, and is generally not a recommended configuration because it is hard to manage the space utilization on the shared OSDs.)

  • upmap. Starting with Luminous, the OSDMap can store explicit mappings for individual OSDs as exceptions to the normal CRUSH placement calculation. These upmap entries provide fine-grained control over the PG mapping. This mode optimizes the placement of individual PGs in order to achieve a balanced distribution. In most cases, this distribution is “perfect,” with an equal number of PGs on each OSD (+/-1 PG, since they might not divide evenly).

Note that using upmap requires that all clients be Luminous or newer.
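The “perfect” (+/-1) distribution that upmap aims for is easy to picture with a small sketch. This is purely illustrative; the function name is hypothetical and does not exist in Ceph.

```python
# Illustrative only: the per-OSD PG counts that an ideal upmap-style
# balance would produce. When num_pgs does not divide evenly by
# num_osds, some OSDs carry one extra PG, so counts differ by at most 1.

def target_pg_counts(num_pgs: int, num_osds: int) -> list[int]:
    base, extra = divmod(num_pgs, num_osds)
    # 'extra' OSDs take one additional PG so the totals still add up.
    return [base + 1 if i < extra else base for i in range(num_osds)]

counts = target_pg_counts(100, 8)
print(counts)                     # [13, 13, 13, 13, 12, 12, 12, 12]
print(max(counts) - min(counts))  # 1: the spread is at most one PG
```

With 100 PGs over 8 OSDs, no assignment can be exactly equal, so four OSDs hold 13 PGs and four hold 12; this is the +/-1 case described above.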

The default mode is crush-compat. The mode can be adjusted with:

  ceph balancer mode upmap

or:

  ceph balancer mode crush-compat

Supervised optimization

The balancer operation is broken into a few distinct phases:

  • building a plan

  • evaluating the quality of the data distribution, either for the current PG distribution, or the PG distribution that would result after executing a plan

  • executing the plan

To evaluate and score the current distribution:

  ceph balancer eval

You can also evaluate the distribution for a single pool with:

  ceph balancer eval <pool-name>

Greater detail for the evaluation can be seen with:

  ceph balancer eval-verbose ...

The balancer can generate a plan, using the currently configured mode, with:

  ceph balancer optimize <plan-name>

The name is provided by the user and can be any useful identifying string. The contents of a plan can be seen with:

  ceph balancer show <plan-name>

All plans can be shown with:

  ceph balancer ls

Old plans can be discarded with:

  ceph balancer rm <plan-name>

Currently recorded plans are shown as part of the status command:

  ceph balancer status

The quality of the distribution that would result after executing a plan can be calculated with:

  ceph balancer eval <plan-name>

Assuming the plan is expected to improve the distribution (i.e., it has a lower score than the current cluster state), the user can execute that plan with:

  ceph balancer execute <plan-name>
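The decision rule behind the supervised loop can be summed up in one comparison: a plan is worth executing only when its evaluated score is lower (better) than the score of the current distribution. A minimal sketch, with a made-up function name and made-up scores:

```python
# Hypothetical sketch of the supervised-workflow decision: compare the
# score of the current distribution (from `ceph balancer eval`) with the
# score the plan would produce (from `ceph balancer eval <plan-name>`).
# Lower scores indicate a distribution closer to the target.

def should_execute(current_score: float, plan_score: float) -> bool:
    """Execute the plan only if it improves (lowers) the score."""
    return plan_score < current_score

print(should_execute(0.042, 0.013))  # True: the plan improves the distribution
print(should_execute(0.042, 0.055))  # False: discard the plan with `balancer rm`
```

If the plan does not lower the score, discard it with ceph balancer rm and generate a new one, possibly after switching modes.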