Data Placement Overview

Ceph stores, replicates, and rebalances data objects across a RADOS cluster dynamically. With many different users storing objects in different pools for different purposes on countless OSDs, Ceph operations require some data placement planning. The main data placement planning concepts in Ceph include:

  • Pools: Ceph stores data within pools, which are logical groups for storing objects. Pools manage the number of placement groups, the number of replicas, and the CRUSH rule for the pool. To store data in a pool, you must have an authenticated user with permissions for the pool. Ceph can snapshot pools. See Pools for additional details, and the librados sketch after this list for a minimal example.

  • Placement Groups: Ceph maps objects to placement groups (PGs): shards or fragments of a logical object pool that place objects as a group into OSDs. Placement groups reduce the amount of per-object metadata that Ceph must track when it stores data in OSDs. A larger number of placement groups (e.g., 100 per OSD) leads to better balancing. See Placement Groups for additional details, and the simplified mapping sketch after this list.

  • CRUSH Maps: CRUSH is a big part of what allows Ceph to scale without performance bottlenecks, without limits to scalability, and without a single point of failure. CRUSH maps provide the physical topology of the cluster to the CRUSH algorithm, which determines where the data for an object and its replicas should be stored and how to place it across failure domains for added data safety, among other things. See CRUSH Maps for additional details, and the toy placement sketch after this list.

  • Balancer: The balancer automatically optimizes the distribution of PGs across devices to achieve a balanced data distribution, maximizing the amount of data that can be stored in the cluster and evenly distributing the workload across OSDs. See the sketch after this list for how such imbalance can be measured.
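
To make the pool concept concrete, here is a minimal sketch using the librados Python binding (python3-rados). It assumes a reachable cluster, a readable /etc/ceph/ceph.conf, and a client.admin keyring; the pool name data-placement-demo and the object name are illustrative, not part of any real deployment.

  # Minimal librados sketch: create a pool and store one object in it.
  # Assumes a running cluster, /etc/ceph/ceph.conf, and a client.admin keyring.
  import rados

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()
  try:
      if not cluster.pool_exists('data-placement-demo'):
          cluster.create_pool('data-placement-demo')

      # All reads and writes go through an I/O context bound to one pool.
      ioctx = cluster.open_ioctx('data-placement-demo')
      try:
          ioctx.write_full('hello-object', b'hello, RADOS')
          print(ioctx.read('hello-object'))
      finally:
          ioctx.close()
  finally:
      cluster.shutdown()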
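
Conceptually, an object lands in a PG by hashing its name and folding the hash into the pool's PG count. The function below is a deliberately simplified illustration of that idea only; real Ceph uses the rjenkins hash together with a "stable mod" so that the mapping shifts minimally when pg_num grows.

  import hashlib

  def object_to_pg(object_name, pg_num):
      # Simplified illustration: hash the object name, fold it into pg_num.
      # This is not the real mapping, just the shape of it.
      digest = hashlib.md5(object_name.encode()).digest()
      return int.from_bytes(digest[:4], 'little') % pg_num

  # Objects spread across PGs; OSDs track state per PG, not per object.
  for name in ('img-001', 'img-002', 'log-2024-01-01'):
      print(name, '-> pg', object_to_pg(name, pg_num=128))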
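
CRUSH itself is a weighted, hierarchical, pseudo-random placement algorithm; the toy function below only illustrates its key property: every client can compute a PG's replica locations deterministically from the topology, with no central lookup table, and the replicas land in distinct failure domains. The flat host-to-OSD topology here is hypothetical, and this is not the CRUSH algorithm.

  import hashlib

  # Hypothetical, flattened topology: failure domain (host) -> OSD ids.
  TOPOLOGY = {
      'host-a': [0, 1],
      'host-b': [2, 3],
      'host-c': [4, 5],
  }

  def place_pg(pg_id, replicas=3):
      # Toy stand-in for CRUSH: deterministically pick one OSD per host,
      # so no two replicas share a failure domain. Unlike CRUSH, there is
      # no weighting, no deeper hierarchy, and no tunables.
      chosen = []
      for host, osds in sorted(TOPOLOGY.items())[:replicas]:
          h = hashlib.md5('{}:{}'.format(pg_id, host).encode()).digest()
          chosen.append(osds[int.from_bytes(h[:4], 'little') % len(osds)])
      return chosen

  print(place_pg(pg_id=42))  # one OSD from each of three hosts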
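
The balancer's goal can be summarized as keeping the number of PGs (and therefore data and load) per OSD as even as possible. The snippet below only sketches how that imbalance might be measured from a hypothetical PG-to-OSD mapping; the real balancer reads the cluster's maps and applies upmap entries or weight adjustments to close the gap.

  from collections import Counter

  # Hypothetical acting sets: PG id -> OSDs currently holding that PG.
  pg_to_osds = {
      0: [0, 2, 4],
      1: [1, 2, 5],
      2: [0, 3, 4],
      3: [1, 3, 5],
      4: [0, 2, 5],
  }

  # Count PGs per OSD; a well-balanced cluster keeps these counts close.
  pgs_per_osd = Counter(osd for osds in pg_to_osds.values() for osd in osds)
  print(dict(pgs_per_osd))
  print('spread:', max(pgs_per_osd.values()) - min(pgs_per_osd.values()))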

When you initially set up a test cluster, you can use the default values. Once you begin planning for a large Ceph cluster, refer to pools, placement groups, and CRUSH for data placement operations.