Cluster system tables

To inspect the internal state of the cluster, you can query special service tables (system views). These tables are accessible from the cluster’s root directory and use the .sys system path prefix.
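As a minimal sketch of how such a path is put together, the helper below builds the full name of a system view and a query that reads it. The root "/Root" and the function names are assumptions made for illustration; substitute the root directory of your own cluster.

```python
def sys_view_path(cluster_root: str, view: str) -> str:
    """Return the full path of a system view under the .sys prefix."""
    # Strip a trailing slash so "/Root" and "/Root/" give the same result.
    return f"{cluster_root.rstrip('/')}/.sys/{view}"


def select_all(cluster_root: str, view: str) -> str:
    """Build a query text that reads every row of the given system view."""
    # Backticks quote the path because it contains slashes and a dot.
    return f"SELECT * FROM `{sys_view_path(cluster_root, view)}`;"


# "/Root" is a hypothetical cluster root used only for this example.
print(select_all("/Root", "ds_pdisks"))
```

The sketch only produces the query text; how the query is sent to the cluster depends on the client you use.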

In the field descriptions below, the Key column gives the position of the field within the table’s primary key; it is empty for fields that are not part of the key.

Distributed Storage

Information about the operation of distributed storage is spread across several interconnected tables, each of which describes one kind of entity:

  • PDisk
  • VSlot
  • Group
  • Storage Pool

In addition, a separate table shows statistics on the number of groups used in different storage pools and whether these pools can be extended.

ds_pdisks

Field | Type | Key | Value
NodeId | Uint32 | 0 | ID of the node where a PDisk is running.
PDiskId | Uint32 | 1 | ID of the PDisk (unique within the node).
Type | String |  | Media type (ROT, SSD, NVME).
Kind | Uint64 |  | A user-defined numeric ID that is needed to group disks with the same type of media into different subgroups.
Path | String |  | Path to the block device inside the machine.
Guid | Uint64 |  | A unique ID that is generated randomly when adding a disk to the system and is designed to prevent data loss in the event of disk swapping.
BoxId | Uint64 |  | ID of the Box that this PDisk belongs to.
SharedWithOs | Bool |  | Flag indicating if the “SharedWithOs” label is available. Set manually when creating a PDisk. You can use it to filter disks when creating new groups.
ReadCentric | Bool |  | Flag indicating if the “ReadCentric” label is available. Set manually when creating a PDisk. You can use it to filter disks when creating new groups.
AvailableSize | Uint64 |  | The number of bytes that can be allocated on the PDisk.
TotalSize | Uint64 |  | The total number of bytes on the PDisk.
Status | String |  | PDisk operation mode that affects its participation in the allocation of groups (ACTIVE, INACTIVE, BROKEN, FAULT, and TO_BE_REMOVED).
StatusChangeTimestamp | Timestamp |  | The time when the Status was last changed. NULL indicates that the Status hasn’t changed since the creation of PDisk.
ExpectedSlotCount | Uint32 |  | The maximum number of VSlots that can be created on this PDisk.
NumActiveSlots | Uint32 |  | The number of slots that are currently active.
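
The size and slot fields can be combined into a simple per-disk summary. The sketch below assumes that rows of ds_pdisks have already been fetched into Python dictionaries keyed by field name; the function name and the sample values are made up for illustration.

```python
def pdisk_summary(row: dict) -> dict:
    """Derive occupancy figures from one ds_pdisks row."""
    total = row["TotalSize"]
    # Bytes that can no longer be allocated on this PDisk.
    occupied = total - row["AvailableSize"]
    return {
        "pdisk": (row["NodeId"], row["PDiskId"]),
        "occupied_fraction": occupied / total if total else 0.0,
        # VSlots that could still be created on this PDisk.
        "free_slots": row["ExpectedSlotCount"] - row["NumActiveSlots"],
    }


# Hypothetical row; the numbers are not taken from a real cluster.
sample = {
    "NodeId": 1, "PDiskId": 1000,
    "TotalSize": 1000, "AvailableSize": 250,
    "ExpectedSlotCount": 16, "NumActiveSlots": 12,
}
print(pdisk_summary(sample))
```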

ds_vslots

Field | Type | Key | Value
NodeId | Uint32 | 0 | ID of the node where a VSlot is running.
PDiskId | Uint32 | 1 | ID of the PDisk inside the node where the VSlot is running.
VSlotId | Uint32 | 2 | ID of the VSlot inside the PDisk.
GroupId | Uint32 |  | Number of the storage group that this VSlot belongs to.
GroupGeneration | Uint32 |  | Generation of the storage group configuration that this VSlot belongs to.
FailRealm | Uint32 |  | Relative number of the fail realm of the VSlot within the storage group.
FailDomain | Uint32 |  | Relative number of the fail domain of the VSlot within the fail realm.
VDisk | Uint32 |  | Relative number of the VSlot inside the fail domain.
AllocatedSize | Uint64 |  | The number of bytes that the VSlot occupies on the PDisk.
AvailableSize | Uint64 |  | The number of bytes that can be allocated to this VSlot.
Status | String |  | Status of the VDisk running in this VSlot (INIT_PENDING, REPLICATING, READY, or ERROR).
Kind | String |  | Preset VDisk operation mode (Default, Log, …).

Please note that the (NodeId, PDiskId) tuple forms a foreign key to the ds_pdisks table, and (GroupId) forms a foreign key to the ds_groups table.
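
The key relationships described above can be sketched as an in-memory join. The example assumes that rows of the three tables were already fetched into lists of dictionaries; the function name, the device path, and all sample values are hypothetical.

```python
def join_vslots(vslots, pdisks, groups):
    """Attach PDisk and group attributes to each VSlot row."""
    # Index the referenced tables by their primary keys.
    pdisk_by_key = {(p["NodeId"], p["PDiskId"]): p for p in pdisks}
    group_by_id = {g["GroupId"]: g for g in groups}
    joined = []
    for v in vslots:
        # A missing referenced row yields None values instead of an error.
        pdisk = pdisk_by_key.get((v["NodeId"], v["PDiskId"]), {})
        group = group_by_id.get(v["GroupId"], {})
        joined.append({
            "VSlot": (v["NodeId"], v["PDiskId"], v["VSlotId"]),
            "VSlotStatus": v["Status"],
            "PDiskPath": pdisk.get("Path"),
            "ErasureSpecies": group.get("ErasureSpecies"),
        })
    return joined


# Made-up rows for illustration only.
pdisks = [{"NodeId": 1, "PDiskId": 1000, "Path": "/dev/example-disk"}]
groups = [{"GroupId": 7, "ErasureSpecies": "block-4-2"}]
vslots = [
    {"NodeId": 1, "PDiskId": 1000, "VSlotId": 0, "GroupId": 7, "Status": "READY"},
    {"NodeId": 2, "PDiskId": 1000, "VSlotId": 0, "GroupId": 8, "Status": "ERROR"},
]
for row in join_vslots(vslots, pdisks, groups):
    print(row)
```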

ds_groups

Field | Type | Key | Value
GroupId | Uint32 | 0 | Number of the storage group in the cluster.
Generation | Uint32 |  | Storage group configuration generation.
ErasureSpecies | String |  | Group redundancy coding mode (block-4-2, mirror-3-dc, mirror-3of4, …).
BoxId | Uint64 |  | ID of the Box that this group is created in.
StoragePoolId | Uint64 |  | ID of the storage pool inside the Box that this group operates in.
EncryptionMode | Uint32 |  | Group data encryption and its algorithm (if enabled).
LifeCyclePhase | Uint32 |  | Availability of a generated encryption key (if encryption is enabled).
AllocatedSize | Uint64 |  | The number of allocated bytes of data in the group (expressed in user bytes, that is, before redundancy is applied).
AvailableSize | Uint64 |  | The number of bytes of user data available for allocation (also before redundancy is applied).
SeenOperational | Bool |  | A Boolean flag that indicates whether the group was operational after its creation.
PutTabletLogLatency | Interval |  | 90th percentile of the PutTabletLog request execution time.
PutUserDataLatency | Interval |  | 90th percentile of the PutUserData request execution time.
GetFastLatency | Interval |  | 90th percentile of the GetFast request execution time.

In this table, the (BoxId, StoragePoolId) tuple forms a foreign key to the ds_storage_pools table.
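
One use of this relationship is to total group sizes per storage pool. The sketch below assumes rows of ds_groups and ds_storage_pools were fetched into lists of dictionaries; the function name, the pool name, and the numbers are invented for the example.

```python
from collections import defaultdict


def usage_by_pool(groups, pools):
    """Sum group sizes per storage pool, keyed by (BoxId, StoragePoolId)."""
    totals = defaultdict(lambda: {"Groups": 0, "AllocatedSize": 0, "AvailableSize": 0})
    for g in groups:
        t = totals[(g["BoxId"], g["StoragePoolId"])]
        t["Groups"] += 1
        t["AllocatedSize"] += g["AllocatedSize"]
        t["AvailableSize"] += g["AvailableSize"]
    # Resolve the pool name through the same two-field key.
    name_by_key = {(p["BoxId"], p["StoragePoolId"]): p["Name"] for p in pools}
    return [
        {"Pool": key, "Name": name_by_key.get(key), **t}
        for key, t in sorted(totals.items())
    ]


# Made-up rows for illustration only.
pools = [{"BoxId": 1, "StoragePoolId": 1, "Name": "example-pool"}]
groups = [
    {"GroupId": 7, "BoxId": 1, "StoragePoolId": 1, "AllocatedSize": 100, "AvailableSize": 900},
    {"GroupId": 8, "BoxId": 1, "StoragePoolId": 1, "AllocatedSize": 300, "AvailableSize": 700},
]
print(usage_by_pool(groups, pools))
```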

ds_storage_pools

Field | Type | Key | Value
BoxId | Uint64 | 0 | ID of the Box that this storage pool belongs to.
StoragePoolId | Uint64 | 1 | ID of the storage pool inside the Box.
Name | String |  | User-defined storage pool name (used when linking tablets and storage pools).
Generation | Uint64 |  | Storage pool configuration generation (number of changes).
ErasureSpecies | String |  | Redundancy coding mode for all groups within this storage pool.
VDiskKind | String |  | Preset operation mode for all VDisks in this storage pool.
Kind | String |  | A user-defined string description of the purpose of the pool, which can also be used for filtering.
NumGroups | Uint32 |  | Number of groups within this storage pool.
EncryptionMode | Uint32 |  | Data encryption setting for all groups (similar to ds_groups.EncryptionMode).
SchemeshardId | Uint64 |  | ID of the SchemeShard object of the schema that this storage pool belongs to (as of now, always NULL).
PathId | Uint64 |  | ID of the node of the schema object inside the specified SchemeShard that this storage pool belongs to.

ds_storage_stats

Unlike other tables that show physical entities, the ds_storage_stats table shows aggregated storage statistics.

Field | Type | Key | Value
BoxId | Uint64 | 0 | ID of the Box that statistics are calculated for.
PDiskFilter | String | 1 | A string description of filters that select a PDisk to create groups (for example, by media type).
ErasureSpecies | String | 2 | Redundancy coding mode that statistics are collected for.
CurrentGroupsCreated | Uint32 |  | Number of groups created with the specified characteristics.
CurrentAllocatedSize | Uint64 |  | Total space occupied by all groups from CurrentGroupsCreated.
CurrentAvailableSize | Uint64 |  | Total space that is available to all groups from CurrentGroupsCreated.
AvailableGroupsToCreate | Uint32 |  | Number of groups with the specified characteristics that can be created taking into account the need for a reserve.
AvailableSizeToCreate | Uint64 |  | Number of available bytes that will be obtained when creating all groups from AvailableGroupsToCreate.

Note that AvailableGroupsToCreate is the maximum number of groups that can be created provided that no groups of other types are created. As a result, extending a storage pool may change AvailableGroupsToCreate in several rows of the statistics at once.
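
Because each row is computed as if no other group types were created, the values from different rows should not simply be added together. The sketch below illustrates one cautious way to read the table; it assumes that rows sharing the same BoxId and PDiskFilter draw on the same disks, which is an assumption of this example rather than a documented guarantee. The function names, the filter string format, and the numbers are made up.

```python
from collections import defaultdict


def potential_size(row: dict) -> int:
    """Bytes available now plus bytes gained by creating every possible group."""
    return row["CurrentAvailableSize"] + row["AvailableSizeToCreate"]


def competing_rows(rows):
    """Group erasure modes whose rows presumably compete for the same disks."""
    by_disks = defaultdict(list)
    for r in rows:
        by_disks[(r["BoxId"], r["PDiskFilter"])].append(r["ErasureSpecies"])
    # Only keys with more than one row are at risk of double counting.
    return {key: modes for key, modes in by_disks.items() if len(modes) > 1}


# Hypothetical rows; the PDiskFilter text is a placeholder format.
rows = [
    {"BoxId": 1, "PDiskFilter": "Type:SSD", "ErasureSpecies": "block-4-2",
     "CurrentAvailableSize": 500, "AvailableSizeToCreate": 2000},
    {"BoxId": 1, "PDiskFilter": "Type:SSD", "ErasureSpecies": "mirror-3-dc",
     "CurrentAvailableSize": 0, "AvailableSizeToCreate": 1200},
]
print([potential_size(r) for r in rows])
print(competing_rows(rows))
```

Each potential_size value is meaningful on its own row only; for the rows reported by competing_rows, creating groups of one type reduces what remains for the other.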

Notes

Keep in mind that queries to system views are analytical in nature, so querying them frequently in large databases consumes significant system resources. A load of about 1-2 requests per second (rps) is acceptable.
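
One way to stay within such a budget on the client side is to enforce a minimum interval between queries. The class below is a minimal sketch, not part of any client library; its name, parameters, and the injectable clock and sleep functions are choices made for this example.

```python
import time


class MinIntervalThrottle:
    """Block callers so that at most max_rps calls start per second."""

    def __init__(self, max_rps: float, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_rps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Schedule the earliest moment for the following call.
        self._next_allowed = now + self._interval
        return delay
```

Calling wait() before each system view query with, for example, MinIntervalThrottle(max_rps=1.0) keeps a single client at or below one query per second. It does not coordinate several clients, so the total load on the cluster still has to be considered separately.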