SQL metadata tables

Apache Druid supports two query languages: Druid SQL and native queries. This document describes the SQL language.

Druid Brokers infer table and column metadata for each datasource from segments loaded in the cluster, and use this to plan SQL queries. This metadata is cached on Broker startup and also updated periodically in the background through SegmentMetadata queries. Background metadata refreshing is triggered by segments entering and exiting the cluster, and can also be throttled through configuration.

Druid exposes system information through special system tables. There are two such schemas available: the Information Schema and the Sys Schema. The information schema provides details about table and column types. The "sys" schema provides information about Druid internals such as segments, tasks, and servers.

INFORMATION SCHEMA

You can access table and column metadata through JDBC using connection.getMetaData(), or through the INFORMATION_SCHEMA tables described below. For example, to retrieve metadata for the Druid datasource 'foo', use the query:

  SELECT *
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE "TABLE_SCHEMA" = 'druid' AND "TABLE_NAME" = 'foo'

Note: INFORMATION_SCHEMA tables do not currently support Druid-specific functions like TIME_PARSE and APPROX_QUANTILE_DS. Only standard SQL functions can be used.

SCHEMATA table

INFORMATION_SCHEMA.SCHEMATA provides a list of all known schemas, which include druid for standard Druid Table datasources, lookup for Lookups, sys for the virtual System metadata tables, and INFORMATION_SCHEMA for these virtual tables. Tables are allowed to have the same name across different schemas, so the schema may be included in an SQL statement to distinguish them, e.g. lookup.table vs druid.table.

Column | Notes
CATALOG_NAME | Always set as druid
SCHEMA_NAME | druid, lookup, sys, or INFORMATION_SCHEMA
SCHEMA_OWNER | Unused
DEFAULT_CHARACTER_SET_CATALOG | Unused
DEFAULT_CHARACTER_SET_SCHEMA | Unused
DEFAULT_CHARACTER_SET_NAME | Unused
SQL_PATH | Unused
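
For example, to list the schemas available for querying, a minimal query using only the columns described above:

  SELECT "CATALOG_NAME", "SCHEMA_NAME"
  FROM INFORMATION_SCHEMA.SCHEMATA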

TABLES table

INFORMATION_SCHEMA.TABLES provides a list of all known tables and schemas.

Column | Notes
TABLE_CATALOG | Always set as druid
TABLE_SCHEMA | The 'schema' which the table falls under; see the SCHEMATA table for details
TABLE_NAME | Table name. For the druid schema, this is the dataSource.
TABLE_TYPE | "TABLE" or "SYSTEM_TABLE"
IS_JOINABLE | YES if the table is directly joinable when it appears on the right-hand side of a JOIN statement, without performing a subquery; otherwise NO. Lookups are always joinable because they are globally distributed among Druid query processing nodes, but Druid datasources are not, and will use a less efficient subquery join.
IS_BROADCAST | YES if the table is 'broadcast' and distributed among all Druid query processing nodes, such as lookups and Druid datasources with a 'broadcast' load rule; otherwise NO.
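
For example, the following query lists each table along with its joinability flags, one way to check whether a table can appear on the right-hand side of a JOIN without a subquery:

  SELECT "TABLE_SCHEMA", "TABLE_NAME", "IS_JOINABLE", "IS_BROADCAST"
  FROM INFORMATION_SCHEMA.TABLES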

COLUMNS table

INFORMATION_SCHEMA.COLUMNS provides a list of all known columns across all tables and schemas.

Column | Notes
TABLE_CATALOG | Always set as druid
TABLE_SCHEMA | The 'schema' which the table column falls under; see the SCHEMATA table for details
TABLE_NAME | The 'table' which the column belongs to; see the TABLES table for details
COLUMN_NAME | The column name
ORDINAL_POSITION | The order in which the column is stored in a table
COLUMN_DEFAULT | Unused
IS_NULLABLE |
DATA_TYPE |
CHARACTER_MAXIMUM_LENGTH | Unused
CHARACTER_OCTET_LENGTH | Unused
NUMERIC_PRECISION |
NUMERIC_PRECISION_RADIX |
NUMERIC_SCALE |
DATETIME_PRECISION |
CHARACTER_SET_NAME |
COLLATION_NAME |
JDBC_TYPE | Type code from java.sql.Types (Druid extension)

For example, this query returns data type information for columns in the foo table:

  1. SELECT "ORDINAL_POSITION", "COLUMN_NAME", "IS_NULLABLE", "DATA_TYPE", "JDBC_TYPE"
  2. FROM INFORMATION_SCHEMA.COLUMNS
  3. WHERE "TABLE_NAME" = 'foo'

SYSTEM SCHEMA

The "sys" schema provides visibility into Druid segments, servers, and tasks.

Note: "sys" tables do not currently support Druid-specific functions like TIME_PARSE and APPROX_QUANTILE_DS. Only standard SQL functions can be used.

SEGMENTS table

The segments table provides details on all Druid segments, whether or not they are published.

Column | Type | Notes
segment_id | STRING | Unique segment identifier
datasource | STRING | Name of datasource
start | STRING | Interval start time (in ISO 8601 format)
end | STRING | Interval end time (in ISO 8601 format)
size | LONG | Size of segment in bytes
version | STRING | Version string (generally an ISO 8601 timestamp corresponding to when the segment set was first started). A higher version means a more recently created segment; versions are compared as strings.
partition_num | LONG | Partition number (an integer, unique within a datasource+interval+version; may not necessarily be contiguous)
num_replicas | LONG | Number of replicas of this segment currently being served
num_rows | LONG | Number of rows in the segment; may be null if unknown to the Broker at query time
is_published | LONG | Boolean represented as long, where 1 = true and 0 = false. 1 means this segment has been published to the metadata store with used=1. See the Architecture page for more details.
is_available | LONG | Boolean represented as long, where 1 = true and 0 = false. 1 if this segment is currently being served by any process (Historical or realtime). See the Architecture page for more details.
is_realtime | LONG | Boolean represented as long, where 1 = true and 0 = false. 1 if this segment is only served by realtime tasks; 0 if any Historical process is serving it.
is_overshadowed | LONG | Boolean represented as long, where 1 = true and 0 = false. 1 if this segment is published and fully overshadowed by other published segments. Currently, is_overshadowed is always false for unpublished segments, although this may change in the future. You can filter for segments that "should be published" with is_published = 1 AND is_overshadowed = 0. Segments can briefly be both published and overshadowed if they were recently replaced but have not yet been unpublished. See the Architecture page for more details.
shard_spec | STRING | JSON-serialized form of the segment ShardSpec
dimensions | STRING | JSON-serialized form of the segment dimensions
metrics | STRING | JSON-serialized form of the segment metrics
last_compaction_state | STRING | JSON-serialized form of the config of the compaction task that created this segment. May be null if the segment was not created by a compaction task.

For example, to retrieve all segments for the datasource 'wikipedia', use the query:

  SELECT * FROM sys.segments WHERE datasource = 'wikipedia'
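
Following the note on is_overshadowed above, this query lists only the segments that "should be published", i.e. published and not overshadowed:

  SELECT segment_id, datasource
  FROM sys.segments
  WHERE is_published = 1 AND is_overshadowed = 0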

Another example retrieves total_size, avg_size, avg_num_rows, and num_segments per datasource:

  SELECT
    datasource,
    SUM("size") AS total_size,
    CASE WHEN SUM("size") = 0 THEN 0 ELSE SUM("size") / (COUNT(*) FILTER(WHERE "size" > 0)) END AS avg_size,
    CASE WHEN SUM("num_rows") = 0 THEN 0 ELSE SUM("num_rows") / (COUNT(*) FILTER(WHERE "num_rows" > 0)) END AS avg_num_rows,
    COUNT(*) AS num_segments
  FROM sys.segments
  GROUP BY 1
  ORDER BY 2 DESC

This query goes a step further and shows the overall profile of available, non-realtime segments across buckets of 1 million rows each for the foo datasource:

  SELECT ABS("num_rows" / 1000000) AS "bucket",
    COUNT(*) AS segments,
    SUM("size") / 1048576 AS totalSizeMiB,
    MIN("size") / 1048576 AS minSizeMiB,
    AVG("size") / 1048576 AS averageSizeMiB,
    MAX("size") / 1048576 AS maxSizeMiB,
    SUM("num_rows") AS totalRows,
    MIN("num_rows") AS minRows,
    AVG("num_rows") AS averageRows,
    MAX("num_rows") AS maxRows,
    (AVG("size") / AVG("num_rows")) AS avgRowSizeB
  FROM sys.segments
  WHERE is_available = 1 AND is_realtime = 0 AND "datasource" = 'foo'
  GROUP BY 1
  ORDER BY 1

To retrieve segments that have been compacted (by any compaction):

  SELECT * FROM sys.segments WHERE last_compaction_state IS NOT NULL

or, to retrieve segments compacted by a particular compaction spec (such as the spec used by automatic compaction):

  SELECT * FROM sys.segments WHERE last_compaction_state = 'CompactionState{partitionsSpec=DynamicPartitionsSpec{maxRowsPerSegment=5000000, maxTotalRows=9223372036854775807}, indexSpec={bitmap={type=roaring, compressRunOnSerialization=true}, dimensionCompression=lz4, metricCompression=lz4, longEncoding=longs, segmentLoader=null}}'

Caveat: A segment can be served by more than one stream ingestion task or Historical process, in which case it has multiple replicas. These replicas are weakly consistent with each other while served by multiple ingestion tasks, until the segment is eventually served by a Historical, at which point it is immutable. The Broker prefers to query a segment from a Historical over an ingestion task. But if a segment has multiple realtime replicas (for example, Kafka index tasks) and one task is slower than the other, the sys.segments query results can vary for the duration of the tasks, because only one of the ingestion tasks is queried by the Broker and it is not guaranteed that the same task is picked every time. The num_rows column can have inconsistent values during this period. There is an open issue about this inconsistency with stream ingestion tasks.
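
As a quick check related to this caveat, a query along these lines lists segments currently served by more than one replica:

  SELECT segment_id, datasource, num_replicas, is_realtime
  FROM sys.segments
  WHERE num_replicas > 1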

SERVERS table

The servers table lists all discovered servers in the cluster.

Column | Type | Notes
server | STRING | Server name in the form host:port
host | STRING | Hostname of the server
plaintext_port | LONG | Unsecured port of the server, or -1 if plaintext traffic is disabled
tls_port | LONG | TLS port of the server, or -1 if TLS is disabled
server_type | STRING | Type of Druid service. Possible values: COORDINATOR, OVERLORD, BROKER, ROUTER, HISTORICAL, MIDDLE_MANAGER, or PEON.
tier | STRING | Distribution tier; see druid.server.tier. Only valid for the HISTORICAL type; null for other types.
current_size | LONG | Current size of segments in bytes on this server. Only valid for the HISTORICAL type; 0 for other types.
max_size | LONG | Maximum size in bytes this server recommends assigning to segments; see druid.server.maxSize. Only valid for the HISTORICAL type; 0 for other types.
is_leader | LONG | 1 if the server is currently the leader (for services that have the concept of leadership); 0 if it is not the leader; or the default long value (0 or null depending on druid.generic.useDefaultValueForNull) if the server type has no concept of leadership.

To retrieve information about all servers, use the query:

  SELECT * FROM sys.servers;
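
To see which services currently hold leadership (per the is_leader column above), a query like the following should work:

  SELECT server, server_type
  FROM sys.servers
  WHERE is_leader = 1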

SERVER_SEGMENTS table

SERVER_SEGMENTS joins the servers and segments tables.

Column | Type | Notes
server | STRING | Server name in the form host:port (primary key of the servers table)
segment_id | STRING | Segment identifier (primary key of the segments table)

A JOIN between servers and segments can be used to query the number of segments for a specific datasource, grouped by server. Example query:

  SELECT COUNT(segments.segment_id) AS num_segments
  FROM sys.segments AS segments
  INNER JOIN sys.server_segments AS server_segments
    ON segments.segment_id = server_segments.segment_id
  INNER JOIN sys.servers AS servers
    ON servers.server = server_segments.server
  WHERE segments.datasource = 'wikipedia'
  GROUP BY servers.server;
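
A similar join can sketch how much segment data each server holds per datasource; the MiB conversion mirrors the earlier examples:

  SELECT servers.server, segments.datasource,
    SUM(segments."size") / 1048576 AS sizeMiB
  FROM sys.segments AS segments
  INNER JOIN sys.server_segments AS server_segments
    ON segments.segment_id = server_segments.segment_id
  INNER JOIN sys.servers AS servers
    ON servers.server = server_segments.server
  GROUP BY servers.server, segments.datasource;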

TASKS table

The tasks table provides information about active and recently-completed indexing tasks. For more information check out the documentation for ingestion tasks.

Column | Type | Notes
task_id | STRING | Unique task identifier
group_id | STRING | Task group ID for this task; the value depends on the task type. For example, for native index tasks it is the same as task_id; for subtasks it is the parent task's ID.
type | STRING | Task type; for example, this value is "index" for indexing tasks. See tasks-overview.
datasource | STRING | Name of the datasource being indexed
created_time | STRING | Timestamp in ISO 8601 format corresponding to when the ingestion task was created. Note that this value is populated for completed and waiting tasks. For running and pending tasks this value is set to 1970-01-01T00:00:00Z.
queue_insertion_time | STRING | Timestamp in ISO 8601 format corresponding to when this task was added to the queue on the Overlord
status | STRING | Status of the task: RUNNING, FAILED, or SUCCESS
runner_status | STRING | Runner status of the task; NONE for completed tasks, or RUNNING, WAITING, or PENDING for in-progress tasks
duration | LONG | Time it took to finish the task, in milliseconds; present only for completed tasks
location | STRING | Server where this task is running, in the form host:port; present only for RUNNING tasks
host | STRING | Hostname of the server where the task is running
plaintext_port | LONG | Unsecured port of the server, or -1 if plaintext traffic is disabled
tls_port | LONG | TLS port of the server, or -1 if TLS is disabled
error_msg | STRING | Detailed error message for FAILED tasks

For example, to retrieve task information filtered by status, use the query:

  SELECT * FROM sys.tasks WHERE status = 'FAILED';
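
Another illustrative query ranks completed tasks by how long they took, using the duration column described above:

  SELECT task_id, datasource, duration
  FROM sys.tasks
  WHERE status = 'SUCCESS'
  ORDER BY duration DESC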

SUPERVISORS table

The supervisors table provides information about supervisors.

Column | Type | Notes
supervisor_id | STRING | Supervisor task identifier
state | STRING | Basic state of the supervisor. Available states: UNHEALTHY_SUPERVISOR, UNHEALTHY_TASKS, PENDING, RUNNING, SUSPENDED, STOPPING. See the Kafka docs for details.
detailed_state | STRING | Supervisor-specific state. See the documentation of the specific supervisor for details, e.g. Kafka or Kinesis.
healthy | LONG | Boolean represented as long, where 1 = true and 0 = false. 1 indicates a healthy supervisor.
type | STRING | Type of supervisor, e.g. kafka, kinesis, or materialized_view
source | STRING | Source of the supervisor, e.g. the Kafka topic or Kinesis stream
suspended | LONG | Boolean represented as long, where 1 = true and 0 = false. 1 indicates the supervisor is in a suspended state.
spec | STRING | JSON-serialized supervisor spec

For example, to retrieve supervisor information filtered by health status, use the query:

  SELECT * FROM sys.supervisors WHERE healthy = 0;
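
Since the unhealthy states listed above all begin with UNHEALTHY, a standard LIKE pattern can also surface them together with their detailed state:

  SELECT supervisor_id, state, detailed_state, type, source
  FROM sys.supervisors
  WHERE state LIKE 'UNHEALTHY%'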