TimescaleDB API Reference - create_hypertable() - 《TimescaleDB v2.1 Documentation》

create_hypertable()

create_hypertable()

Creates a TimescaleDB hypertable from a PostgreSQL table (replacing the latter), partitioned on time and with the option to partition on one or more other columns (i.e., space). The PostgreSQL table cannot be an already partitioned table (declarative partitioning or inheritance). In case of a non-empty table, it is possible to migrate the data during hypertable creation using the migrate_data option, although this might take a long time and has certain limitations when the table contains foreign key constraints (see below).

After creation, all actions, such as ALTER TABLE, SELECT, etc., still work on the resulting hypertable.

Required Arguments

Name	Description
`relation`	Identifier of table to convert to hypertable.
`time_column_name`	Name of the column containing time values as well as the primary column to partition by.

Optional Arguments

Name	Description
`partitioning_column`	Name of an additional column to partition by. If provided, the `number_partitions` argument must also be provided.
`number_partitions`	Number of hash partitions to use for `partitioning_column`. Must be > 0.
`chunk_time_interval`	Interval in event time that each chunk covers. Must be > 0. As of TimescaleDB v0.11.0, default is 7 days. For previous versions, default is 1 month.
`create_default_indexes`	Boolean whether to create default indexes on time/partitioning columns. Default is TRUE.
`if_not_exists`	Boolean whether to print warning if table already converted to hypertable or raise exception. Default is FALSE.
`partitioning_func`	The function to use for calculating a value’s partition.
`associated_schema_name`	Name of the schema for internal hypertable tables. Default is “_timescaledb_internal”.
`associated_table_prefix`	Prefix for internal hypertable chunk names. Default is “_hyper”.
`migrate_data`	Set to TRUE to migrate any existing data from the `relation` table to chunks in the new hypertable. A non-empty table will generate an error without this option. Large tables may take significant time to migrate. Defaults to FALSE.
`time_partitioning_func`	Function to convert incompatible primary time column values to compatible ones. The function must be `IMMUTABLE`.
`replication_factor`	If set to 1 or greater, will create a distributed hypertable. Default is NULL. When creating a distributed hypertable, consider using `create_distributed_hypertable` in place of `create_hypertable`.
`data_nodes`	This is the set of data nodes that will be used for this table if it is distributed. This has no impact on non-distributed hypertables. If no data nodes are specified, a distributed hypertable will use all data nodes known by this instance.

Returns

Column	Description
`hypertable_id`	ID of the hypertable in TimescaleDB.
`schema_name`	Schema name of the table converted to hypertable.
`table_name`	Table name of the table converted to hypertable.
`created`	TRUE if the hypertable was created, FALSE when `if_not_exists` is true and no hypertable was created.

TIP:If you use SELECT * FROM create_hypertable(...) you will get the return value formatted as a table with column headings.

WARNING:The use of the migrate_data argument to convert a non-empty table can lock the table for a significant amount of time, depending on how much data is in the table. It can also run into deadlock if foreign key constraints exist to other tables.

If you would like finer control over index formation and other aspects of your hypertable, follow these migration instructions instead.

When converting a normal SQL table to a hypertable, pay attention to how you handle constraints. A hypertable can contain foreign keys to normal SQL table columns, but the reverse is not allowed. UNIQUE and PRIMARY constraints must include the partitioning key.

The deadlock is likely to happen when concurrent transactions simultaneously try to insert data into tables that are referenced in the foreign key constraints and into the converting table itself. The deadlock can be prevented by manually obtaining SHARE ROW EXCLUSIVE lock on the referenced tables before calling create_hypertable in the same transaction, see PostgreSQL documentation for the syntax.

Units

The ‘time’ column supports the following data types:

Types
Timestamp (TIMESTAMP, TIMESTAMPTZ)
DATE
Integer (SMALLINT, INT, BIGINT)

TIP:The type flexibility of the ‘time’ column allows the use of non-time-based values as the primary chunk partitioning column, as long as those values can increment.

TIP:For incompatible data types (e.g. jsonb) you can specify a function to the time_partitioning_func argument which can extract a compatible data type

The units of chunk_time_interval should be set as follows:

For time columns having timestamp or DATE types, the chunk_time_interval should be specified either as an interval type or an integral value in microseconds.
For integer types, the chunk_time_interval must be set explicitly, as the database does not otherwise understand the semantics of what each integer value represents (a second, millisecond, nanosecond, etc.). So if your time column is the number of milliseconds since the UNIX epoch, and you wish to each chunk to cover 1 day, you should specify chunk_time_interval => 86400000.

In case of hash partitioning (i.e., number_partitions is greater than zero), it is possible to optionally specify a custom partitioning function. If no custom partitioning function is specified, the default partitioning function is used. The default partitioning function calls PostgreSQL’s internal hash function for the given type, if one exists. Thus, a custom partitioning function can be used for value types that do not have a native PostgreSQL hash function. A partitioning function should take a single anyelement type argument and return a positive integer hash value. Note that this hash value is not a partition ID, but rather the inserted value’s position in the dimension’s key space, which is then divided across the partitions.

TIP:The time column in create_hypertable must be defined as NOT NULL. If this is not already specified on table creation, create_hypertable will automatically add this constraint on the table when it is executed.

Sample Usage

Convert table conditions to hypertable with just time partitioning on column time:

SELECT create_hypertable('conditions', 'time');

Convert table conditions to hypertable, setting chunk_time_interval to 24 hours.

SELECT create_hypertable('conditions', 'time', chunk_time_interval => 86400000000);
SELECT create_hypertable('conditions', 'time', chunk_time_interval => INTERVAL '1 day');

Convert table conditions to hypertable with time partitioning on time and space partitioning (4 partitions) on location:

SELECT create_hypertable('conditions', 'time', 'location', 4);

The same as above, but using a custom partitioning function:

SELECT create_hypertable('conditions', 'time', 'location', 4, partitioning_func => 'location_hash');

Convert table conditions to hypertable. Do not raise a warning if conditions is already a hypertable:

SELECT create_hypertable('conditions', 'time', if_not_exists => TRUE);

Time partition table measurements on a composite column type report using a time partitioning function: Requires an immutable function that can convert the column value into a supported column value:

CREATE TYPE report AS (reported timestamp with time zone, contents jsonb);
CREATE FUNCTION report_reported(report)
  RETURNS timestamptz
  LANGUAGE SQL
  IMMUTABLE AS
  'SELECT $1.reported';
SELECT create_hypertable('measurements', 'report', time_partitioning_func => 'report_reported');

Time partition table events, on a column type jsonb (event), which has a top level key (started) containing an ISO 8601 formatted timestamp:

CREATE FUNCTION event_started(jsonb)
  RETURNS timestamptz
  LANGUAGE SQL
  IMMUTABLE AS
  $func$SELECT ($1->>'started')::timestamptz$func$;
SELECT create_hypertable('events', 'event', time_partitioning_func => 'event_started');

Best Practices

One of the most common questions users of TimescaleDB have revolves around configuring chunk_time_interval.

Time intervals: The current release of TimescaleDB enables both the manual and automated adaption of its time intervals. With manually-set intervals, users should specify a chunk_time_interval when creating their hypertable (the default value is 1 week). The interval used for new chunks can be changed by calling set_chunk_time_interval().

The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks if using space partitions) fit into memory. As such, we typically recommend setting the interval so that these chunk(s) comprise no more than 25% of main memory.

TIP:Make sure that you are planning for recent chunks from all active hypertables to fit into 25% of main memory, rather than 25% per hypertable.

To determine this, you roughly need to understand your data rate. If you are writing roughly 2GB of data per day and have 64GB of memory, setting the time interval to a week would be good. If you are writing 10GB per day on the same machine, setting the time interval to a day would be appropriate. This interval would also hold if data is loaded more in batches, e.g., you bulk load 70GB of data per week, with data corresponding to records from throughout the week.

While it’s generally safer to make chunks smaller rather than too large, setting intervals too small can lead to many chunks, which corresponds to increased planning latency for some types of queries.

TIP:One caveat is that the total chunk size is actually dependent on both the underlying data size and any indexes, so some care might be taken if you make heavy use of expensive index types (e.g., some PostGIS geospatial indexes). During testing, you might check your total chunk sizes via the chunks_detailed_size function.

Space partitions: In most cases, it is advised for users not to use space partitions. However, if you create a distributed hypertable, it is important to create space partitioning, see create_distributed_hypertable. The rare cases in which space partitions may be useful for non-distributed hypertables are described in the add_dimension section.