COPY TO

Export table contents to files on CrateDB node machines.

Table of contents

Synopsis

  1. COPY table_ident [ PARTITION ( partition_column = value [ , ... ] ) ]
  2. [ ( column [ , ...] ) ]
  3. [ WHERE condition ]
  4. TO DIRECTORY output_uri
  5. [ WITH ( copy_parameter [= value] [, ... ] ) ]

Description

The COPY TO command exports the contents of a table to one or more files into a given directory with unique filenames. Each node with at least one shard of the table will export its contents onto their local disk.

The created files are JSON formatted and contain one table row per line and, due to the distributed nature of CrateDB, will remain on the same nodes where the shards are.

Here’s an example:

  1. cr> COPY quotes TO DIRECTORY '/tmp/' with (compression='gzip');
  2. COPY OK, 3 rows affected ...

Note

Currently only user tables can be exported. System tables like sys.nodes and blob tables don’t work with the COPY TO statement.

Parameters

table_ident

The name (optionally schema-qualified) of the table to be exported.

column

(optional) A list of column expressions that should be exported.

Note

When declaring columns, this changes the output to JSON list format, which is currently not supported by the COPY FROM statement.

Clauses

PARTITION

If the table is partitioned, the optional PARTITION clause can be used to export data from a one partition exclusively.

  1. [ PARTITION ( partition_column = value [ , ... ] ) ]

partition_column

One of the column names used for table partitioning.

value

The respective column value.

All partition columns (specified by the PARTITIONED BY clause) must be listed inside the parentheses along with their respective values using the partition_column = value syntax (separated by commas).

Because each partition corresponds to a unique set of partition column row values, this clause uniquely identifies a single partition to export.

Tip

The SHOW CREATE TABLE statement will show you the complete list of partition columns specified by the PARTITIONED BY clause.

Caution

The exported data doesn’t contain the partition columns or the corresponding values because they are not part of the partitioned tables.

If COPY TO is used on a partitioned table without the PARTITION clause, the partition columns and values will be included in the rows of the exported files. If a partition column is a generated column, it will not be included even if the PARTITION clause is missing.

WHERE

The WHERE clauses use the same syntax as SELECT statements, allowing partial exports. (see WHERE clause for more information).

TO

The TO clause allows you to specify an output location.

  1. TO DIRECTORY output_uri

output_uri

The output URI.

The output URI can be any expression that evaluates to a string. The string must be a valid URI that uses the file:// or s3:// URI scheme.

For example:

  • file:///path/to/dir

  • s3://[<accesskey>:<secretkey>@]<bucketname>/<path>

If no URI scheme is given (e.g., /path/to/dir) the default scheme file:// will be used.

Containers

If you are running CrateDB inside a container (e.g., you are running CrateDB on Docker) the URI must point to a file inside the container.

You may have to configure a new Docker volume to accomplish this.

Microsoft Windows

If you are using Microsoft Windows, you must include the drive letter in the file URI.

For example, the above file URI should instead be written as file:///C://tmp/import_data/quotes.json.

Consult the Windows documentation for more information.

Amazon Web Services

A secretkey provided by Amazon Web Services (AWS) can contain characters such as /, + or =. Such characters must be URI encoded. The same encoding as in s3 applies.

Additionally, versions prior to 0.51.x use HTTP for connections to S3. Since 0.51.x these connections are using the HTTPS protocol. Please make sure you update your firewall rules to allow outgoing connections on port 443.

WITH

The optional WITH clause can specify parameters for the copy statement.

  1. [ WITH ( copy_parameter [= value] [, ... ] ) ]

Possible copy_parameters are:

compression

Define if and how the exported data should be compressed.

By default the output is not compressed.

Possible values for the compression setting are:

gzip

Use gzip to compress the data output.

format

Optional parameter to override default output behavior.

Possible values for the format settings are:

json_object

Each row in the result set is serialized as JSON object and written to an output file where one line contains one object. This is the default behavior if no columns are defined. Use this format to import with COPY FROM.

json_array

Each row in the result set is serialized as JSON array, storing one array per line in an output file. This is the default behavior if columns are defined.