BACKUP

Name

BACKUP

Description

This statement backs up the data in the specified database. The command is asynchronous: after it is submitted successfully, check its progress with the SHOW BACKUP command. Only tables of type OLAP can be backed up.

Only the root user or a superuser can create repositories.
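Because the command is asynchronous, a submitted backup is usually followed by a progress check. A minimal sketch, assuming a database named example_db (the name used in the examples on this page):

```sql
-- Show the status of the most recent backup task in example_db.
-- example_db is an illustrative database name.
SHOW BACKUP FROM example_db;
```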

Syntax:

  BACKUP SNAPSHOT [db_name].{snapshot_name}
  TO `repository_name`
  [ON|EXCLUDE] (
      `table_name` [PARTITION (`p1`, ...)],
      ...
  )
  PROPERTIES ("key"="value", ...);

Notes:

  • Only one BACKUP or RESTORE task can be executing in the same database at a time.
  • The ON clause identifies the tables and partitions to back up. If no partition is specified, all partitions of the table are backed up by default.
  • The EXCLUDE clause identifies the tables and partitions that should not be backed up. All partitions of all tables in the database are backed up except the specified tables or partitions.
  • PROPERTIES currently supports the following properties:
    • "type" = "full": indicates that this is a full backup (default).
    • "timeout" = "3600": the task timeout, in seconds. The default is one day (86400 seconds).
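The two properties listed above can be combined in one statement. A minimal sketch, assuming the example_db, example_tbl, and example_repo names used in the examples on this page; the label snapshot_label4 and the two-hour timeout are illustrative choices, not recommendations:

```sql
-- Full backup of example_tbl with a two-hour timeout (7200 seconds).
-- snapshot_label4 is a hypothetical snapshot label.
BACKUP SNAPSHOT example_db.snapshot_label4
TO example_repo
ON (example_tbl)
PROPERTIES ("type" = "full", "timeout" = "7200");
```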

Example

  1. Fully back up the table example_tbl under example_db to the repository example_repo:

      BACKUP SNAPSHOT example_db.snapshot_label1
      TO example_repo
      ON (example_tbl)
      PROPERTIES ("type" = "full");

  2. Fully back up partitions p1 and p2 of the table example_tbl, and the table example_tbl2, under example_db to the repository example_repo:

      BACKUP SNAPSHOT example_db.snapshot_label2
      TO example_repo
      ON
      (
          example_tbl PARTITION (p1,p2),
          example_tbl2
      );

  3. Fully back up all tables under example_db except the table example_tbl to the repository example_repo:

      BACKUP SNAPSHOT example_db.snapshot_label3
      TO example_repo
      EXCLUDE (example_tbl);

  4. Create a repository named hdfs_repo that relies on the Baidu HDFS broker "hdfs_broker", with the data root directory hdfs://hadoop-name-node:54310/path/to/repo/:

      CREATE REPOSITORY `hdfs_repo`
      WITH BROKER `hdfs_broker`
      ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
      PROPERTIES
      (
          "username" = "user",
          "password" = "password"
      );

  5. Create a repository named s3_repo that links to cloud storage directly, without going through the broker:

      CREATE REPOSITORY `s3_repo`
      WITH S3
      ON LOCATION "s3://s3-repo"
      PROPERTIES
      (
          "AWS_ENDPOINT" = "http://s3-REGION.amazonaws.com",
          "AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
          "AWS_SECRET_KEY" = "AWS_SECRET_KEY",
          "AWS_REGION" = "REGION"
      );

  6. Create a repository named hdfs_repo that links to HDFS directly, without going through the broker:

      CREATE REPOSITORY `hdfs_repo`
      WITH hdfs
      ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
      PROPERTIES
      (
          "fs.defaultFS" = "hdfs://hadoop-name-node:54310",
          "hadoop.username" = "user"
      );

  7. Create a repository named minio_repo that links to MinIO storage directly through the S3 protocol:

      CREATE REPOSITORY `minio_repo`
      WITH S3
      ON LOCATION "s3://minio_repo"
      PROPERTIES
      (
          "AWS_ENDPOINT" = "http://minio.com",
          "AWS_ACCESS_KEY" = "MINIO_USER",
          "AWS_SECRET_KEY" = "MINIO_PASSWORD",
          "AWS_REGION" = "REGION",
          "use_path_style" = "true"
      );

Keywords

  1. BACKUP

Best Practice

  1. Only one backup operation can run at a time in the same database.

  2. The backup operation backs up the base table and the materialized views of the specified table or partition, and only one replica is backed up.

  3. Efficiency of backup operations

    The efficiency of a backup operation depends on the amount of data, the number of Compute Nodes, and the number of files. Every Compute Node that holds a shard of the backup data participates in the upload phase of the backup operation. The more nodes there are, the higher the upload efficiency.

    The number of files refers to the number of shards and the number of files in each shard. If there are many shards, or many small files within the shards, the backup operation may take longer.
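Because only one backup task can run in a database at a time, a task that is taking too long blocks any new backup in that database. A minimal sketch of cancelling it, assuming the CANCEL BACKUP statement is available in your version and using the illustrative database name example_db:

```sql
-- Cancel the backup task currently running in example_db, if any.
-- example_db is an illustrative database name.
CANCEL BACKUP FROM example_db;
```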