Snapshot and restore data

You can create a backup of your YugabyteDB data using snapshots. Here are some points to keep in mind.

  • Distributed backups using snapshots
    • Massively parallel and efficient for very large data sets.
    • Snapshots are not transactional across the whole table, but only per tablet (#2086).
    • Multi-table transactional snapshots are on the roadmap (#2084).
    • Single-table snapshots don't work in YSQL (#2083).
    • Yugabyte Platform automates these steps for you.
  • Implementation notes:
    • Once the snapshot command is issued, newly incoming writes to each tablet are buffered rather than applied immediately.
    • The existing data is flushed to disk, and the resulting files are hard-linked into a .snapshots directory on each tablet (see the sketch after this list).
    • These steps are fast: a small flush to disk plus hard links. The buffered operations are unlikely to time out.
    • The tablet is then reopened for writes, and the buffered operations are applied.
    • At this point, the snapshot operation is done. Because YugabyteDB uses an LSM storage engine, the snapshot files will never be modified.
    • If the flush takes longer, some operations can time out; in practice, expect such occasional slowness when using network storage (AWS EBS, GCP Persistent Disk, SAN storage, and so on).
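
Because the snapshot is made of hard links, it initially consumes almost no extra disk space. As a rough, hypothetical illustration (the directory layout follows the path structure described in Step 3.1, and the placeholder IDs must be replaced with real values from your cluster), you can compare inode numbers to confirm that the snapshot files and the live SST files are the same physical files:

  # Hypothetical paths; substitute your own table, tablet, and snapshot IDs.
  $ ls -li ~/yugabyte-data/node-1/disk-1/yb-data/tserver/data/rocksdb/table-<table_id>/tablet-<tablet_id>/*.sst
  $ ls -li ~/yugabyte-data/node-1/disk-1/yb-data/tserver/data/rocksdb/table-<table_id>/tablet-<tablet_id>.snapshots/<snapshot_id>/*.sst
  # Matching inode numbers and a link count of 2 indicate hard links, not copies.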

In this tutorial, you will be using YCQL, but the same APIs are used in YSQL.

Step 1: Create a local cluster

To create a local cluster, see Create a local cluster.

  $ ./bin/yb-ctl create

  Creating cluster.
  Waiting for cluster to be ready.
  ----------------------------------------------------------------------------------------------------
  | Node Count: 1 | Replication Factor: 1 |
  ----------------------------------------------------------------------------------------------------
  | JDBC : postgresql://[email protected]:5433 |
  | YSQL Shell : bin/ysqlsh |
  | YCQL Shell : bin/cqlsh |
  | YEDIS Shell : bin/redis-cli |
  | Web UI : http://127.0.0.1:7000/ |
  | Cluster Data : /home/guru/yugabyte-data |
  ----------------------------------------------------------------------------------------------------
  For more info, please use: yb-ctl status

For details on options, see yb-ctl reference.

Step 2: Create a table with data

To use the YCQL API, open cqlsh:

  $ ./bin/cqlsh

Create a keyspace, table, and insert some test data.

  cqlsh> CREATE KEYSPACE ydb;
  cqlsh> CREATE TABLE IF NOT EXISTS ydb.test_tb(user_id INT PRIMARY KEY);
  cqlsh> INSERT INTO ydb.test_tb(user_id) VALUES (5);

You can verify that you have data in the database by running a simple SELECT statement.

  cqlsh> SELECT * FROM ydb.test_tb;

   user_id
  ---------
         5

  (1 rows)

Step 3: Create a snapshot

Create a snapshot using the yb-admin create_snapshot command:

  $ ./bin/yb-admin create_snapshot ydb test_tb

  Started flushing table ydb.test_tb
  Flush request id: fe0db953a7a5416c90f01b1e11a36d24
  Waiting for flushing...
  Flushing complete: SUCCESS
  Started snapshot creation: 4963ed18fc1e4f1ba38c8fcf4058b295

To see when your snapshot is ready, you can run the yb-admin list_snapshots command.

  $ ./bin/yb-admin list_snapshots

  Snapshot UUID                         State
  4963ed18fc1e4f1ba38c8fcf4058b295      COMPLETE

Step 3.1: Export the snapshot

First, you need to export a metadata file that describes the snapshot.

  $ ./bin/yb-admin export_snapshot 4963ed18fc1e4f1ba38c8fcf4058b295 test_tb.snapshot

  Exporting snapshot 4963ed18fc1e4f1ba38c8fcf4058b295 (COMPLETE) to file test_tb.snapshot
  Snapshot meta data was saved into file: test_tb.snapshot

Next, you need to copy the actual data from the table and tablets. In this case, you have to use a script that copies all the data. The file path structure is:

  <yb_data_dir>/node-<node_number>/disk-<disk_number>/yb-data/tserver/data/rocksdb/table-<table_id>/[tablet-<tablet_id>.snapshots]/<snapshot_id>
  • <yb_data_dir> is the directory where YugabyteDB data is stored (default: ~/yugabyte-data).
  • <node_number> is used when multiple nodes are running on the same server (for testing, QA, and development). The default value is 1.
  • <disk_number> is used when running YugabyteDB on multiple disks with the --fs_data_dirs option. The default value is 1.
  • <table_id> is the UUID of the table. You can get it from the Admin UI.
  • <tablet_id> identifies one of the table's tablets. Each tablet has a <tablet_id>.snapshots directory that you need to copy.
  • <snapshot_id> identifies the snapshot. There is a separate directory for each snapshot, since you can have multiple completed snapshots on each server.

This directory structure is specific to yb-ctl, which is a local testing tool. In practice, for each server, you will use the --fs_data_dirs configuration option, which is a comma-separated list of paths where to put the data (normally, different paths should be on different disks). In this yb-ctl example, these are the full paths up to the disk-<disk_number> directory.
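
For example, with the table, tablet, and snapshot IDs used in this tutorial (your UUIDs will differ), the snapshot directory of one tablet on a single-node yb-ctl cluster would look like this:

  ~/yugabyte-data/node-1/disk-1/yb-data/tserver/data/rocksdb/table-ff4389ee7a9d47ff897d3cec2f18f720/tablet-cea3aaac2f10460a880b0b4a2a4b652a.snapshots/4963ed18fc1e4f1ba38c8fcf4058b295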

Step 3.2: Copy snapshot data to another directory

Tip: To get a snapshot of a multi-node cluster, you need to go into each node and copy the folders of ONLY the leader tablets on that node. There is no need to keep a copy for each replica, since each tablet replica has a copy of the same data.

First, get the table_id UUID that you want to snapshot. You can find the UUID in the Admin UI (http://127.0.0.1:7000/tables) under User Tables.
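
If you prefer the command line, yb-admin can also list tables; recent versions accept an include_table_id argument to print each table's UUID (treat this argument as an assumption and check the yb-admin help for your version):

  $ ./bin/yb-admin list_tables include_table_id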

For each table, there are multiple tablets where the data is stored. You need to get a list of tablets and the leader for each of them.

  $ ./bin/yb-admin list_tablets ydb test_tb 0

  Tablet UUID                           Range                                                     Leader
  cea3aaac2f10460a880b0b4a2a4b652a      partition_key_start: "" partition_key_end: "\177\377"     127.0.0.1:9100
  e509cf8eedba410ba3b60c7e9138d479      partition_key_start: "\177\377" partition_key_end: ""     127.0.0.1:9100

The third argument is for limiting the number of returned results. Setting it to 0 returns all tablets.

Using this information, you can construct the full path of all directories where snapshots are stored for each (tablet, snapshot_id).

You can create a small script to manually copy, or move, the folders to a backup directory or external storage.
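
For example, here is a minimal sketch of such a script. The data directory, backup destination, and list of tablet IDs are assumptions taken from this tutorial's single-node example; replace them with the values reported by yb-admin list_tablets on each node, keeping only the tablets that node leads.

  #!/usr/bin/env bash
  # Sketch: copy the snapshot directory of each (leader) tablet to a backup location.
  # All values below come from this tutorial's example; replace them with your own.
  set -euo pipefail

  yb_data_dir="$HOME/yugabyte-data/node-1/disk-1"
  table_id="ff4389ee7a9d47ff897d3cec2f18f720"
  snapshot_id="4963ed18fc1e4f1ba38c8fcf4058b295"
  backup_dir="/mnt/backups/ydb_test_tb"          # hypothetical destination
  tablet_ids="cea3aaac2f10460a880b0b4a2a4b652a e509cf8eedba410ba3b60c7e9138d479"

  for tablet_id in ${tablet_ids}; do
    src="${yb_data_dir}/yb-data/tserver/data/rocksdb/table-${table_id}/tablet-${tablet_id}.snapshots/${snapshot_id}"
    dst="${backup_dir}/tablet-${tablet_id}.snapshots/${snapshot_id}"
    mkdir -p "${dst}"
    cp -r "${src}/." "${dst}/"
  done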

Tip: When the source cluster has a replication factor of 1 (RF1), the yb-admin output (such as the tablet listing above) shows only leaders, because each tablet has a single copy, which is the leader.

Step 4: Destroy the cluster and create a new one

Now destroy the cluster.

  $ ./bin/yb-ctl destroy

  Destroying cluster.

Next, spin up a new cluster with three nodes in the replicated setup.

  $ ./bin/yb-ctl --rf 3 create

  Creating cluster.
  Waiting for cluster to be ready.
  ----------------------------------------------------------------------------------------------------
  | Node Count: 3 | Replication Factor: 3 |
  ----------------------------------------------------------------------------------------------------
  | JDBC : postgresql://[email protected]:5433 |
  | YSQL Shell : bin/ysqlsh |
  | YCQL Shell : bin/cqlsh |
  | YEDIS Shell : bin/redis-cli |
  | Web UI : http://127.0.0.1:7000/ |
  | Cluster Data : /home/guru/yugabyte-data |
  ----------------------------------------------------------------------------------------------------
  For more info, please use: yb-ctl status

Tip: Make sure to get the master addresses from yb-ctl status, since you now have multiple nodes on different IP addresses.
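
For example, yb-admin takes the master addresses through its -master_addresses flag. With the default yb-ctl ports, the three local masters are assumed to listen on 127.0.0.1:7100, 127.0.0.2:7100, and 127.0.0.3:7100, so a command against the new cluster would look like this:

  $ ./bin/yb-admin -master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 list_snapshots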

Step 5: Trigger snapshot import

Tip: The target keyspace and table can be different from the exported ones.

First, import the snapshot file into YugabyteDB.

  $ ./bin/yb-admin import_snapshot test_tb.snapshot ydb test_tb

  Read snapshot meta file test_tb.snapshot
  Importing snapshot 4963ed18fc1e4f1ba38c8fcf4058b295 (COMPLETE)
  Target imported table name: ydb.test_tb
  Table being imported: ydb.test_tb
  Successfully applied snapshot.
  Object           Old ID                                New ID
  Keyspace         c478ed4f570841489dd973aacf0b3799      c478ed4f570841489dd973aacf0b3799
  Table            ff4389ee7a9d47ff897d3cec2f18f720      ff4389ee7a9d47ff897d3cec2f18f720
  Tablet 0         cea3aaac2f10460a880b0b4a2a4b652a      cea3aaac2f10460a880b0b4a2a4b652a
  Tablet 1         e509cf8eedba410ba3b60c7e9138d479      e509cf8eedba410ba3b60c7e9138d479
  Snapshot         4963ed18fc1e4f1ba38c8fcf4058b295      4963ed18fc1e4f1ba38c8fcf4058b295

After you import the metadata file, note the following changes:

  • The output shows an Old ID and a New ID for the keyspace, table, and tablets.
  • If the table_id has changed, the directory paths are different from before.
  • If a tablet_id has changed, it has a different tablet-<tablet_id> directory.
  • When restoring, you have to use the new IDs to get the right paths to move the data. Using these IDs, you can copy the previously saved .snapshots folders to the new paths (see the sketch after the following tip).

Tip: For each tablet, you need to copy the snapshots folder to all replicas.
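
As with the backup step, a small script can copy the saved snapshot files into the paths built from the new IDs. The sketch below is only an outline: the backup location mirrors the earlier copy script, the new table and tablet IDs are placeholders you must take from the import_snapshot output (in this tutorial the snapshot ID stayed the same), and on a multi-node cluster you repeat it on every node that hosts a replica of each tablet.

  #!/usr/bin/env bash
  # Sketch: copy backed-up snapshot files into the directories derived from the NEW IDs
  # printed by yb-admin import_snapshot. All values below are placeholders.
  set -euo pipefail

  yb_data_dir="$HOME/yugabyte-data/node-1/disk-1"
  backup_dir="/mnt/backups/ydb_test_tb"          # hypothetical backup location
  new_table_id="REPLACE_WITH_NEW_TABLE_ID"
  snapshot_id="4963ed18fc1e4f1ba38c8fcf4058b295"

  # Map each old tablet ID (the backup folder name) to the new tablet ID from import_snapshot.
  for pair in \
      "cea3aaac2f10460a880b0b4a2a4b652a:REPLACE_WITH_NEW_TABLET_ID_0" \
      "e509cf8eedba410ba3b60c7e9138d479:REPLACE_WITH_NEW_TABLET_ID_1"; do
    old_id="${pair%%:*}"
    new_id="${pair##*:}"
    src="${backup_dir}/tablet-${old_id}.snapshots/${snapshot_id}"
    dst="${yb_data_dir}/yb-data/tserver/data/rocksdb/table-${new_table_id}/tablet-${new_id}.snapshots/${snapshot_id}"
    mkdir -p "${dst}"
    cp -r "${src}/." "${dst}/"
  done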

You can start restoring the snapshot using the yb-admin restore_snapshot command:

  $ ./bin/yb-admin restore_snapshot 4963ed18fc1e4f1ba38c8fcf4058b295

  Started restoring snapshot: 4963ed18fc1e4f1ba38c8fcf4058b295

After some time, you can see that the restore has completed:

  $ ./bin/yb-admin list_snapshots

  Snapshot UUID                         State
  4963ed18fc1e4f1ba38c8fcf4058b295      COMPLETE

Step 6: Verify the data

  $ ./bin/cqlsh

  cqlsh> SELECT * FROM ydb.test_tb;

   user_id
  ---------
         5
Finally, if the snapshot is no longer needed, you can delete it to reclaim disk space.

  $ ./bin/yb-admin delete_snapshot 4963ed18fc1e4f1ba38c8fcf4058b295

  Deleted snapshot: 4963ed18fc1e4f1ba38c8fcf4058b295

This concludes the guide on how to snapshot and restore data in YugabyteDB. In Yugabyte Platform and Yugabyte Cloud, all of the manual steps above are automated.