Auto Rebalancing

YugabyteDB automatically rebalances data onto newly added nodes, so that a cluster can easily be expanded when more capacity is needed. In this tutorial, we will look at how YugabyteDB rebalances data while a workload is running. We will run a read-write workload using a pre-packaged sample application against a 3-node local universe with a replication factor of 3, and add nodes to it while the workload is running. We will then observe how the cluster rebalances its on-disk data as well as its memory footprint.

If you haven’t installed YugabyteDB yet, do so first by following the Quick Start guide.

1. Setup - create universe

If you have a previously running local universe, destroy it using the following:

  $ ./bin/yb-ctl destroy

Start a new local cluster. By default, this creates a 3-node universe with a replication factor of 3. We set the number of shards (tablets) per tserver to 8 so we can better observe the load balancing during scaling. With 8 shards per tserver across 3 tservers, each table has 24 tablets; at replication factor 3, that is 72 tablet replicas per table in total.

  $ ./bin/yb-ctl --num_shards_per_tserver 8 create
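
Before starting the workload, you can optionally verify that all 3 nodes are up. yb-ctl's status command lists the locally running masters and tablet servers:

  $ ./bin/yb-ctl status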

2. Run sample key-value app

Run the Cassandra sample key-value app against the local universe by typing the following command.

  $ java -jar ./java/yb-sample-apps.jar --workload CassandraKeyValue \
      --nodes 127.0.0.1:9042 \
      --num_threads_write 1 \
      --num_threads_read 4 \
      --value_size 4096
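
This starts 1 writer thread and 4 reader threads, inserting 4 KB values over the YCQL port (9042). While the app runs, you can peek at the data it is writing. As a rough sketch, assuming the sample app creates its table as cassandrakeyvalue in the ybdemo keyspace (the exact names may vary by app version), the bundled cqlsh shell can show sample rows:

  $ ./bin/cqlsh 127.0.0.1
  cqlsh> SELECT * FROM ybdemo.cassandrakeyvalue LIMIT 5;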

3. Observe data sizes per node

You can check many of the per-node stats by browsing to the tablet-servers page of the master UI (http://127.0.0.1:7000/tablet-servers on a local cluster). The total data size per node, as well as the total memory used per node, are highlighted in the screenshot below. Note that both metrics are roughly the same across all the nodes, indicating uniform usage.

Data and memory sizes with 3 nodes
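
If you prefer the command line, the same tablet-server list can be pulled with yb-admin. This is a sketch assuming the yb-ctl default master RPC endpoint of 127.0.0.1:7100 and a build of yb-admin that includes the list_all_tablet_servers command:

  $ ./bin/yb-admin -master_addresses 127.0.0.1:7100 list_all_tablet_servers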

4. Add a node and observe data rebalancing

Add a node to the universe.

  $ ./bin/yb-ctl add_node

Now we should have 4 nodes. Refresh the tablet-servers page to see the stats update. As you refresh, you should see the new node acquiring more and more tablets, which increases both its data size and its memory footprint. Eventually, all 4 nodes should end up with a similar data distribution and memory usage.

Data and memory sizes with 4 nodes
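
To watch the rebalance converge from a terminal rather than the browser, a simple polling loop works. This reuses the yb-admin invocation sketched in Step 3, rerunning it every few seconds:

  $ while true; do ./bin/yb-admin -master_addresses 127.0.0.1:7100 list_all_tablet_servers; sleep 5; done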

5. Add another node and observe linear scale out

Add yet another node to the universe.

  $ ./bin/yb-ctl add_node

Now we should have 5 nodes. Refresh the tablet-servers page to see the stats update; as before, you should see all the nodes end up with similar data sizes and memory footprints.

Data and memory sizes with 5 nodes

YugabyteDB automatically balances the tablet leaders and followers of a universe by moving them in a rate-limited manner to the newly added nodes. This automatic balancing of the data is completely transparent to the application logic.
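
Because leaders are balanced as well as followers, write traffic spreads across the new nodes too. To see where the leaders of a given table have landed, a hedged example (again assuming the ybdemo.cassandrakeyvalue names used by the sample app, and the yb-admin endpoint sketched in Step 3) is the list_tablets command, which prints each tablet's UUID, key range, and current leader:

  $ ./bin/yb-admin -master_addresses 127.0.0.1:7100 list_tablets ybdemo cassandrakeyvalue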

6. Clean up (optional)

Optionally, you can shut down the local cluster created in Step 1.

  $ ./bin/yb-ctl destroy
