Fault tolerance

YugabyteDB automatically handles failures and therefore provides high availability. In this tutorial, you create YSQL tables with a replication factor (RF) of 3, which gives a fault tolerance of 1. This means the cluster remains available for both reads and writes even if one node fails. However, if a second node fails, writes become unavailable on the cluster in order to preserve data consistency.
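The relationship between replication factor and fault tolerance can be sketched with a little shell arithmetic. This assumes the usual majority-quorum rule, where a group of RF replicas tolerates floor((RF - 1) / 2) failures; the function name `fault_tolerance` is a hypothetical helper for illustration and not part of any YugabyteDB tool.

```shell
# Hypothetical helper: number of node failures a majority-quorum group
# of RF replicas can survive, i.e. floor((RF - 1) / 2).
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# RF 3 is the value used in this tutorial; 5 and 7 are shown for comparison.
for rf in 3 5 7; do
  echo "RF=$rf tolerates $(fault_tolerance "$rf") failure(s)"
done
```

With RF 3 the result is 1, which matches the behavior described above: one node can fail without affecting availability.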

If you haven’t installed YugabyteDB yet, you can create a local YugabyteDB cluster within five minutes by following the Quick Start guide.

1. Create a universe

If you have a previously running local universe, destroy it using the following.

  $ ./bin/yb-ctl destroy

Start a new local three-node cluster with a replication factor of 3.

  $ ./bin/yb-ctl --rf 3 create

2. Run the sample key-value app

Download the YugabyteDB workload generator JAR file (yb-sample-apps.jar).

  $ wget 'https://github.com/yugabyte/yb-sample-apps/releases/download/v1.2.0/yb-sample-apps.jar?raw=true' -O yb-sample-apps.jar

Run the SqlInserts workload against the local universe using the following command.

  $ java -jar ./yb-sample-apps.jar --workload SqlInserts \
      --nodes 127.0.0.1:5433 \
      --num_threads_write 1 \
      --num_threads_read 4

The SqlInserts workload prints statistics while it runs, as shown below. You can read more details about the output of the workload applications at the YugabyteDB workload generator.

  2018-05-10 09:10:19,538 [INFO|...] Read: 8988.22 ops/sec (0.44 ms/op), 818159 total ops | Write: 1095.77 ops/sec (0.91 ms/op), 97120 total ops | ...
  2018-05-10 09:10:24,539 [INFO|...] Read: 9110.92 ops/sec (0.44 ms/op), 863720 total ops | Write: 1034.06 ops/sec (0.97 ms/op), 102291 total ops | ...
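If you want to track throughput over time, the statistics lines above can be reduced to just the read and write rates. The following is a minimal sketch that assumes the log format stays exactly as shown in the sample output; `parse_throughput` is a hypothetical helper name, not part of the workload generator.

```shell
# Hypothetical helper: read log lines on stdin and print the read and
# write throughput (ops/sec) for each line that matches the format above.
parse_throughput() {
  sed -n 's/.*Read: \([0-9.]*\) ops\/sec.*Write: \([0-9.]*\) ops\/sec.*/read=\1 write=\2/p'
}

# Sample line copied from the output above.
line='2018-05-10 09:10:19,538 [INFO|...] Read: 8988.22 ops/sec (0.44 ms/op), 818159 total ops | Write: 1095.77 ops/sec (0.91 ms/op), 97120 total ops | ...'

printf '%s\n' "$line" | parse_throughput
# prints: read=8988.22 write=1095.77
```

Lines that do not match the pattern are silently skipped, so the helper can be fed the full workload output through a pipe.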

3. Observe even load across all nodes

You can check many per-node statistics by browsing to the tablet-servers page. The total read and write IOPS per node are highlighted in the screenshot below. Note that both the reads and the writes are roughly the same across all the nodes, indicating uniform usage across the nodes.

Read and write IOPS with 3 nodes

4. Remove a node and observe continuous write availability

Remove a node from the universe.

  $ ./bin/yb-ctl remove_node 3

Refresh the tablet-servers page to see the stats update. The Time since heartbeat value for that node will keep increasing. Once that number reaches 60s (1 minute), YugabyteDB will change the status of that node from ALIVE to DEAD. Note that at this time the universe is running in an under-replicated state for some subset of tablets.

Read and write IOPS with 3rd node dead
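The ALIVE to DEAD transition described in this step amounts to a simple threshold on the time since the last heartbeat. The sketch below only illustrates that rule with the 60 second value quoted above; `node_status` is a hypothetical helper that takes whole seconds as input, and it does not query a real cluster.

```shell
# Threshold taken from the text above: 60 seconds without a heartbeat.
DEAD_AFTER_SECONDS=60

# Hypothetical helper: classify a node from the whole number of seconds
# since its last heartbeat.
node_status() {
  if [ "$1" -ge "$DEAD_AFTER_SECONDS" ]; then
    echo DEAD
  else
    echo ALIVE
  fi
}

for secs in 1 59 60 120; do
  echo "time since heartbeat ${secs}s: $(node_status "$secs")"
done
```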

5. Remove another node and observe write unavailability

Remove another node from the universe.

  $ ./bin/yb-ctl remove_node 2

Refresh the tablet-servers page to see the stats update. Writes are now unavailable, but reads can continue to be served for whichever tablets are available on the remaining node.

Read and write IOPS with 2nd node removed
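The write behavior observed across the last two steps follows from a majority check: writes need more than half of the RF replicas to be reachable. The following sketch assumes that majority rule and replays the node counts from this tutorial; `writes_available` is a hypothetical helper, not a YugabyteDB command.

```shell
# Hypothetical helper: given the replication factor and the number of
# live nodes, report whether a majority of replicas is still reachable.
writes_available() {
  rf=$1
  alive=$2
  majority=$(( rf / 2 + 1 ))
  if [ "$alive" -ge "$majority" ]; then
    echo yes
  else
    echo no
  fi
}

# RF 3 with three, two, and one live node, as in the steps above.
for alive in 3 2 1; do
  echo "alive=$alive writes=$(writes_available 3 "$alive")"
done
```

With one node removed, two of three replicas still form a majority, so writes continue. With two nodes removed, the single remaining replica cannot form a majority, so writes stop.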

6. [Optional] Clean up

Optionally, you can shut down the local cluster created in Step 1.

  $ ./bin/yb-ctl destroy