Secondary Indexes Beta

AttentionThis page documents an earlier version. Go to the latest (v2.1)version.

A database index is a data structure that improves the speed of data retrieval operations on a database table. Secondary indexes require additional writes and storage space to maintain the index data structure. They can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

YugabyteDB provides consistent (ACID), performant secondary indexes. They are built on top of distributed ACID transactions. You can read more about transactions in our architecture docs.

If you haven’t installed YugabyteDB yet, do so first by following the Quick Start guide.

NOTE: Secondary indexes are a work in progress. Here are some requirements to keep in mind currently when using secondary indexes in Yugabyte:

  • To create a secondary index on a table, the primary table needs to be created with distributed transaction enabled using the with transactions = { 'enabled' : true } clause.
  • The secondary index needs to be created before any data is inserted into the primary table.
  • A secondary index will be used to execute a query only when the index covers all columns selected by the query.

These requirements may be removed in the future.

1. Setup - create universe

If you have a previously running local universe, destroy it using the following.

  1. $ ./bin/yb-ctl destroy

Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.

  1. $ ./bin/yb-ctl create

2. Create a table with secondary indexes

Connect to the cluster using cqlsh.

  1. $ ./bin/cqlsh
  1. Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>

Create a keyspace.

  1. cqlsh> CREATE KEYSPACE store;

Create a table with the transactions property set to enabled.

  1. cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };

Now create a secondary index on the order_date column. Note that we include the amount column in the secondary index in order to respond to queries selecting the amount column directly from the secondary index table with just one read.

  1. cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);

3. Insert sample data

Let us seed this table with some sample data. Paste the following into the cqlsh prompt.

  1. INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);

4. Perform some queries

  • Get the total amount for a customer

Let us write a query to fetch the sum total of the order amount column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id, and therefore does not use the secondary index.

  1. cqlsh> select sum(amount) from store.orders where customer_id = 1;
  1. sum(amount)——————- 120.55(1 rows)
  • Get the total amount for a specific date

Now, let us write a query to fetch the sum total of order amount across all orders for a specific date. Because we have a secondary index on the order_date column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date and avoid a full-table scan of the primary table.

  1. cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';
  1. sum(amount)——————- 150.75(1 rows)

5. Clean up (optional)

Optionally, you can shutdown the local cluster created in Step 1.

  1. $ ./bin/yb-ctl destroy

1. Setup - create universe

If you have a previously running local universe, destroy it using the following.

  1. $ ./bin/yb-ctl destroy

Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.

  1. $ ./bin/yb-ctl create

2. Create a table with secondary indexes

Connect to the cluster using cqlsh.

  1. $ ./bin/cqlsh
  1. Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>

Create a keyspace.

  1. cqlsh> CREATE KEYSPACE store;

Create a table with the transactions property set to enabled.

  1. cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };

Now create a secondary index on the order_date column. Note that we include the amount column in the secondary index in order to respond to queries selecting the amount column directly from the secondary index table with just one read.

  1. cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);

3. Insert sample data

Let us seed this table with some sample data. Paste the following into the cqlsh prompt.

  1. INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);

4. Perform some queries

  • Get the total amount for a customer

Let us write a query to fetch the sum total of the order amount column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id, and therefore does not use the secondary index.

  1. cqlsh> select sum(amount) from store.orders where customer_id = 1;
  1. sum(amount)——————- 120.55(1 rows)
  • Get the total amount for a specific date

Now, let us write a query to fetch the sum total of order amount across all orders for a specific date. Because we have a secondary index on the order_date column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date and avoid a full-table scan of the primary table.

  1. cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';
  1. sum(amount)——————- 150.75(1 rows)

5. Clean up (optional)

Optionally, you can shutdown the local cluster created in Step 1.

  1. $ ./bin/yb-ctl destroy

1. Setup - create universe

If you have a previously running local universe, destroy it using the following.

  1. $ ./yb-docker-ctl destroy

Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.

  1. $ ./yb-docker-ctl create

2. Create a table with secondary indexes

Connect to cqlsh on node 1.

  1. $ docker exec -it yb-tserver-n1 /home/yugabyte/bin/cqlsh
  1. Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>

Create a keyspace.

  1. cqlsh> CREATE KEYSPACE store;

Create a table with the transactions property set to enabled.

  1. cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };

Now create a secondary index on the order_date column. Note that we include the amount column in the secondary index in order to respond to queries selecting the amount column directly from the secondary index table with just one read.

  1. cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);

3. Insert sample data

Let us seed this table with some sample data. Paste the following into the cqlsh prompt.

  1. INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);

4. Perform some queries

  • Get the total amount for a customer

Let us write a query to fetch the sum total of the order amount column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id, and therefore does not use the secondary index.

  1. cqlsh> select sum(amount) from store.orders where customer_id = 1;
  1. sum(amount)——————- 120.55(1 rows)
  • Get the total amount for a specific date

Now, let us write a query to fetch the sum total of order amount across all orders for a specific date. Because we have a secondary index on the order_date column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date and avoid a full-table scan of the primary table.

  1. cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';
  1. sum(amount)——————- 150.75(1 rows)

5. Clean up (optional)

Optionally, you can shutdown the local cluster created in Step 1.

  1. $ ./yb-docker-ctl destroy

1. Setup - create universe and table

If you have a previously running local universe, destroy it using the following.

  1. $ kubectl delete -f yugabyte-statefulset.yaml

Start a new local cluster - by default, this will create a 3 node universe with a replication factor of 3.

  1. $ kubectl apply -f yugabyte-statefulset.yaml

Make sure there are 3 yb-tserver and 3 yb-master pods representing the 3 nodes of the cluster.

  1. $ kubectl get pods
  1. NAME READY STATUS RESTARTS AGEyb-master-0 1/1 Running 0 13syb-master-1 1/1 Running 0 13syb-master-2 1/1 Running 0 13syb-tserver-0 1/1 Running 1 12syb-tserver-1 1/1 Running 1 12syb-tserver-2 1/1 Running 1 12s

2. Create a table with secondary indexes

Connect to the cluster using cqlsh.

Connect to cqlsh on node 1.

  1. $ kubectl exec -it yb-tserver-0 /home/yugabyte/bin/cqlsh
  1. Connected to local cluster at 127.0.0.1:9042.[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]Use HELP for help.cqlsh>

Create a keyspace.

  1. cqlsh> CREATE KEYSPACE store;

Create a table with the transactions property set to enabled.

  1. cqlsh> CREATE TABLE store.orders ( customer_id int, order_date timestamp, amount double, PRIMARY KEY ((customer_id), order_date)) with transactions = { 'enabled' : true };

Now create a secondary index on the order_date column. Note that we include the amount column in the secondary index in order to respond to queries selecting the amount column directly from the secondary index table with just one read.

  1. cqlsh> create index orders_by_date on store.orders (order_date, customer_id) include (amount);

3. Insert sample data

Let us seed this table with some sample data. Paste the following into the cqlsh prompt.

  1. INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-02', 100.30);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (2, '2018-04-02', 50.45);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (1, '2018-04-06', 20.25);INSERT INTO store.orders (customer_id, order_date, amount) VALUES (3, '2018-04-06', 200.80);

4. Perform some queries

  • Get the total amount for a customer

Let us write a query to fetch the sum total of the order amount column across all orders for a customer. This query will be executed against the primary table using the partition key customer_id, and therefore does not use the secondary index.

  1. cqlsh> select sum(amount) from store.orders where customer_id = 1;

sum(amount)

  1. 120.55

(1 rows)

  • Get the total amount for a specific date

Now, let us write a query to fetch the sum total of order amount across all orders for a specific date. Because we have a secondary index on the order_date column of the table, the query analyzer will execute the query against the secondary index using the partition key order_date and avoid a full-table scan of the primary table.

  1. cqlsh> select sum(amount) from store.orders where order_date = '2018-04-02';

sum(amount)

  1. 150.75

(1 rows)

5. Clean up (optional)

Optionally, you can shutdown the local cluster created in Step 1.

  1. $ kubectl delete -f yugabyte-statefulset.yaml

  1. </code></code>