Retail Analytics

1. Start local cluster

Follow Quick Start instructions to run a local YugabyteDB cluster. Test the YSQL API as documented so that you can confirm that you have the YSQL service running on localhost:5433.

2. Load data

Download the sample schema

  1. $ wget https://raw.githubusercontent.com/yugabyte/yb-sql-workshop/master/query-using-bi-tools/schema.sql

Download the sample data

  1. $ wget https://github.com/yugabyte/yb-sql-workshop/raw/master/query-using-bi-tools/sample-data.tgz
  1. $ tar zxvf sample-data.tgz
  1. $ ls data/
  1. orders.sql products.sql reviews.sql users.sql

Connect to YugabyteDB using ysqlsh

  1. $ ./bin/ysqlsh
  1. ysqlsh (11.2)
  2. Type "help" for help.
  3. yugabyte=#

Create a database

You can do this as shown below.

  1. yugabyte=# CREATE DATABASE yb_demo;
  1. yugabyte=# GRANT ALL ON DATABASE yb_demo to yugabyte;
  1. yugabyte=# \c yb_demo;

Load data

First create the four tables necessary to store the data.

  1. yugabyte=# \i 'schema.sql';

Now load the data into the tables.

  1. yugabyte=# \i 'data/products.sql'
  1. yugabyte=# \i 'data/users.sql'
  1. yugabyte=# \i 'data/orders.sql'
  1. yugabyte=# \i 'data/reviews.sql'

3. Run queries

How are users signing up for my site?

  1. yb_demo=# SELECT DISTINCT(source) FROM users;

  1. source

Facebook Twitter Organic Affiliate Google(5 rows)

What is the most effective channel for user signups?

  1. yb_demo=# SELECT source, count(*) AS num_user_signups
  2. FROM users
  3. GROUP BY source
  4. ORDER BY num_user_signups DESC;
  1. source | num_user_signups
  2. -----------+------------------
  3. Facebook | 512
  4. Affiliate | 506
  5. Google | 503
  6. Twitter | 495
  7. Organic | 484
  8. (5 rows)

What are the most effective channels for product sales by revenue?

  1. yb_demo=# SELECT source, ROUND(SUM(orders.total)) AS total_sales
  2. FROM users, orders WHERE users.id=orders.user_id
  3. GROUP BY source
  4. ORDER BY total_sales DESC;
  1. source | total_sales
  2. -----------+-------------
  3. Facebook | 333454
  4. Google | 325184
  5. Organic | 319637
  6. Twitter | 319449
  7. Affiliate | 297605
  8. (5 rows)

What is the min, max and average price of products in the store?

  1. yb_demo=# SELECT MIN(price), MAX(price), AVG(price) FROM products;
  1. min | max | avg
  2. ------------------+------------------+------------------
  3. 15.6919436739704 | 98.8193368436819 | 55.7463996679207
  4. (1 row)

What percentage of the total sales is from the Facebook channel?

You can do this as shown below.

  1. yb_demo=# CREATE VIEW channel AS
  2. (SELECT source, ROUND(SUM(orders.total)) AS total_sales
  3. FROM users, orders
  4. WHERE users.id=orders.user_id
  5. GROUP BY source
  6. ORDER BY total_sales DESC);

Now that the view is created, we can see it in our list of relations.

  1. yb_demo=# \d
  1. List of relations
  2. Schema | Name | Type | Owner
  3. --------+----------+-------+----------
  4. public | channel | view | postgres
  5. public | orders | table | postgres
  6. public | products | table | postgres
  7. public | reviews | table | postgres
  8. public | users | table | postgres
  9. (5 rows)
  1. yb_demo=# SELECT source, total_sales * 100.0 / (SELECT SUM(total_sales) FROM channel) AS percent_sales
  2. FROM channel WHERE source='Facebook';
  1. source | percent_sales
  2. ----------+------------------
  3. Facebook | 20.9018954710909
  4. (1 row)