Presto

Presto is a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. It has a connector architecture to query data from many data sources.This page shows how Presto can be setup to query YugabyteDB’s YCQL tables.

1. Start local cluster

Follow Quick start instructions to run a local YugabyteDB cluster. Test YugabyteDB’s Cassandra compatible API as documented so that you can confirm that you have a Cassandra compatible service running on localhost:9042. We assume you have created the keyspace and table, and inserted sample data as described there.

2. Download and configure Presto

Detailed steps are documented here.The following are the minimal setup steps for getting started.

  1. $ wget https://repo1.maven.org/maven2/io/prestosql/presto-server/309/presto-server-309.tar.gz
  1. $ tar xvf presto-server-309.tar.gz
  1. $ cd presto-server-309

Create the “etc”, “etc/catalog”, and “data” directory inside the installation directory

  1. $ mkdir etc
  1. $ mkdir etc/catalog
  1. $ mkdir data

Create node.properties file - replace <username> below

  1. $ cat > etc/node.properties
  1. node.environment=test
  2. node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
  3. node.data-dir=/Users/<username>/presto-server-309/data

Create jvm.config file

  1. $ cat > etc/jvm.config
  1. -server
  2. -Xmx6G
  3. -XX:+UseG1GC
  4. -XX:G1HeapRegionSize=32M
  5. -XX:+UseGCOverheadLimit
  6. -XX:+ExplicitGCInvokesConcurrent
  7. -XX:+HeapDumpOnOutOfMemoryError
  8. -XX:+ExitOnOutOfMemoryError

Create config.properties file

  1. $ cat > etc/config.properties
  1. coordinator=true
  2. node-scheduler.include-coordinator=true
  3. http-server.http.port=8080
  4. query.max-memory=4GB
  5. query.max-memory-per-node=1GB
  6. discovery-server.enabled=true
  7. discovery.uri=http://localhost:8080

Create log.properties file

  1. $ cat > etc/log.properties
  1. io.prestosql=INFO

Configure Cassandra connector to YugabyteDB

Create the cassandra catalog properties file in etc/catalog directory.Detailed instructions are here.

  1. $ cat > etc/catalog/cassandra.properties
  1. connector.name=cassandra
  2. cassandra.contact-points=127.0.0.1

3. Download Presto CLI

  1. $ cd ~/presto-server-309/bin
  1. $ wget https://repo1.maven.org/maven2/io/prestosql/presto-cli/309/presto-cli-309-executable.jar

Rename jar to ‘presto’. It is meant to be a self-running binary.

  1. $ mv presto-cli-309-executable.jar presto && chmod +x presto

4. Launch Presto server

  1. $ cd presto-server-309

To run in foreground mode.

  1. $ ./bin/launcher run

To run in background mode.

  1. $ ./bin/launcher start

5. Test Presto queries

Use the presto CLI to run ad-hoc queries.

  1. $ ./bin/presto --server localhost:8080 --catalog cassandra --schema default

Start using myapp.

  1. presto:default> use myapp;
  1. USE

Show the tables available.

  1. presto:myapp> show tables;

  1. Table

stock_market(1 row)

Describe a particular table.

  1. presto:myapp> describe stock_market;
  1. Column | Type | Extra | Comment
  2. ---------------+---------+-------+---------
  3. stock_symbol | varchar | |
  4. ts | varchar | |
  5. current_price | real | |
  6. (3 rows)

Query with filter

  1. presto:myapp> select * from stock_market where stock_symbol = 'AAPL';
  1. stock_symbol | ts | current_price
  2. --------------+---------------------+---------------
  3. AAPL | 2017-10-26 09:00:00 | 157.41
  4. AAPL | 2017-10-26 10:00:00 | 157.0
  5. (2 rows)

Query with aggregates

  1. presto:myapp> select stock_symbol, avg(current_price) from stock_market group by stock_symbol;
  1. stock_symbol | _col1
  2. --------------+---------
  3. GOOG | 972.235
  4. AAPL | 157.205
  5. FB | 170.365
  6. (3 rows)