5.15. Pinot Connector

The Pinot connector allows querying and creating tables in an external Pinotdatabase. This can be used to query pinot data or join pinot data withsomething else.

Configuration

To configure the Pinot connector, create a catalog properties filein etc/catalog named, for example, pinot.properties, tomount the Pinot connector as the pinot catalog.Create the file with the following contents, replacing theconnection properties as appropriate for your setup:

  1. connector.name=pinot
  2. pinot.controller-urls=controller_host1:9000,controller_host2:9000

Where the pinot.controller-urls property allows you to specify acomma separated list of the pinot controller host/port pairs.

Multiple Pinot Servers

You can have as many catalogs as you need, so if you have additionalPinot clusters, simply add another properties file to etc/catalogwith a different name (making sure it ends in .properties). Forexample, if you name the property file sales.properties, Prestowill create a catalog named sales using the configured connector.

Querying Pinot

The Pinot catalog exposes all pinot tables inside a flat schema. Theschema name is immaterial when querying but running SHOW SCHEMAS,will show just one schema entry of default.

The name of the pinot catalog is the catalog file you created abovewithout the .properties extension.

For example, if you created afile called mypinotcluster.properties, you can see all the tablesin it using the command:

  1. SHOW TABLES from mypinotcluster.default

OR:

  1. SHOW TABLES from mypinotcluster.foo

Both of these commands will list all the tables in your pinot cluster.This is because Pinot does not have a notion of schemas.

Consider you have a table called clicks in the mypinotcluster.You can see a list of the columns in the clicks table using eitherof the following:

  1. DESCRIBE mypinotcluster.dontcare.clicks;
  2. SHOW COLUMNS FROM mypinotcluster.dontcare.clicks;

Finally, you can access the clicks table:

  1. SELECT count(*) FROM mypinotcluster.default.clicks;

How the Pinot connector works

The connector tries to push the maximal subquery inferred from thepresto query into pinot. It can push down everything Pinot supportsincluding aggregations, group by, all UDFs etc. It generates thecorrect Pinot PQL keeping Pinot’s quirks in mind.

By default, it sends aggregation and limit queries to the Pinot brokerand does a parallel scan for non-aggregation/non-limit queries. Thepinot broker queries create a single split that lets the Pinot brokerdo the scatter gather. Whereas, in the parallel scan mode, there isone split created for one-or-more Pinot segments and the Pinot serversare directly contacted by the Presto servers (ie., the Pinot broker isnot involved in the parallel scan mode)

There are a few configurations that control this behavior:

  • pinot.prefer-broker-queries: This config is true by default.Setting it to false will also create parallel plans foraggregation and limit queries.
  • pinot.forbid-segment-queries: This config is false by default.Setting it to true will forbid parallel querying and force allquerying to happen via the broker.
  • pinot.non-aggregate-limit-for-broker-queries: To preventoverwhelming the broker, the connector only allows querying thepinot broker for short queries. We define a short query tobe either an aggregation (or group-by) query or a query with a limitless than the value configured forpinot.non-aggregate-limit-for-broker-queries. The default valuefor this limit is 25K rows.