11.165. Release 0.76

Kafka Connector

This release adds a connector that allows querying of Apache Kafka topic data from Presto. Topics can be live, and repeated queries will pick up new data.

Apache Kafka 0.8+ is supported, although Apache Kafka 0.8.1+ is recommended. There is extensive documentation about configuring the connector and a tutorial to get started.

MySQL and PostgreSQL Connectors

This release adds the MySQL Connector and PostgreSQL Connector for querying and creating tables in external relational databases. These can be used to join or copy data between different systems like MySQL and Hive, or between two different MySQL or PostgreSQL instances, or any combination.

Cassandra Changes

The Cassandra Connector configuration properties cassandra.client.read-timeout and cassandra.client.connect-timeout are now specified using a duration rather than milliseconds (this makes them consistent with all other such properties in Presto). If you were previously specifying a value such as 25, change it to 25ms.
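As a rough illustration of this migration, the sketch below rewrites bare millisecond values for the two properties into duration strings. The helper name migrate_timeout_line is hypothetical and the script is not part of Presto; only the two property names and the 25 to 25ms change come from the notes above.

```python
import re

# The two Cassandra properties named in these release notes.
TIMEOUT_KEYS = (
    "cassandra.client.read-timeout",
    "cassandra.client.connect-timeout",
)

# Matches "key=123" (optional spaces around "=") where the value is digits only.
_BARE_MILLIS = re.compile(r"^(?P<key>[^=\s]+)\s*=\s*(?P<value>\d+)\s*$")


def migrate_timeout_line(line):
    """Hypothetical helper: append "ms" to a bare millisecond timeout value.

    Lines for other properties, or values that already carry a unit,
    are returned unchanged.
    """
    match = _BARE_MILLIS.match(line)
    if match and match.group("key") in TIMEOUT_KEYS:
        return "{}={}ms".format(match.group("key"), match.group("value"))
    return line
```

The function works on one properties line at a time (without its trailing newline), so it can be mapped over the lines of an existing catalog file before upgrading.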

The retry policy for the Cassandra client is now configurable via the cassandra.retry-policy property. In particular, the custom BACKOFF retry policy may be useful.

Hive Changes

The new Hive Connector configuration property hive.s3.socket-timeout allows changing the socket timeout for queries that read or write to Amazon S3. Additionally, the previously added hive.s3.max-connections property was not respected and always used the default of 500.

Hive allows the partitions in a table to have a different schema than the table. In particular, it allows changing the type of a column without changing the column type of existing partitions. The Hive connector does not support this and could previously return garbage data for partitions stored using the RCFile Text format if the column type was converted from a non-numeric type such as STRING to a numeric type such as BIGINT and the actual data in existing partitions was not numeric. The Hive connector now detects this scenario and fails the query after the partition metadata has been read.
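To make the scenario concrete, here is a minimal sketch of a metadata-only check that compares each partition's column types with the table's. It is an illustration under assumed data structures (plain dictionaries mapping column name to type name), not the Hive connector's actual implementation, and find_type_mismatches is a hypothetical name.

```python
def find_type_mismatches(table_columns, partition_columns):
    """Return (column, table_type, partition_type) for columns whose types differ.

    Both arguments are assumed to be dicts of column name -> type name.
    Columns missing from the partition are ignored in this sketch.
    """
    mismatches = []
    for name, table_type in table_columns.items():
        partition_type = partition_columns.get(name)
        if partition_type is not None and partition_type.upper() != table_type.upper():
            mismatches.append((name, table_type, partition_type))
    return mismatches


table = {"id": "BIGINT", "name": "STRING"}
old_partition = {"id": "STRING", "name": "STRING"}

# The "id" column was changed from STRING to BIGINT after the partition was written.
problems = find_type_mismatches(table, old_partition)
if problems:
    print("would fail the query:", problems)
```

Because the check only needs schema information, it can run as soon as partition metadata is available and before any row data is read.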

The property hive.storage-format is broken and has been disabled. It sets the storage format on the metadata but always writes the table using RCBINARY. This will be implemented in a future release.

General Changes

  • Fix hang in verifier when an exception occurs.
  • Fix chr() function to work with Unicode code points instead of ASCII code points.
  • The JDBC driver no longer hangs the JVM on shutdown (all threads are daemon threads).
  • Fix incorrect parsing of function arguments.
  • The bytecode compiler now caches generated code for join and group by queries, which should improve performance and CPU efficiency for these types of queries.
  • Improve planning performance for certain trivial queries over tables with lots of partitions.
  • Avoid creating large output pages. This should mitigate some cases of “Remote page is too large” errors.
  • The coordinator/worker communication layer is now fully asynchronous. Specifically, long-poll requests no longer tie up a thread on the worker. This makes heavily loaded clusters more efficient.
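
For the chr() fix listed above, the difference between ASCII and Unicode code points can be illustrated outside Presto. The sketch below uses Python's built-in chr(), which also takes Unicode code points; ascii_only_chr is a hypothetical stand-in for an ASCII-limited implementation, not Presto's former code.

```python
def ascii_only_chr(code_point):
    """Hypothetical ASCII-limited variant: rejects anything above 127."""
    if not 0 <= code_point <= 127:
        raise ValueError("not an ASCII code point: {}".format(code_point))
    return chr(code_point)


# Python's chr() accepts any Unicode code point, so 8364 maps to the euro sign.
euro = chr(8364)
letter = ascii_only_chr(65)
```

Code point 65 is valid in both schemes, while 8364 is only reachable once the function accepts Unicode code points.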