11.166. Release 0.75

Hive Changes

  • The Hive S3 file system has a new configuration option, hive.s3.max-connections, which sets the maximum number of connections to S3. The default has been increased from 50 to 500.
  • The Hive connector now supports renaming tables. By default, this feature is not enabled. To enable it, set hive.allow-rename-table=true in your Hive catalog properties file.
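Taken together, a Hive catalog properties file that raises the S3 connection limit and enables table renames might look like the following sketch (the file path, connector name, and values are illustrative, not defaults):

  # etc/catalog/hive.properties (illustrative path)
  connector.name=hive-hadoop2
  hive.s3.max-connections=500
  hive.allow-rename-table=true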

General Changes

  • Optimize count() with a constant to execute as the much faster count(*)
  • Add support for binary types to the JDBC driver
  • The legacy byte code compiler has been removed
  • New aggregation framework (~10% faster)
  • Added max_by() aggregation function
  • The approx_avg() function has been removed. Use avg() instead.
  • Fixed parsing of UNION queries that use both DISTINCT and ALL
  • Fixed cross join planning error for certain query shapes
  • Added hex and base64 conversion functions for varbinary
  • Fix the LIKE operator to correctly match against values that contain multiple lines. Previously, it would stop matching at the first newline.
  • Add support for renaming tables using the ALTER TABLE statement.
  • Add basic support for inserting data using the INSERT statement. This is currently only supported for the Raptor connector.
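The rename and insert additions above can be exercised with statements along these lines (the table names are hypothetical, and in this release INSERT only works against tables from the Raptor connector):

  ALTER TABLE orders RENAME TO orders_old;
  INSERT INTO orders_copy SELECT * FROM orders_old;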

JSON Function Changes

The json_extract() and json_extract_scalar() functions now support the square bracket syntax:

  SELECT json_extract(json, '$.store[book]');
  SELECT json_extract(json, '$.store["book name"]');

As part of this change, the set of characters allowed in a non-bracketed path segment has been restricted to alphanumeric characters, underscores and colons. Additionally, colons cannot be used in an unquoted bracketed path segment. Use the new bracket syntax with quotes to match elements that contain special characters.
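For example, a key containing a colon can no longer be matched by a bare bracketed segment; a quoted bracket segment is required (the json column and key name here are hypothetical):

  SELECT json_extract_scalar(json, '$["store:name"]');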

Scheduler Changes

The scheduler now assigns splits to a node based on the current load on the node across all queries. Previously, the scheduler load balanced splits across nodes on a per-query level. Every node can have node-scheduler.max-splits-per-node splits scheduled on it. To avoid starvation of small queries, when the node already has the maximum allowable splits, every task can schedule at most node-scheduler.max-pending-splits-per-node-per-task splits on the node.
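A sketch of the two scheduler settings named above as they would appear in a coordinator config properties file (the values shown are illustrative, not defaults):

  node-scheduler.max-splits-per-node=100
  node-scheduler.max-pending-splits-per-node-per-task=10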

Row Number Optimizations

Two types of queries that use the row_number() function are now substantially faster and can run on larger result sets.

Performing a partitioned limit that chooses N arbitrary rows per partition is a streaming operation. The following query selects five arbitrary rows from orders for each orderstatus:

  SELECT * FROM (
    SELECT row_number() OVER (PARTITION BY orderstatus) AS rn,
      custkey, orderdate, orderstatus
    FROM orders
  ) WHERE rn <= 5;

Performing a partitioned top-N that chooses the maximum or minimum N rows from each partition now uses significantly less memory. The following query selects the five oldest rows based on orderdate from orders for each orderstatus:

  SELECT * FROM (
    SELECT row_number() OVER (PARTITION BY orderstatus ORDER BY orderdate) AS rn,
      custkey, orderdate, orderstatus
    FROM orders
  ) WHERE rn <= 5;

Use the EXPLAIN statement to see if any of these optimizations have been applied to your query.
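For example, prefixing either query above with EXPLAIN shows the plan the engine chose (the exact operator names in the output depend on the Presto version):

  EXPLAIN SELECT * FROM (
    SELECT row_number() OVER (PARTITION BY orderstatus ORDER BY orderdate) AS rn,
      custkey, orderdate, orderstatus
    FROM orders
  ) WHERE rn <= 5;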

SPI Changes

The core Presto engine no longer automatically adds a column for count(*) queries. Instead, the RecordCursorProvider will receive an empty list of column handles.

The Type and Block APIs have gone through a major refactoring in this release. The main focus of the refactoring was to consolidate all type-specific encoding logic in the type itself, which makes types much easier to implement. You should consider Type and Block to be a beta API, as we expect further changes in the near future.

To simplify the API, ConnectorOutputHandleResolver has been merged into ConnectorHandleResolver. Additionally, ConnectorHandleResolver, ConnectorRecordSinkProvider and ConnectorMetadata were modified to support inserts.

Note

This is a backwards incompatible change with the previous connector and type SPI, so if you have written a connector or type, you will need to update your code before deploying this release. In particular, make sure your connector can handle an empty column handles list (this can be verified by running SELECT count(*) on a table from your connector).
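A minimal verification query of the kind described above, assuming a hypothetical catalog, schema, and table exposed by your connector:

  SELECT count(*) FROM example_catalog.example_schema.example_table;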