11.9. Release 0.221

General Changes

  • Fix error during stats collection phase of query planning.
  • Fix a performance regression for some outer joins without equality predicates when join_distribution_type is set to AUTOMATIC.
  • Improve performance for queries that have constant VARCHAR predicates on join columns.
  • Add a variant of strpos() that returns the position of the N-th instance of the substring.
  • Add strrpos() that returns the position of the N-th instance of a substring from the back of a string.
  • Add aggregation function entropy().
  • Add classification aggregation functions classification_miss_rate(), classification_precision(), classification_recall(), and classification_thresholds().
  • Add overload of approx_set() which takes in the maximum standard error.
  • Add max_tasks_per_stage session property and stage.max-tasks-per-stage config property to limit the number of tasks per stage for grouped execution. Setting this session property allows queries running with grouped execution to use a predictable amount of memory independent of the cluster size.
  • Add encryption for spill files (see Spill to Disk).
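
The N-th instance semantics of strpos() and strrpos() described above can be illustrated with a minimal Java sketch. This is not the engine's implementation: the class name StrposSketch is hypothetical, and the sketch assumes 1-based positions, a return value of 0 when the instance is not found, and that overlapping matches are counted. It also indexes by Java UTF-16 code units, which may differ from how the engine counts characters.

```java
final class StrposSketch {
    private StrposSketch() {}

    // Position (1-based) of the N-th instance of sub in s, scanning from the front.
    // Returns 0 when there are fewer than N instances (assumed convention).
    static int strpos(String s, String sub, int instance) {
        if (instance <= 0) {
            throw new IllegalArgumentException("instance must be positive");
        }
        int from = 0;
        int idx = -1;
        for (int i = 0; i < instance; i++) {
            idx = s.indexOf(sub, from);
            if (idx < 0) {
                return 0;
            }
            from = idx + 1; // continue the search just past the start of this match
        }
        return idx + 1;
    }

    // Position (1-based) of the N-th instance of sub in s, scanning from the back.
    // Returns 0 when there are fewer than N instances (assumed convention).
    static int strrpos(String s, String sub, int instance) {
        if (instance <= 0) {
            throw new IllegalArgumentException("instance must be positive");
        }
        int from = s.length();
        int idx = -1;
        for (int i = 0; i < instance; i++) {
            idx = s.lastIndexOf(sub, from);
            if (idx < 0) {
                return 0;
            }
            from = idx - 1; // continue the search just before the start of this match
        }
        return idx + 1;
    }
}
```

Under these assumptions, StrposSketch.strpos("abcabcabc", "b", 2) evaluates to 5 and StrposSketch.strrpos("abcabcabc", "b", 1) evaluates to 8.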

Web UI Changes

  • Add information about query warnings to the web UI.

Raptor Changes

  • Revert the change introduced in 0.219 to rebalance bucket assignment after restarting the cluster. Automatic rebalancing can cause unexpected downtime when restarting the cluster to resolve emergent issues.

Hive Connector Changes

  • Improve coordinator memory utilization for Hive splits.
  • Improve performance of writing large ORC files.

SPI Changes

  • Add PageSinkProperties for createPageSink in PageSinkProvider and ConnectorPageSinkProvider. It contains a boolean partitionCommitRequired, which is false by default. See the note below about commitPartition for more information.
  • Add commitPartition to Metadata and ConnectorMetadata. This SPI is coupled with PageSinkProperties#partitionCommitRequired and is used by the engine to commit a partition of data to the target connector. A connector that implements this SPI should ensure that if PageSinkProperties#isPartitionCommitRequired is true in ConnectorPageSinkProvider#createPageSink, the written data is not published until ConnectorMetadata#commitPartition is called. The connector is also expected to add SUPPORTS_PARTITION_COMMIT in Connector#getCapabilities.
  • Add ExpressionOptimizer in RowExpressionService. ExpressionOptimizer simplifies a RowExpression and prunes redundant parts of it.
  • Add pushNegationToLeaves method to LogicalRowExpressions to push negation down below conjunction or disjunction for a logical expression.
  • Replace SplitSchedulingStrategy with SplitSchedulingContext in ConnectorSplitManager. SplitSchedulingContext contains the SplitSchedulingStrategy and a boolean schedulerUsesHostAddresses that indicates whether the network topology is used during scheduling. If false, the connector doesn’t need to provide the host addresses for remotely accessible splits.
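
Pushing negation below conjunction and disjunction, as pushNegationToLeaves is described above, amounts to applying De Morgan's laws and removing double negation. The following is a minimal sketch of that idea on a toy expression tree. NegationSketch, Expr, and Kind are hypothetical types introduced only for illustration; they are not the RowExpression or LogicalRowExpressions classes from the SPI, and the real method may differ in details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NegationSketch {
    private NegationSketch() {}

    enum Kind { LEAF, NOT, AND, OR }

    // Toy expression node (hypothetical, not the SPI's RowExpression).
    static final class Expr {
        final Kind kind;
        final String name;     // set only for LEAF nodes
        final List<Expr> args; // operands of NOT, AND, OR

        private Expr(Kind kind, String name, List<Expr> args) {
            this.kind = kind;
            this.name = name;
            this.args = args;
        }

        static Expr leaf(String name) { return new Expr(Kind.LEAF, name, new ArrayList<>()); }
        static Expr not(Expr e) { return new Expr(Kind.NOT, null, Arrays.asList(e)); }
        static Expr and(Expr... es) { return new Expr(Kind.AND, null, Arrays.asList(es)); }
        static Expr or(Expr... es) { return new Expr(Kind.OR, null, Arrays.asList(es)); }

        @Override
        public String toString() {
            if (kind == Kind.LEAF) {
                return name;
            }
            if (kind == Kind.NOT) {
                return "NOT(" + args.get(0) + ")";
            }
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) {
                    sb.append(' ').append(kind).append(' ');
                }
                sb.append(args.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Rewrites e so that NOT appears only directly above leaves.
    static Expr pushNegationToLeaves(Expr e) {
        return push(e, false);
    }

    // negate is true when an odd number of NOTs has been seen above e.
    private static Expr push(Expr e, boolean negate) {
        switch (e.kind) {
            case LEAF:
                return negate ? Expr.not(e) : e;
            case NOT:
                // Absorb the NOT into the flag; this also cancels double negation.
                return push(e.args.get(0), !negate);
            default:
                // De Morgan: a pending negation swaps AND and OR and moves to the operands.
                Kind kind = e.kind;
                if (negate) {
                    kind = (kind == Kind.AND) ? Kind.OR : Kind.AND;
                }
                List<Expr> out = new ArrayList<>();
                for (Expr arg : e.args) {
                    out.add(push(arg, negate));
                }
                return new Expr(kind, null, out);
        }
    }
}
```

For example, rewriting NOT(a AND (b OR NOT(c))) with this sketch yields (NOT(a) OR (NOT(b) AND c)).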