APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)

Allows multiple COUNT(DISTINCT) operations within a single query, by internally rewriting each COUNT(DISTINCT) to use the NDV() function. The resulting count is approximate rather than precise.

Type: Boolean; recognized values are 1 and 0, or true and false; any other value interpreted as false

Default: false (shown as 0 in output of SET statement)

Examples:

The following examples show how the APPX_COUNT_DISTINCT lets you work around the restriction where a query can only evaluate COUNT(DISTINCT col_name) for a single column. By default, you can count the distinct values of one column or another, but not both in a single query:

  1. [localhost:21000] > select count(distinct x) from int_t;
  2. +-------------------+
  3. | count(distinct x) |
  4. +-------------------+
  5. | 10 |
  6. +-------------------+
  7. [localhost:21000] > select count(distinct property) from int_t;
  8. +--------------------------+
  9. | count(distinct property) |
  10. +--------------------------+
  11. | 7 |
  12. +--------------------------+
  13. [localhost:21000] > select count(distinct x), count(distinct property) from int_t;
  14. ERROR: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters
  15. as count(DISTINCT x); deviating function: count(DISTINCT property)

When you enable the APPX_COUNT_DISTINCT query option, now the query with multiple COUNT(DISTINCT) works. The reason this behavior requires a query option is that each COUNT(DISTINCT) is rewritten internally to use the NDV() function instead, which provides an approximate result rather than a precise count.

  1. [localhost:21000] > set APPX_COUNT_DISTINCT=true;
  2. [localhost:21000] > select count(distinct x), count(distinct property) from int_t;
  3. +-------------------+--------------------------+
  4. | count(distinct x) | count(distinct property) |
  5. +-------------------+--------------------------+
  6. | 10 | 7 |
  7. +-------------------+--------------------------+

Related information:

COUNT Function, DISTINCT Operator, NDV Function

Parent topic: Query Options for the SET Statement