DISTINCT Clause

If SELECT DISTINCT is specified, only unique rows remain in the query result: out of each set of fully matching rows, only a single row is kept.
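As a small illustration of "fully matching rows", the sketch below uses the numbers() table function as a stand-in data source; the column aliases x and y are chosen here only for the example. Rows count as duplicates only when every selected column matches.

```sql
-- 12 input rows, but only 6 distinct (x, y) combinations exist,
-- so the result contains 6 rows (row order is not guaranteed).
SELECT DISTINCT
    number % 2 AS x,
    number % 3 AS y
FROM numbers(12);
```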

Null Processing

DISTINCT treats NULL as if it were a specific value, with NULL equal to NULL. In other words, each distinct combination of values that contains NULL occurs only once in the DISTINCT result. This differs from NULL processing in most other contexts.
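The following sketch shows this behavior on inline data. The arrayJoin call and the alias x are only a convenient way to produce a nullable column for the example, not part of the DISTINCT clause itself.

```sql
-- Two NULLs and two 1s go in; each survives once.
-- The result has three rows: 1, NULL and 2 (row order is not guaranteed).
SELECT DISTINCT x
FROM (SELECT arrayJoin([1, NULL, NULL, 2, 1]) AS x);

-- For contrast: an ordinary comparison does not treat NULL as equal to NULL.
SELECT NULL = NULL;  -- returns NULL rather than 1
```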

Alternatives

It is possible to obtain the same result by applying GROUP BY across the same set of values as specified in the SELECT clause, without using any aggregate functions. However, there are a few differences from the GROUP BY approach:

  • DISTINCT can be applied together with GROUP BY.
  • When ORDER BY is omitted and LIMIT is defined, the query stops running immediately after the required number of different rows has been read.
  • Data blocks are output as they are processed, without waiting for the entire query to finish running.
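The points above can be sketched with a few queries. They use the numbers() table function as a stand-in data source, and the aliases r and c are chosen only for this example.

```sql
-- Both queries return the same set of rows: 0, 1 and 2
-- (row order is not guaranteed in either case).
SELECT DISTINCT number % 3 AS r FROM numbers(10);
SELECT number % 3 AS r FROM numbers(10) GROUP BY r;

-- DISTINCT applied together with GROUP BY: the three groups have
-- counts 4, 3 and 3, so DISTINCT leaves two rows (4 and 3).
SELECT DISTINCT count() AS c FROM numbers(10) GROUP BY number % 3;

-- No ORDER BY, but LIMIT is set: reading can stop as soon as
-- two different values have been seen.
SELECT DISTINCT number % 3 AS r FROM numbers(1000000) LIMIT 2;
```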

Examples

ClickHouse supports using the DISTINCT and ORDER BY clauses for different columns in one query. The DISTINCT clause is executed before the ORDER BY clause.

Example table:

  ┌─a─┬─b─┐
  │ 2 │ 1 │
  │ 1 │ 2 │
  │ 3 │ 3 │
  │ 2 │ 4 │
  └───┴───┘
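To reproduce this table, a minimal setup could look like the following. The column types and the Memory engine are assumptions made for this sketch; the example itself does not state them.

```sql
-- Assumed schema: two unsigned integer columns in an in-memory table.
CREATE TABLE t1 (a UInt32, b UInt32) ENGINE = Memory;

-- The four example rows, in the order shown above.
INSERT INTO t1 VALUES (2, 1), (1, 2), (3, 3), (2, 4);
```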

When selecting data with the SELECT DISTINCT a FROM t1 ORDER BY b ASC query, we get the following result:

  ┌─a─┐
  │ 2 │
  │ 1 │
  │ 3 │
  └───┘

If we change the sorting direction to descending (SELECT DISTINCT a FROM t1 ORDER BY b DESC), we get the following result:

  ┌─a─┐
  │ 3 │
  │ 1 │
  │ 2 │
  └───┘

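The row with a = 2 and b = 4 was removed before sorting. Because DISTINCT runs first, only the row with a = 2 and b = 1 reaches the ORDER BY step, so the value 2 is ordered by b = 1 in both results.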

Take this implementation detail into account when writing queries.
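If the intent is to order each distinct value of a by a specific b, one possible approach (a sketch, not the only option) is to state the choice explicitly with GROUP BY and an aggregate function. The query assumes the example table t1 with the four rows shown above.

```sql
-- Order each distinct a by its largest b, descending.
-- Per-group maxima: a = 2 -> 4, a = 3 -> 3, a = 1 -> 2,
-- so the rows come back as 2, 3, 1.
SELECT a
FROM t1
GROUP BY a
ORDER BY max(b) DESC;
```

Using min(b) instead of max(b) would order each value by its smallest b.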