STDDEV, STDDEV_SAMP, STDDEV_POP Functions

An aggregate function that standard deviation of a set of numbers.

Syntax:

  1. { STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] expression)

This function works with any numeric data type.

Return type: DOUBLE in Impala 2.0 and higher; STRING in earlier releases

This function is typically used in mathematical formulas related to probability distributions.

The STDDEV_POP() and STDDEV_SAMP() functions compute the population standard deviation and sample standard deviation, respectively, of the input values. (STDDEV() is an alias for STDDEV_SAMP().) Both functions evaluate all input rows matched by the query. The difference is that STDDEV_SAMP() is scaled by 1/(N-1) while STDDEV_POP() is scaled by 1/N.

If no input rows match the query, the result of any of these functions is NULL. If a single input row matches the query, the result of any of these functions is "0.0".

Examples:

This example demonstrates how STDDEV() and STDDEV_SAMP() return the same result, while STDDEV_POP() uses a slightly different calculation to reflect that the input data is considered part of a larger “population”.

  1. [localhost:21000] > select stddev(score) from test_scores;
  2. +---------------+
  3. | stddev(score) |
  4. +---------------+
  5. | 28.5 |
  6. +---------------+
  7. [localhost:21000] > select stddev_samp(score) from test_scores;
  8. +--------------------+
  9. | stddev_samp(score) |
  10. +--------------------+
  11. | 28.5 |
  12. +--------------------+
  13. [localhost:21000] > select stddev_pop(score) from test_scores;
  14. +-------------------+
  15. | stddev_pop(score) |
  16. +-------------------+
  17. | 28.4858 |
  18. +-------------------+

This example demonstrates that, because the return value of these aggregate functions is a STRING, you must currently convert the result with CAST.

  1. [localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
  2. +-------------------+
  3. | summary |
  4. +-------------------+
  5. | Inserted 1 row(s) |
  6. +-------------------+
  7. [localhost:21000] > desc score_stats;
  8. +--------------------+--------------+---------+
  9. | name | type | comment |
  10. +--------------------+--------------+---------+
  11. | standard_deviation | decimal(7,4) | |
  12. | variance | decimal(7,4) | |
  13. +--------------------+--------------+---------+

Restrictions:

This function cannot be used in an analytic context. That is, the OVER() clause is not allowed at all with this function.

Related information:

The STDDEV(), STDDEV_POP(), and STDDEV_SAMP() functions compute the standard deviation (square root of the variance) based on the results of VARIANCE(), VARIANCE_POP(), and VARIANCE_SAMP() respectively. See VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions for details about the variance property.

Parent topic: Impala Aggregate Functions