This Apache Druid extension provides aggregators for test statistics, including z-score and p-value. Please refer to https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/ for the mathematical background and details.

Make sure to include the druid-stats extension in order to use these aggregators.

Z-score for two-sample z-tests post aggregator

Please refer to https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/ and http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf for more details.

z = (p1 - p2) / S.E. (assuming null hypothesis is true)

p1 and p2 are the two sample proportions defined below, and n1 and n2 are the corresponding sample sizes. S.E. stands for standard error, where

S.E. = sqrt( p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2 )

(p1 - p2) is the observed difference between the two sample proportions.

zscore2sample post aggregator

  • zscore2sample: calculates the z-score using a two-sample z-test, converting binary variables (e.g. success or not) to continuous variables (e.g. conversion rate).
  {
    "type": "zscore2sample",
    "name": "<output_name>",
    "successCount1": <post_aggregator> success count of sample 1,
    "sample1Size": <post_aggregator> sample 1 size,
    "successCount2": <post_aggregator> success count of sample 2,
    "sample2Size": <post_aggregator> sample 2 size
  }

Note that the post aggregator converts binary variables to continuous variables for the two population proportions. Specifically:

p1 = successCount1 / sample1Size

p2 = successCount2 / sample2Size
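
As a minimal sketch of the arithmetic described above, the following Python function computes the z-score from the same four inputs. It follows the formulas as written in this document and is not taken from the extension's source code; the function name `zscore_two_sample` and the handling of zero-size samples or a zero standard error are assumptions made for illustration.

```python
import math


def zscore_two_sample(success_count1, sample1_size, success_count2, sample2_size):
    """Two-sample z-score for proportions, per the formulas in this document."""
    # Assumption: reject empty samples rather than dividing by zero.
    if sample1_size <= 0 or sample2_size <= 0:
        raise ValueError("sample sizes must be positive")

    # Convert the binary outcomes (success counts) to proportions.
    p1 = success_count1 / sample1_size
    p2 = success_count2 / sample2_size

    # S.E. = sqrt( p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2 )
    se = math.sqrt(p1 * (1 - p1) / sample1_size + p2 * (1 - p2) / sample2_size)

    # Assumption: treat a zero standard error as an undefined z-score.
    if se == 0:
        raise ValueError("standard error is zero; z-score is undefined")

    return (p1 - p2) / se
```

With 300 successes out of 500 in sample 1 and 450 out of 600 in sample 2, this sketch gives p1 = 0.6, p2 = 0.75 and a z-score of roughly -5.33.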

pvalue2tailedZtest post aggregator

  • pvalue2tailedZtest: calculates the p-value of a two-sided z-test from a z-score.
    • pvalue2tailedZtest(zscore): the input is a z-score, which can be calculated using the zscore2sample post aggregator.
  {
    "type": "pvalue2tailedZtest",
    "name": "<output_name>",
    "zScore": <zscore post_aggregator>
  }
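
For readers who want to check a result by hand, here is a minimal Python sketch of a two-tailed p-value under the standard normal distribution. It uses the textbook identity p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)); the function name `pvalue_two_tailed` is hypothetical, and the extension's own implementation may compute the value differently.

```python
import math


def pvalue_two_tailed(z_score):
    """Two-tailed p-value of a z-score under the standard normal distribution."""
    # P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z_score) / math.sqrt(2))
```

A z-score of 0 yields a p-value of 1, and the result is symmetric in the sign of the z-score, as expected for a two-sided test.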

Example Usage

In this example, we use the zscore2sample post aggregator to calculate a z-score, and then feed that z-score to the pvalue2tailedZtest post aggregator to calculate the p-value.

An example JSON query is as follows:

  {
    ...
    "postAggregations" : [
      {
        "type" : "pvalue2tailedZtest",
        "name" : "pvalue",
        "zScore" : {
          "type" : "zscore2sample",
          "name" : "zscore",
          "successCount1" : {
            "type" : "constant",
            "name" : "successCountFromPopulation1Sample",
            "value" : 300
          },
          "sample1Size" : {
            "type" : "constant",
            "name" : "sampleSizeOfPopulation1",
            "value" : 500
          },
          "successCount2" : {
            "type" : "constant",
            "name" : "successCountFromPopulation2Sample",
            "value" : 450
          },
          "sample2Size" : {
            "type" : "constant",
            "name" : "sampleSizeOfPopulation2",
            "value" : 600
          }
        }
      }
    ]
  }
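
Because the two post aggregators nest, it can be convenient to assemble the fragment programmatically. The following Python sketch builds the same nested structure as the example and serializes it to JSON. The helper functions `constant`, `zscore2sample` and `pvalue2tailed_ztest` are hypothetical conveniences, not part of any Druid client library; only the JSON field names come from this document. The sketch wraps the result in a list on the assumption that native Druid queries take `postAggregations` as a list of post aggregators.

```python
import json


def constant(name, value):
    """Build a "constant" post aggregator spec."""
    return {"type": "constant", "name": name, "value": value}


def zscore2sample(name, success_count1, sample1_size, success_count2, sample2_size):
    """Build a zscore2sample spec; each input is itself a post aggregator spec."""
    return {
        "type": "zscore2sample",
        "name": name,
        "successCount1": success_count1,
        "sample1Size": sample1_size,
        "successCount2": success_count2,
        "sample2Size": sample2_size,
    }


def pvalue2tailed_ztest(name, z_score):
    """Build a pvalue2tailedZtest spec that wraps a z-score post aggregator."""
    return {"type": "pvalue2tailedZtest", "name": name, "zScore": z_score}


# Same names and values as the example query in this document.
post_aggregations = [
    pvalue2tailed_ztest(
        "pvalue",
        zscore2sample(
            "zscore",
            constant("successCountFromPopulation1Sample", 300),
            constant("sampleSizeOfPopulation1", 500),
            constant("successCountFromPopulation2Sample", 450),
            constant("sampleSizeOfPopulation2", 600),
        ),
    )
]

print(json.dumps({"postAggregations": post_aggregations}, indent=2))
```

In a real query the constant inputs would typically be replaced by post aggregators that read aggregated values, and the remaining query fields (elided as `...` in the example) would still need to be supplied.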