Description

Computes summary statistics for the columns of a table. Supported statistics: count, mean, variance, min, max, and sum.

Parameters

Name | Description | Type | Required? | Default Value
selectedCols | Names of the columns used for processing | String[] | | null

Script Example

Script

  from pyalink.alink import *
  import numpy as np
  import pandas as pd

  # Four sample rows: string, long, int, double, boolean
  data = np.array([
      ["a", 1, 1, 2.0, True],
      ["c", 1, 2, -3.0, True],
      ["a", 2, 2, 2.0, False],
      ["c", 0, 0, 0.0, False]
  ])
  df = pd.DataFrame({"f_string": data[:, 0], "f_long": data[:, 1], "f_int": data[:, 2], "f_double": data[:, 3], "f_boolean": data[:, 4]})
  # Convert the DataFrame to a batch operator with an explicit schema
  source = dataframeToOperator(df, schemaStr='f_string string, f_long long, f_int int, f_double double, f_boolean boolean', op_type='batch')
  # Summarize only the numeric columns
  summarizer = SummarizerBatchOp()\
      .setSelectedCols(["f_long", "f_int", "f_double"])
  summary = summarizer.linkFrom(source).collectSummary()
  # Sum of f_double: 2.0 + (-3.0) + 2.0 + 0.0
  print(summary.sum('f_double'))

Result

  1.0
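
To make the result above easy to verify by hand, the following is a minimal sketch that recomputes the six statistics named in the description for the three selected columns, using only the Python standard library. It does not call the operator. The helper `summarize` and the variable `reference` are hypothetical names introduced for illustration, and the use of sample variance (dividing by n - 1) is an assumption that may not match the operator's convention.

```python
import statistics

# Same numeric columns and values as in the script above.
columns = {
    "f_long": [1, 1, 2, 0],
    "f_int": [1, 2, 2, 0],
    "f_double": [2.0, -3.0, 2.0, 0.0],
}

def summarize(values):
    """Return the six statistics named in the description for one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # Sample variance (divides by n - 1); the operator's convention is an assumption.
        "variance": statistics.variance(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
    }

# One dictionary of statistics per selected column.
reference = {name: summarize(values) for name, values in columns.items()}

for name, stats in reference.items():
    print(name, stats)

# Matches the value printed by the script above.
print(reference["f_double"]["sum"])
```

The last line prints 1.0, which agrees with the Result section. The other entries can be compared against the corresponding values in the collected summary.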