Description

Imputer fills in missing values in a dataset; only columns of the same type can be selected at one time. Imputer Predict fills in missing values using a model trained by Imputer Train. The supported strategies are min, max, mean, and value. With min, each missing value is replaced by the minimum of its column. With max, it is replaced by the maximum of its column. With mean, it is replaced by the mean of its column. With value, it is replaced by the value supplied.
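The four strategies can be illustrated with a minimal plain-Python sketch. This is not Alink's implementation: the function names `fit_fill_value` and `fill_missing` are hypothetical, and missing values are assumed to be represented as `None`. It only mirrors the train/predict split described above, where a fill value is computed first and applied afterwards.

```python
from statistics import mean


def fit_fill_value(column, strategy="mean", value=None):
    """Compute the fill value for one column (the 'train' step of this sketch)."""
    present = [v for v in column if v is not None]  # ignore missing entries
    if strategy == "min":
        return min(present)
    if strategy == "max":
        return max(present)
    if strategy == "mean":
        return mean(present)
    if strategy == "value":
        if value is None:
            raise ValueError("strategy 'value' needs an explicit fill value")
        return value
    raise ValueError(f"unknown strategy: {strategy}")


def fill_missing(column, fill):
    """Replace every missing entry with the fill value (the 'predict' step)."""
    return [fill if v is None else v for v in column]


column = [1.0, 3.0, None]
filled_mean = fill_missing(column, fit_fill_value(column, "mean"))
filled_min = fill_missing(column, fit_fill_value(column, "min"))
filled_max = fill_missing(column, fit_fill_value(column, "max"))
filled_value = fill_missing(column, fit_fill_value(column, "value", value=-1.0))
```

Separating the two steps means the fill value learned from one dataset can be applied to another, which is what passing the trained model to the predict operator achieves.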

Parameters

Name | Description | Type | Required? | Default Value
outputCols | Names of the output columns | String[] | | null

Script Example

    # Imports this example relies on (PyAlink, NumPy, pandas)
    from pyalink.alink import *
    import numpy as np
    import pandas as pd

    # Sample data; the last row is entirely missing
    data = np.array([
        ["a", 10.0, 100],
        ["b", -2.5, 9],
        ["c", 100.2, 1],
        ["d", -99.9, 100],
        ["a", 1.4, 1],
        ["b", -2.2, 9],
        ["c", 100.9, 1],
        [None, None, None]
    ])
    colnames = ["col1", "col2", "col3"]
    selectedColNames = ["col2", "col3"]
    df = pd.DataFrame({"col1": data[:, 0], "col2": data[:, 1], "col3": data[:, 2]})
    inOp = dataframeToOperator(df, schemaStr='col1 string, col2 double, col3 long', op_type='batch')

    # train (no strategy is set here; the results below match the mean of each column)
    trainOp = ImputerTrainBatchOp()\
        .setSelectedCols(selectedColNames)
    trainOp.linkFrom(inOp)

    # batch predict
    predictOp = ImputerPredictBatchOp()
    predictOp.linkFrom(trainOp, inOp).print()

    # stream predict, using the trained Imputer model
    sinOp = dataframeToOperator(df, schemaStr='col1 string, col2 double, col3 long', op_type='stream')
    predictStreamOp = ImputerPredictStreamOp(trainOp)
    predictStreamOp.linkFrom(sinOp).print()
    StreamOperator.execute()

Results

       col1        col2  col3
    0     a   10.000000   100
    1     b   -2.500000     9
    2     c  100.200000     1
    3     d  -99.900000   100
    4     a    1.400000     1
    5     b   -2.200000     9
    6     c  100.900000     1
    7  None   15.414286    31
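
The filled values in the last row can be checked by hand. The short sketch below recomputes the column means from the seven complete rows of the example. The variable names are illustrative only. Converting the col3 mean with `int` is an assumption that happens to reproduce the 31 shown above; the source does not say how the operator converts a mean to a long column.

```python
# Non-missing values of col2 and col3 from the example data
col2 = [10.0, -2.5, 100.2, -99.9, 1.4, -2.2, 100.9]
col3 = [100, 9, 1, 100, 1, 9, 1]

mean_col2 = sum(col2) / len(col2)  # 107.9 / 7
mean_col3 = sum(col3) / len(col3)  # 221 / 7, about 31.57

print(round(mean_col2, 6))  # 15.414286
print(int(mean_col3))       # 31 (fractional part dropped)
```

The col2 result matches the mean directly. For col3 the mean is about 31.57, so the 31 in the results is consistent with the fractional part being dropped rather than rounded. col1 stays None because it was not among the selected columns.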