Description

Isotonic regression. This component implements a parallelized pool adjacent violators algorithm (PAVA). It accepts either a single feature column or a vector column; for a vector column, one index of the vector is extracted and used as the feature.
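
The pooling idea behind the algorithm can be illustrated with a minimal, sequential sketch. This is not the component's parallelized implementation; the function name `pava` and the toy labels are assumptions made for illustration only. The inputs are assumed to be already sorted by feature value.

```python
def pava(y, w=None):
    """Return the non-decreasing weighted least-squares fit to y.

    Hypothetical helper for illustration; y must be ordered by feature value.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while the earlier mean exceeds the later one.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


labels = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
print(pava(labels))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Each violating pair (3, 2) and (4, 3) is replaced by its mean, which yields a non-decreasing sequence.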

Parameters

Name | Description | Type | Required? | Default Value
predictionCol | Column name of prediction. | String | |

Script Example

Code

  from pyalink.alink import *
  import numpy as np
  import pandas as pd

  # Each row is [feature, label].
  data = np.array([[0.35, 1],
                   [0.6, 1],
                   [0.55, 1],
                   [0.5, 1],
                   [0.18, 0],
                   [0.1, 1],
                   [0.8, 1],
                   [0.45, 0],
                   [0.4, 1],
                   [0.7, 0],
                   [0.02, 1],
                   [0.3, 0],
                   [0.27, 1],
                   [0.2, 0],
                   [0.9, 1]])
  df = pd.DataFrame({"feature": data[:, 0], "label": data[:, 1]})
  # schemaStr must list the columns in DataFrame order: feature first, then label.
  data = dataframeToOperator(df, schemaStr="feature double, label double", op_type="batch")
  dataStream = dataframeToOperator(df, schemaStr="feature double, label double", op_type="stream")
  # Train the model on the batch source.
  trainOp = IsotonicRegTrainBatchOp()\
      .setFeatureCol("feature")\
      .setLabelCol("label")
  model = trainOp.linkFrom(data)
  # Apply the trained model to the stream source.
  predictOp = IsotonicRegPredictStreamOp(model).setPredictionCol("result")
  res = predictOp.linkFrom(dataStream)
  res.print()
  # A stream job only runs once execution is triggered.
  StreamOperator.execute()

Results

Model
model_id model_info
0 {"vectorCol":"\"col2\"","featureIndex":"0","featureCol":null}
1048576 [0.02,0.3,0.35,0.45,0.5,0.7]
2097152 [0.5,0.5,0.6666666865348816,0.6666666865348816,0.75,0.75]
Prediction
col1 col2 col3 pred
1.0 0.9 1.0 0.75
0.0 0.7 1.0 0.75
1.0 0.35 1.0 0.6666666865348816
1.0 0.02 1.0 0.5
1.0 0.27 1.0 0.5
1.0 0.5 1.0 0.75
0.0 0.18 1.0 0.5
0.0 0.45 1.0 0.6666666865348816
1.0 0.8 1.0 0.75
1.0 0.6 1.0 0.75
1.0 0.4 1.0 0.6666666865348816
0.0 0.3 1.0 0.5
1.0 0.55 1.0 0.75
0.0 0.2 1.0 0.5
1.0 0.1 1.0 0.5
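
The relationship between the Model rows and the pred column can be checked with a short sketch. It assumes that row 1048576 holds feature boundaries, that row 2097152 holds the fitted value at each boundary, and that prediction interpolates linearly between boundaries and clamps outside them. These are assumptions about how to read the model, not a confirmed description of the component; the listed predictions are consistent with them but do not distinguish interpolation from a step function.

```python
import numpy as np

# Boundaries and fitted values copied from model rows 1048576 and 2097152.
boundaries = np.array([0.02, 0.3, 0.35, 0.45, 0.5, 0.7])
values = np.array([0.5, 0.5, 0.6666666865348816,
                   0.6666666865348816, 0.75, 0.75])

# A few col2 feature values taken from the Prediction table.
features = np.array([0.9, 0.35, 0.27, 0.45, 0.1])

# np.interp interpolates linearly between boundaries and clamps
# to the first or last value for features outside their range.
predictions = np.interp(features, boundaries, values)
print(predictions)
```

Under these assumptions the printed values agree with the pred column for the same features.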