Description

* A one-hot stream operator that maps a series of columns of category indices to a column of sparse binary vectors.
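To make the mapping concrete, here is a minimal plain-Python sketch of one-hot encoding into a sparse representation. It is not Alink's implementation: the helper names `build_index` and `one_hot` are hypothetical, and assigning indices in sorted order is an assumption made only for illustration (the operator may order categories differently).

```python
from typing import Dict, List, Tuple

# A sparse vector is modelled here as (length, {position: value}).
SparseVector = Tuple[int, Dict[int, float]]


def build_index(values: List[str]) -> Dict[str, int]:
    """Assign each distinct category an index (sorted order is an assumption)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> SparseVector:
    """Encode one category as a sparse binary vector holding a single 1.0."""
    return len(index), {index[value]: 1.0}


queries = ["assist", "assiseduc", "assist"]
index = build_index(queries)
encoded = [one_hot(q, index) for q in queries]
print(encoded)
```

Only the position of the single non-zero entry is stored, which keeps the encoding compact when the number of categories is large.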

Parameters

Name          Description                                              Type      Required?  Default Value
reservedCols  Names of the columns to be retained in the output table  String[]             null
outputCol     Name of the output column                                String

Script Example

Script

  import numpy as np
  import pandas as pd
  from pyalink.alink import *

  data = np.array([
      ["assisbragasm", 1],
      ["assiseduc", 1],
      ["assist", 1],
      ["assiseduc", 1],
      ["assistebrasil", 1],
      ["assiseduc", 1],
      ["assistebrasil", 1],
      ["assistencialgsamsung", 1]
  ])
  # load the data into a batch operator
  df = pd.DataFrame({"query": data[:, 0], "weight": data[:, 1]})
  inOp = dataframeToOperator(df, schemaStr='query string, weight long', op_type='batch')
  # train the one-hot model on the "query" column
  one_hot = OneHotTrainBatchOp().setSelectedCols(["query"]).setDropLast(False).setIgnoreNull(False)
  model = inOp.link(one_hot)
  # batch predict
  predictor = OneHotPredictBatchOp().setOutputCol("predicted_r").setReservedCols(["weight"])
  print(BatchOperator.collectToDataframe(predictor.linkFrom(model, inOp)))
  # stream predict, reusing the model trained in batch mode
  inOp2 = dataframeToOperator(df, schemaStr='query string, weight long', op_type='stream')
  predictor = OneHotPredictStreamOp(model).setOutputCol("predicted_r").setReservedCols(["weight"])
  predictor.linkFrom(inOp2).print()
  StreamOperator.execute()

Result

     weight predicted_r
  0       1    $6$4:1.0
  1       1    $6$3:1.0
  2       1    $6$2:1.0
  3       1    $6$3:1.0
  4       1    $6$1:1.0
  5       1    $6$3:1.0
  6       1    $6$1:1.0
  7       1    $6$0:1.0
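
The values in the predicted_r column are sparse vectors printed as strings. Reading `$6$4:1.0` as "length 6, value 1.0 at position 4" is an interpretation of the output above rather than something this page states, so treat the following as a hedged sketch. The helper `parse_sparse_vector` is hypothetical and not part of Alink; it only expands such a string into a dense list for inspection.

```python
from typing import List


def parse_sparse_vector(text: str) -> List[float]:
    """Expand a string such as "$6$4:1.0" into a dense list of floats.

    Assumed layout: "$<length>$<position>:<value> <position>:<value> ...".
    """
    _, size, body = text.split("$", 2)
    dense = [0.0] * int(size)
    for entry in body.split():
        position, value = entry.split(":")
        dense[int(position)] = float(value)
    return dense


print(parse_sparse_vector("$6$4:1.0"))  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

Under this reading, rows that share a query (for example the three "assiseduc" rows) share the same non-zero position, which matches the repeated `$6$3:1.0` entries in the result.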