Description

One-hot encoding maps a set of categorical columns to a single column of sparse binary vectors. Training produces a one-hot model, which can then be used to transform data into this binary format.
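To make the train-then-transform flow described above concrete, here is a minimal pure-Python sketch. It is not Alink's implementation: the function names `fit_one_hot` and `transform_one_hot`, the sorted index order, and the all-zero vector for unseen values are assumptions made for illustration only.

```python
def fit_one_hot(values):
    """Build a 'model': a mapping from each distinct category to an index."""
    # Assumption: indices follow sorted order; Alink may assign them differently.
    return {cat: i for i, cat in enumerate(sorted(set(values)))}


def transform_one_hot(model, value):
    """Encode one value as a sparse (size, {index: 1.0}) pair."""
    size = len(model)
    if value not in model:
        # Assumption: an unseen category becomes an all-zero vector.
        return size, {}
    return size, {model[value]: 1.0}


queries = ["assist", "assiseduc", "assist", "assistebrasil"]
model = fit_one_hot(queries)
encoded = [transform_one_hot(model, q) for q in queries]
print(model)
print(encoded)
```

Keeping the model as a separate object mirrors the split between the train and predict operators: the mapping is learned once and can then be applied to any batch or stream of rows.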

Parameters

  Name          Description                               Type      Required?  Default Value
  dropLast      Whether to drop the last category         Boolean              true
  ignoreNull    Whether to ignore null values             Boolean              false
  selectedCols  Names of the columns used for processing  String[]

Script Example

Script

  import numpy as np
  import pandas as pd
  from pyalink.alink import *

  data = np.array([
      ["assisbragasm", 1],
      ["assiseduc", 1],
      ["assist", 1],
      ["assiseduc", 1],
      ["assistebrasil", 1],
      ["assiseduc", 1],
      ["assistebrasil", 1],
      ["assistencialgsamsung", 1]
  ])
  # load data
  df = pd.DataFrame({"query": data[:, 0], "weight": data[:, 1]})
  inOp = dataframeToOperator(df, schemaStr='query string, weight long', op_type='batch')
  # one hot train: fit the model on the "query" column
  one_hot = OneHotTrainBatchOp().setSelectedCols(["query"]).setDropLast(False).setIgnoreNull(False)
  model = inOp.link(one_hot)
  # batch predict: write the encoded vector to "predicted_r" and keep "weight"
  predictor = OneHotPredictBatchOp().setOutputCol("predicted_r").setReservedCols(["weight"])
  print(BatchOperator.collectToDataframe(predictor.linkFrom(model, inOp)))
  # stream predict: reuse the trained batch model on a stream source
  inOp2 = dataframeToOperator(df, schemaStr='query string, weight long', op_type='stream')
  predictor = OneHotPredictStreamOp(model).setOutputCol("predicted_r").setReservedCols(["weight"])
  predictor.linkFrom(inOp2).print()
  StreamOperator.execute()

Result

     weight  predicted_r
  0  1       $6$4:1.0
  1  1       $6$3:1.0
  2  1       $6$2:1.0
  3  1       $6$3:1.0
  4  1       $6$1:1.0
  5  1       $6$3:1.0
  6  1       $6$1:1.0
  7  1       $6$0:1.0
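
Each `predicted_r` value appears to be a sparse vector serialized as `$size$index:value`, so `$6$4:1.0` would be a length-6 vector with 1.0 at index 4. The sketch below expands such a string into a dense list. The helper name `parse_sparse_vector` is hypothetical, and the format is inferred from the output above rather than taken from a specification.

```python
def parse_sparse_vector(text):
    """Expand a '$size$i:v i:v ...' string into a dense list of floats."""
    # '$6$4:1.0'.split('$', 2) yields ['', '6', '4:1.0'].
    _, size, body = text.split("$", 2)
    dense = [0.0] * int(size)
    # Assumption: multiple index:value pairs, if present, are space-separated.
    for pair in body.split():
        index, value = pair.split(":")
        dense[int(index)] = float(value)
    return dense


print(parse_sparse_vector("$6$4:1.0"))
```

Rows with the same `query` value, such as the three `assiseduc` rows, share the same index (3) in the result, which is what one would expect from a one-hot encoding.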