Description

The one-hot batch operator maps a series of columns of category indices to a column of sparse binary vectors.
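
To make the mapping concrete, the following is a minimal pure-Python sketch, not Alink code. The helper name `one_hot_assemble` and the block widths are hypothetical; it assumes each input column gets its own block of positions and that the blocks are simply concatenated into one sparse vector.

```python
def one_hot_assemble(indices, sizes):
    """Concatenate one one-hot block per column into a single sparse vector.

    indices[i] is the category index of column i; sizes[i] is the width of
    that column's block. Returns (total_size, {position: 1.0}).
    """
    if len(indices) != len(sizes):
        raise ValueError("indices and sizes must have the same length")
    sparse = {}
    offset = 0
    for idx, size in zip(indices, sizes):
        if not 0 <= idx < size:
            raise ValueError(f"index {idx} is outside a block of width {size}")
        sparse[offset + idx] = 1.0  # exactly one active position per column
        offset += size              # the next column's block starts here
    return offset, sparse


# Two columns with three slots each; both rows' values map to index 0.
size, vec = one_hot_assemble([0, 0], [3, 3])
print(size, vec)  # 6 {0: 1.0, 3: 1.0}
```

Only the active positions are stored, which is what makes the vector sparse: a row with two encoded columns always has exactly two non-zero entries, whatever the total width.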

Parameters

Name | Description | Type | Required? | Default Value
--- | --- | --- | --- | ---
handleInvalid | Strategy for handling unseen tokens during prediction: one of "keep", "skip", or "error" | String | | "keep"
encode | Encoding method: one of "INDEX", "VECTOR", or "ASSEMBLED_VECTOR" | String | | INDEX
dropLast | Whether to drop the last category when encoding | Boolean | | true
selectedCols | Names of the columns used for processing | String[] | |
outputCols | Names of the output columns | String[] | | null
reservedCols | Names of the columns to be retained in the output table | String[] | | null
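
The three handleInvalid strategies can be illustrated with a small pure-Python sketch. This is not Alink's implementation: the function `encode_token` and the vocabulary are hypothetical, and the choices that "keep" assigns one extra index after the known categories and that "skip" yields no index are assumptions made for illustration.

```python
def encode_token(token, vocabulary, handle_invalid="keep"):
    """Return the category index of token, applying a strategy to unseen tokens."""
    if token in vocabulary:
        return vocabulary[token]
    if handle_invalid == "keep":
        # Assumption: unseen tokens share one extra slot after the known ones.
        return len(vocabulary)
    if handle_invalid == "skip":
        # Assumption: no index is produced for an unseen token.
        return None
    if handle_invalid == "error":
        raise ValueError(f"unseen token: {token!r}")
    raise ValueError(f"unknown handleInvalid strategy: {handle_invalid!r}")


vocab = {"A": 0, "B": 1}
print(encode_token("B", vocab))          # 1
print(encode_token("C", vocab, "keep"))  # 2
print(encode_token("C", vocab, "skip"))  # None
```

With "error", the same unseen token raises an exception instead, which is useful when unseen categories indicate a data problem rather than an expected case.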

Script Example

Script

    # Assumes PyAlink is installed and an execution environment is already set up.
    from pyalink.alink import *
    import pandas as pd

    # Build the DataFrame from native Python values; np.array would coerce every cell to a string.
    df = pd.DataFrame(
        [
            [1.1, True, 2, "A"],
            [1.1, False, 2, "B"],
            [1.1, True, 1, "B"],
            [2.2, True, 1, "A"],
        ],
        columns=["double", "bool", "number", "str"],
    )
    schema = 'double double, bool boolean, number int, str string'
    inOp1 = BatchOperator.fromDataframe(df, schemaStr=schema)
    inOp2 = StreamOperator.fromDataframe(df, schemaStr=schema)

    # Train the one-hot model on all four columns.
    onehot = OneHotTrainBatchOp().setSelectedCols(["double", "bool", "number", "str"]).setDiscreteThresholds(2)
    onehot.linkFrom(inOp1)

    # Batch prediction: encode two columns into a single assembled vector column.
    predictBatch = OneHotPredictBatchOp().setSelectedCols(["double", "bool"]).setEncode("ASSEMBLED_VECTOR").setOutputCols(["pred"]).setDropLast(False)
    predictBatch.linkFrom(onehot, inOp1)
    [model, predict] = collectToDataframes(onehot, predictBatch)
    print(model)
    print(predict)

    # Stream prediction with the same trained model.
    predictStream = OneHotPredictStreamOp(onehot).setSelectedCols(["double", "bool"]).setEncode("ASSEMBLED_VECTOR").setOutputCols(["vec"])
    predictStream.linkFrom(inOp2)
    predictStream.print(refreshInterval=-1)
    StreamOperator.execute()

Result

       double   bool  number str             pred
    0     1.1   True       2   A  $6$0:1.0 3:1.0
    1     1.1  False       2   B  $6$0:1.0 5:1.0
    2     1.1   True       1   B  $6$0:1.0 3:1.0
    3     2.2   True       1   A  $6$2:1.0 3:1.0
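
The values in the pred column appear to follow the pattern `$size$index:value index:value ...`. That reading is inferred from the output above rather than taken from a format specification, so treat it as an assumption. Under that assumption, a minimal sketch for expanding such a string into a dense list looks like this (the helper names `parse_sparse_vector` and `to_dense` are hypothetical):

```python
def parse_sparse_vector(text):
    """Parse '$size$i:v j:w ...' into (size, {index: value})."""
    if not text.startswith("$"):
        raise ValueError(f"expected a leading '$size$' header: {text!r}")
    # Splitting on the first two '$' gives ['', size, entries].
    _, size_str, body = text.split("$", 2)
    entries = {}
    for item in body.split():
        index, value = item.split(":")
        entries[int(index)] = float(value)
    return int(size_str), entries


def to_dense(size, entries):
    """Expand a sparse mapping into a dense list, filling gaps with 0.0."""
    return [entries.get(i, 0.0) for i in range(size)]


size, entries = parse_sparse_vector("$6$0:1.0 3:1.0")
print(to_dense(size, entries))  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Read this way, each row has one active position among indices 0 to 2 and one among indices 3 to 5, matching the two selected columns.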