Description

One-hot encoding maps a series of columns of category indices to a column of sparse binary vectors. Training produces a one-hot model, which can then be used to transform data into the binary format.
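The train-then-transform idea described above can be sketched in plain Python. This is a simplified illustration, not Alink's implementation: the helper names `fit_one_hot` and `transform` are hypothetical, and the slot layout here (one slot per distinct value, columns laid out back to back) is an assumption that may differ from the layout Alink actually produces.

```python
def fit_one_hot(rows, cols):
    """Build the 'model': for each column, map every distinct value to a slot."""
    model = {}
    for col in cols:
        # Sort by string form so the slot assignment is deterministic.
        distinct = sorted({row[col] for row in rows}, key=str)
        model[col] = {value: i for i, value in enumerate(distinct)}
    return model


def transform(row, model):
    """Encode one row as (vector size, {index: 1.0}) across all modelled columns."""
    size = sum(len(mapping) for mapping in model.values())
    entries = {}
    offset = 0
    for col, mapping in model.items():
        # Each column owns a contiguous range of slots starting at `offset`.
        entries[offset + mapping[row[col]]] = 1.0
        offset += len(mapping)
    return size, entries


rows = [
    {"bool": True, "str": "A"},
    {"bool": False, "str": "B"},
    {"bool": True, "str": "B"},
]
model = fit_one_hot(rows, ["bool", "str"])
encoded = [transform(row, model) for row in rows]
```

In this sketch a value that was not seen during fitting raises a `KeyError`; a real encoder would need a policy for unseen and missing values.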

Parameters

Name | Description | Type | Required? | Default Value
discreteThresholdsArray | discrete thresholds array | Integer[] | |
discreteThresholds | discrete threshold | Integer | | Integer.MIN_VALUE
selectedCols | Names of the columns used for processing | String[] | |

Script Example

Script

  from pyalink.alink import *  # not in the original listing; assumed to provide the operators used below
  import numpy as np
  import pandas as pd

  # NumPy coerces these mixed-type rows to strings; schemaStr below declares the column types.
  data = np.array([
      [1.1, True, "2", "A"],
      [1.1, False, "2", "B"],
      [1.1, True, "1", "B"],
      [2.2, True, "1", "A"]
  ])
  df = pd.DataFrame({"double": data[:, 0], "bool": data[:, 1], "number": data[:, 2], "str": data[:, 3]})
  schema = 'double double, bool boolean, number int, str string'
  inOp1 = BatchOperator.fromDataframe(df, schemaStr=schema)
  inOp2 = StreamOperator.fromDataframe(df, schemaStr=schema)

  # Train the one-hot model on all four columns.
  onehot = OneHotTrainBatchOp().setSelectedCols(["double", "bool", "number", "str"]).setDiscreteThresholds(2)
  onehot.linkFrom(inOp1)

  # Batch prediction: encode two of the trained columns into a single assembled vector.
  predictBatch = OneHotPredictBatchOp().setSelectedCols(["double", "bool"]).setEncode("ASSEMBLED_VECTOR").setOutputCols(["pred"]).setDropLast(False)
  predictBatch.linkFrom(onehot, inOp1)
  [model, predict] = collectToDataframes(onehot, predictBatch)
  print(model)
  print(predict)

  # Stream prediction: reuse the trained batch model on the streaming source.
  predictStream = OneHotPredictStreamOp(onehot).setSelectedCols(["double", "bool"]).setEncode("ASSEMBLED_VECTOR").setOutputCols(["vec"])
  predictStream.linkFrom(inOp2)
  predictStream.print(refreshInterval=-1)
  StreamOperator.execute()

Result

     double   bool number str            pred
  0     1.1   True      2   A  $6$0:1.0 3:1.0
  1     1.1  False      2   B  $6$0:1.0 5:1.0
  2     1.1   True      1   B  $6$0:1.0 3:1.0
  3     2.2   True      1   A  $6$2:1.0 3:1.0
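
The values in the pred column appear to follow a `$size$index:value index:value ...` layout, so `$6$0:1.0 3:1.0` would be a vector of length 6 with ones at positions 0 and 3. Assuming that reading is correct, the following sketch parses such a string and expands it to a dense list. The helpers `parse_sparse_vector` and `to_dense` are hypothetical names for illustration, not part of the Alink API.

```python
def parse_sparse_vector(text):
    """Parse '$size$i:v i:v ...' into (size, {index: value})."""
    if not text.startswith("$"):
        raise ValueError("expected a string of the form '$size$index:value ...'")
    # Split off the size that sits between the first two '$' characters.
    size_str, _, body = text[1:].partition("$")
    entries = {}
    for token in body.split():
        index, value = token.split(":")
        entries[int(index)] = float(value)
    return int(size_str), entries


def to_dense(size, entries):
    """Expand the sparse entries into a dense list of floats."""
    dense = [0.0] * size
    for index, value in entries.items():
        dense[index] = value
    return dense


size, entries = parse_sparse_vector("$6$0:1.0 3:1.0")
dense = to_dense(size, entries)
print(dense)
```

Applied to the first result row, this gives a six-element vector with two active positions, one for each of the two columns selected for prediction.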