Description

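One-hot encoder pipeline op. It is fitted on the selected columns and then encodes their values as one-hot vectors written to the output column.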

Parameters

Name | Description | Type | Required? | Default Value
dropLast | Whether to drop the last category | Boolean | | true
ignoreNull | Whether to ignore null values | Boolean | | false
selectedCols | Names of the columns used for processing | String[] | |
reservedCols | Names of the columns to be retained in the output table | String[] | | null
outputCol | Name of the output column | String | |
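As a rough illustration of what a drop-last option usually means in one-hot encoding, the sketch below encodes a few values with and without it. It is a generic plain-Python sketch, not Alink's implementation: the helper `one_hot`, its `drop_last` argument, and the first-seen category order are all assumptions made for this example. The Result section of this page shows vectors of width 6 for five distinct queries, so Alink's actual layout evidently differs from this one.

```python
def one_hot(values, drop_last=True):
    """Hypothetical helper: encode each value as a list of 0.0/1.0 floats."""
    # Category order is first-seen order, an arbitrary choice for this sketch.
    categories = list(dict.fromkeys(values))
    # With drop_last, the final category gets no slot of its own.
    width = len(categories) - 1 if drop_last else len(categories)
    encoded = []
    for value in values:
        row = [0.0] * width
        index = categories.index(value)
        if index < width:  # the dropped last category stays all zeros
            row[index] = 1.0
        encoded.append(row)
    return encoded


queries = ["assist", "assiseduc", "assistebrasil", "assiseduc"]
full = one_hot(queries, drop_last=False)    # three slots per row
reduced = one_hot(queries, drop_last=True)  # two slots per row
```

Dropping the last category keeps the encoding unambiguous, because that category is still identifiable as the all-zero row, while removing one redundant dimension.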

Script Example

Script

    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    data = np.array([
        ["assisbragasm", 1],
        ["assiseduc", 1],
        ["assist", 1],
        ["assiseduc", 1],
        ["assistebrasil", 1],
        ["assiseduc", 1],
        ["assistebrasil", 1],
        ["assistencialgsamsung", 1]
    ])
    # load the data into a batch operator
    df = pd.DataFrame({"query": data[:, 0], "weight": data[:, 1]})
    inOp = dataframeToOperator(df, schemaStr='query string, weight long', op_type='batch')
    # train the one-hot encoder on the "query" column
    one_hot = OneHotEncoder()\
        .setSelectedCols(["query"])\
        .setDropLast(False)\
        .setIgnoreNull(False)\
        .setOutputCol("predicted_r")\
        .setReservedCols(["weight"])
    model = one_hot.fit(inOp)
    # batch predict
    model.transform(inOp).print()
    # stream predict with the same fitted model
    inOp2 = dataframeToOperator(df, schemaStr='query string, weight long', op_type='stream')
    model.transform(inOp2).print()
    StreamOperator.execute()

Result

       weight  predicted_r
    0       1  $6$4:1.0
    1       1  $6$3:1.0
    2       1  $6$2:1.0
    3       1  $6$3:1.0
    4       1  $6$1:1.0
    5       1  $6$1:1.0
    6       1  $6$1:1.0
    7       1  $6$0:1.0
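The predicted_r values above, such as $6$4:1.0, read like a sparse-vector string of the form $size$index:value. The sketch below expands such a string into a dense list so the one-hot position is easier to see. It assumes that layout (a size between two `$` signs followed by space-separated index:value pairs); `parse_sparse_vector` and `to_dense` are hypothetical helpers written for this example, not PyAlink functions.

```python
def parse_sparse_vector(text):
    """Hypothetical helper: parse '$size$i:v i:v ...' into (size, {index: value})."""
    # Assumed layout: '$', the vector size, '$', then space-separated index:value pairs.
    _, size, body = text.split("$", 2)
    entries = {}
    for pair in body.split():
        index, value = pair.split(":")
        entries[int(index)] = float(value)
    return int(size), entries


def to_dense(text):
    """Expand the sparse string into a dense list of floats."""
    size, entries = parse_sparse_vector(text)
    dense = [0.0] * size
    for index, value in entries.items():
        dense[index] = value
    return dense


dense = to_dense("$6$4:1.0")
```

Under that assumption, each row of the result is a vector of length 6 with a single 1.0 at the index assigned to the row's query value, and rows with the same query share the same index.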