Description

Maps each string in the selected column to an integer index, using a model trained by StringIndexerTrainBatchOp.

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| handleInvalid | Strategy for handling unseen tokens during prediction: one of "keep", "skip" or "error" | String | | "keep" |
| selectedCol | Name of the column to process | String | | |
| reservedCols | Names of the columns to retain in the output table | String[] | | null |
| outputCol | Name of the output column | String | | null |
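
The three handleInvalid strategies can be pictured with a small plain-Python sketch. It does not call Alink: the helper `apply_index` is hypothetical, and the fallback values are assumptions for illustration ("keep" gives unseen tokens one shared extra index equal to the vocabulary size, "skip" yields None, "error" raises). Check the behaviour of your Alink version before relying on them.

```python
from typing import Dict, List, Optional


def apply_index(tokens: List[str], vocab: Dict[str, int],
                handle_invalid: str = "keep") -> List[Optional[int]]:
    """Hypothetical helper: look up each token, handling unseen ones.

    Assumed semantics (not taken from the Alink source):
      keep  -> unseen tokens share one extra index, len(vocab)
      skip  -> unseen tokens produce None
      error -> unseen tokens raise ValueError
    """
    if handle_invalid not in ("keep", "skip", "error"):
        raise ValueError(f"unknown handleInvalid value: {handle_invalid!r}")
    out: List[Optional[int]] = []
    for tok in tokens:
        if tok in vocab:
            out.append(vocab[tok])
        elif handle_invalid == "keep":
            out.append(len(vocab))
        elif handle_invalid == "skip":
            out.append(None)
        else:
            raise ValueError(f"unseen token: {tok!r}")
    return out


vocab = {"tennis": 0, "basketball": 1, "football": 2}
kept = apply_index(["football", "golf"], vocab, "keep")
skipped = apply_index(["football", "golf"], vocab, "skip")
print(kept)     # [2, 3]
print(skipped)  # [2, None]
```

With "error", the same call raises a ValueError on "golf", which is useful when an unseen token signals a data problem rather than something to tolerate.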

Script Example

Code

    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    # Six sample rows in a single string column
    data = np.array([
        ["football"],
        ["football"],
        ["football"],
        ["basketball"],
        ["basketball"],
        ["tennis"],
    ])
    df_data = pd.DataFrame({
        "f0": data[:, 0],
    })
    data = dataframeToOperator(df_data, schemaStr='f0 string', op_type="batch")

    # Train the index model; the least frequent string gets the smallest index
    stringindexer = StringIndexerTrainBatchOp() \
        .setSelectedCol("f0") \
        .setStringOrderType("frequency_asc")

    # Apply the trained model and write the indices to a new column
    predictor = StringIndexerPredictBatchOp().setSelectedCol("f0").setOutputCol("f0_indexed")
    model = stringindexer.linkFrom(data)
    predictor.linkFrom(model, data).print()

Results

                f0  f0_indexed
    0     football           2
    1     football           2
    2     football           2
    3   basketball           1
    4   basketball           1
    5       tennis           0
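
The indices above follow from the "frequency_asc" ordering used at training time: the least frequent string receives index 0. A minimal plain-Python sketch of that ordering, independent of Alink, is shown here. Breaking ties alphabetically is an assumption of this sketch; the example data contains no ties, so it does not affect the result.

```python
from collections import Counter

tokens = ["football", "football", "football",
          "basketball", "basketball", "tennis"]

# Count how often each string occurs
counts = Counter(tokens)

# Order by ascending frequency; the alphabetical tie-break is an assumption
ordered = sorted(counts, key=lambda t: (counts[t], t))

# Assign consecutive indices in that order
mapping = {tok: idx for idx, tok in enumerate(ordered)}
indexed = [mapping[t] for t in tokens]

print(mapping)  # {'tennis': 0, 'basketball': 1, 'football': 2}
print(indexed)  # [2, 2, 2, 1, 1, 0]
```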