Description
Maps each string in the selected column to an integer index, using a model trained by StringIndexerTrainBatchOp.
Parameters
Name | Description | Type | Required? | Default Value |
---|---|---|---|---|
handleInvalid | Strategy for handling tokens not seen during training when predicting; one of “keep”, “skip”, or “error” | String | | “keep” |
selectedCol | Name of the selected column used for processing | String | ✓ | |
reservedCols | Names of the columns to be retained in the output table | String[] | | null |
outputCol | Name of the output column | String | | null |
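The three handleInvalid strategies can be illustrated with a small plain-Python sketch. This is not Alink code: `index_tokens` is a hypothetical helper, and the concrete values it returns for unseen tokens (one extra index past the vocabulary for “keep”, `None` for “skip”) are assumptions made for illustration. The table above only states that the three strategies exist, so check the operator's actual output before relying on these values.

```python
from typing import Dict, List, Optional


def index_tokens(tokens: List[str],
                 vocab: Dict[str, int],
                 handle_invalid: str = "keep") -> List[Optional[int]]:
    """Hypothetical helper (not part of Alink): map tokens to indices."""
    if handle_invalid not in ("keep", "skip", "error"):
        raise ValueError(f"unknown strategy: {handle_invalid!r}")

    # Assumption: "keep" assigns unseen tokens one extra index past the vocabulary.
    unseen_index = len(vocab)

    result: List[Optional[int]] = []
    for token in tokens:
        if token in vocab:
            result.append(vocab[token])
        elif handle_invalid == "keep":
            result.append(unseen_index)
        elif handle_invalid == "skip":
            # Assumption: "skip" leaves the output empty for unseen tokens.
            result.append(None)
        else:
            # "error": fail as soon as an unseen token appears.
            raise ValueError(f"unseen token: {token!r}")
    return result


vocab = {"tennis": 0, "basketball": 1, "football": 2}
kept = index_tokens(["football", "golf"], vocab, "keep")
skipped = index_tokens(["football", "golf"], vocab, "skip")
print(kept)     # [2, 3]
print(skipped)  # [2, None]
```

With “error”, the same call raises instead of returning a value, which is useful when an unseen token indicates a data problem that should stop the job.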
Script Example
Code
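import numpy as np
import pandas as pd
from pyalink.alink import *

data = np.array([
    ["football"],
    ["football"],
    ["football"],
    ["basketball"],
    ["basketball"],
    ["tennis"],
])
df_data = pd.DataFrame({
    "f0": data[:, 0],
})

# The indexer model is trained on a batch source; prediction runs on a stream source.
batch_data = dataframeToOperator(df_data, schemaStr='f0 string', op_type='batch')
stream_data = dataframeToOperator(df_data, schemaStr='f0 string', op_type='stream')

# "frequency_asc" assigns the smallest index to the least frequent string.
stringindexer = StringIndexerTrainBatchOp() \
    .setSelectedCol("f0") \
    .setStringOrderType("frequency_asc")
model = stringindexer.linkFrom(batch_data)

# The stream predictor receives the trained batch model through its constructor.
predictor = StringIndexerPredictStreamOp(model).setSelectedCol("f0").setOutputCol("f0_indexed")
predictor.linkFrom(stream_data).print()
StreamOperator.execute()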
Results
f0 f0_indexed
0 football 2
1 football 2
2 football 2
3 basketball 1
4 basketball 1
5 tennis 0
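The indices in the results follow from the "frequency_asc" order type: the least frequent string receives index 0. The plain-Python sketch below reproduces that mapping for the sample data. It is only an illustration, not Alink's implementation; `build_frequency_asc_index` is a hypothetical helper, and the alphabetical tie-break is an assumption (the sample data contains no ties).

```python
from collections import Counter
from typing import Dict, Iterable


def build_frequency_asc_index(values: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper (not part of Alink): index strings by ascending frequency."""
    counts = Counter(values)
    # Sort by ascending count; ties are broken alphabetically (assumption).
    ordered = sorted(counts, key=lambda token: (counts[token], token))
    return {token: index for index, token in enumerate(ordered)}


values = ["football"] * 3 + ["basketball"] * 2 + ["tennis"]
mapping = build_frequency_asc_index(values)
print(mapping)  # {'tennis': 0, 'basketball': 1, 'football': 2}
```

Here "tennis" appears once, "basketball" twice, and "football" three times, which matches the f0_indexed values 0, 1, and 2 shown above.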