Data Processing - MultiColStringIndexerPredict(batch) - 《Alink v1.0.1 Document》

Description

Maps strings to indices using the model generated by MultiStringIndexerTrainBatchOp.

Parameters

Name	Description	Type	Required?	Default Value
handleInvalid	Strategy for handling tokens that were not seen during training when predicting; one of “keep”, “skip” or “error”	String		“keep”
selectedCols	Names of the columns used for processing	String[]	✓
reservedCols	Names of the columns to be retained in the output table	String[]		null
outputCols	Names of the output columns	String[]		null
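
To make the three handleInvalid strategies concrete, here is a minimal plain-Python sketch that does not use Alink. The helper name index_tokens is hypothetical, and the concrete outputs chosen for unseen tokens (an extra index equal to the vocabulary size for “keep”, None for “skip”) are assumptions made for illustration; this page does not state what Alink actually emits in those cases.

```python
from typing import Dict, List, Optional


def index_tokens(tokens: List[str],
                 mapping: Dict[str, int],
                 handle_invalid: str = "keep") -> List[Optional[int]]:
    """Look up each token in a trained token-to-index mapping.

    Hypothetical helper, not part of Alink. Unseen tokens are handled
    according to handle_invalid.
    """
    # Assumption: "keep" assigns unseen tokens one extra index past the vocabulary.
    unseen_index = len(mapping)
    result: List[Optional[int]] = []
    for token in tokens:
        if token in mapping:
            result.append(mapping[token])
        elif handle_invalid == "keep":
            result.append(unseen_index)
        elif handle_invalid == "skip":
            # Assumption: "skip" leaves the output empty for this row.
            result.append(None)
        elif handle_invalid == "error":
            raise ValueError("Unseen token: " + token)
        else:
            raise ValueError("Unknown strategy: " + handle_invalid)
    return result


# Mapping taken from the example on this page; "golf" is an unseen token.
mapping = {"tennis": 0, "basketball": 1, "football": 2}
kept = index_tokens(["football", "golf"], mapping, "keep")
skipped = index_tokens(["football", "golf"], mapping, "skip")
```

With “error”, the same call would raise instead of returning a value, which is useful when unseen categories indicate a data problem rather than something to tolerate.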

Script Example

Code

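import numpy as np
import pandas as pd
from pyalink.alink import *

# Start a local execution environment; skip this if one is already active.
useLocalEnv(1)

# One string column: football appears 3 times, basketball 2, tennis 1.
data = np.array([
    ["football"],
    ["football"],
    ["football"],
    ["basketball"],
    ["basketball"],
    ["tennis"],
])
df_data = pd.DataFrame({
    "f0": data[:, 0],
})
data = dataframeToOperator(df_data, schemaStr='f0 string', op_type="batch")

# Train the indexer; "frequency_asc" gives the rarest token the smallest index.
stringindexer = MultiStringIndexerTrainBatchOp() \
    .setSelectedCols(["f0"]) \
    .setStringOrderType("frequency_asc")
# The predictor writes the index of f0 to a new column, f0_indexed.
predictor = MultiStringIndexerPredictBatchOp() \
    .setSelectedCols(["f0"]) \
    .setOutputCols(["f0_indexed"])
model = stringindexer.linkFrom(data)
# Apply the trained model to the same data and print the result.
predictor.linkFrom(model, data).print()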

Results

           f0  f0_indexed
0    football           2
1    football           2
2    football           2
3  basketball           1
4  basketball           1
5      tennis           0
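
The indices above follow from the "frequency_asc" ordering: tokens are ranked by how often they occur, and the rarest token gets index 0. The plain-Python sketch below reproduces the same mapping without Alink. The function name frequency_asc_index is hypothetical, and breaking ties alphabetically is an assumption; the example data has no ties, so it does not affect the result here.

```python
from collections import Counter
from typing import Dict, List


def frequency_asc_index(values: List[str]) -> Dict[str, int]:
    """Assign indices to tokens in ascending order of frequency.

    Hypothetical helper, not part of Alink.
    """
    counts = Counter(values)
    # Sort by count first; ties are broken alphabetically (an assumption).
    ordered = sorted(counts, key=lambda token: (counts[token], token))
    return {token: index for index, token in enumerate(ordered)}


values = ["football", "football", "football",
          "basketball", "basketball", "tennis"]
mapping = frequency_asc_index(values)
indexed = [mapping[v] for v in values]
print(mapping)
print(indexed)
```

Here tennis (1 occurrence) maps to 0, basketball (2) to 1, and football (3) to 2, matching the f0_indexed column in the results.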