Description

Maps columns of indices to strings, based on the model fitted by {@link StringIndexer}.

While {@link StringIndexerModel} maps strings to indices, IndexToString maps indices back to strings. However, IndexToString does not have a corresponding {@link com.alibaba.alink.pipeline.EstimatorBase}. Instead, IndexToString uses the model data in a StringIndexerModel to perform predictions.

IndexToString uses the name of the {@link StringIndexerModel} to get the model data. The referenced {@link StringIndexerModel} must be created before the transform method is called.

A common use case is as follows:

StringIndexer stringIndexer = new StringIndexer()
    .setModelName("name_a") // The fitted StringIndexerModel will have the name "name_a".
    .setSelectedCol(…);

StringIndexerModel model = stringIndexer.fit(…); // This model will have the name "name_a".

IndexToString indexToString = new IndexToString()
    .setModelName("name_a") // Should match the name of a StringIndexerModel.
    .setSelectedCol(…)
    .setOutputCol(…);

indexToString.transform(…); // Relies on the StringIndexerModel named "name_a" to do the transformation.

The model name registration mechanism is used here so that both StringIndexer and IndexToString can be stacked into a {@link Pipeline}. For example:

StringIndexer stringIndexer = new StringIndexer()
    .setModelName("si_model_0")
    .setSelectedCol("label");

MultilayerPerceptronClassifier mlpc = new MultilayerPerceptronClassifier()
    .setVectorCol("features")
    .setLabelCol("label")
    .setPredictionCol("predicted_label");

IndexToString indexToString = new IndexToString()
    .setModelName("si_model_0")
    .setSelectedCol("predicted_label");

Pipeline pipeline = new Pipeline()
    .add(stringIndexer)
    .add(mlpc)
    .add(indexToString);

pipeline.fit(…);
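The name-based lookup described above can be illustrated with a small, self-contained Python sketch. It is not Alink code: `MODEL_REGISTRY`, `ToyStringIndexer`, `ToyIndexToString`, and `run_pipeline` are hypothetical names invented for this illustration, the classifier stage is left out, and indices are assigned in order of first appearance purely for simplicity. The point is only that a stage without its own estimator can still work inside a pipeline if it finds an earlier stage's model by name.

```python
# Toy illustration only: none of these classes are Alink APIs.
MODEL_REGISTRY = {}  # hypothetical registry: model name -> list of labels


class ToyStringIndexer:
    def __init__(self, model_name, selected_col):
        self.model_name = model_name
        self.selected_col = selected_col

    def fit(self, rows):
        # Collect distinct labels in order of first appearance
        # and register them under the model name.
        labels = list(dict.fromkeys(row[self.selected_col] for row in rows))
        MODEL_REGISTRY[self.model_name] = labels
        return self

    def transform(self, rows):
        labels = MODEL_REGISTRY[self.model_name]
        index_of = {s: i for i, s in enumerate(labels)}
        col = self.selected_col
        return [{**row, col: index_of[row[col]]} for row in rows]


class ToyIndexToString:
    def __init__(self, model_name, selected_col):
        self.model_name = model_name
        self.selected_col = selected_col

    def fit(self, rows):
        # Nothing to learn: the mapping comes from the registered model.
        return self

    def transform(self, rows):
        if self.model_name not in MODEL_REGISTRY:
            raise KeyError(f"no model registered under {self.model_name!r}")
        labels = MODEL_REGISTRY[self.model_name]
        col = self.selected_col
        return [{**row, col: labels[row[col]]} for row in rows]


def run_pipeline(stages, rows):
    # Fit each stage on the current rows, then pass its output to the next stage.
    for stage in stages:
        rows = stage.fit(rows).transform(rows)
    return rows


rows = [{"label": "cat"}, {"label": "dog"}, {"label": "cat"}]
stages = [
    ToyStringIndexer("si_model_0", "label"),
    ToyIndexToString("si_model_0", "label"),
]
restored = run_pipeline(stages, rows)
print(restored == rows)
```

In this sketch, transforming with a name that no earlier stage has registered raises a `KeyError`, which mirrors the requirement that the referenced model be created before the transform method is called.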

Parameters

Name|Description|Type|Required?|Default Value
----|-----------|----|---------|-------------
modelName|Name of the model|String||
selectedCol|Name of the selected column used for processing|String||
reservedCols|Names of the columns to be retained in the output table|String[]||null
outputCol|Name of the output column|String||null

Script Example

Code

import numpy as np
import pandas as pd
from pyalink.alink import *

# Sample data: a single string column.
data = np.array([
    ["football"],
    ["football"],
    ["football"],
    ["basketball"],
    ["basketball"],
    ["tennis"],
])
df_data = pd.DataFrame({
    "f0": data[:, 0],
})
data = dataframeToOperator(df_data, schemaStr='f0 string', op_type="batch")

# Train the StringIndexer model, ordering strings by ascending frequency.
stringIndexer = StringIndexerTrainBatchOp() \
    .setModelName("string_indexer_model") \
    .setSelectedCol("f0") \
    .setStringOrderType("frequency_asc")
model = stringIndexer.linkFrom(data)

# Map strings to indices.
string2int = StringIndexerPredictBatchOp() \
    .setSelectedCol("f0").setOutputCol("f0_indexed")
indexed = string2int.linkFrom(model, data)

# Map the indices back to strings using the same model.
predictor = IndexToStringPredictBatchOp() \
    .setSelectedCol("f0_indexed").setOutputCol("f0_indxed_unindexed")
predictor.linkFrom(model, indexed).print()

Results

f0|f0_indexed|f0_indxed_unindexed
--|----------|-------------------
football|2|football
football|2|football
football|2|football
basketball|1|basketball
basketball|1|basketball
tennis|0|tennis
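
The indices in this table can be checked by hand with a plain-Python sketch that uses no Alink calls. It assumes that "frequency_asc" assigns index 0 to the least frequent string, which is what the results above show; the alphabetical tie-break is an assumption of this sketch only and is not exercised by this data, since all three counts differ.

```python
from collections import Counter

values = ["football", "football", "football", "basketball", "basketball", "tennis"]

# Order the distinct strings by ascending frequency: the rarest gets index 0.
# Ties are broken alphabetically here, which is an assumption of this sketch.
counts = Counter(values)
ordered = sorted(counts, key=lambda s: (counts[s], s))

# Forward mapping (string -> index) and its inverse (index -> string).
string_to_index = {s: i for i, s in enumerate(ordered)}
index_to_string = {i: s for s, i in string_to_index.items()}

indexed = [string_to_index[v] for v in values]
restored = [index_to_string[i] for i in indexed]

print(string_to_index)     # {'tennis': 0, 'basketball': 1, 'football': 2}
print(restored == values)  # True
```

The inverse dictionary plays the role of the index-to-string step: it reads the same mapping the indexer produced, in the opposite direction.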