Description

Bisecting KMeans prediction based on the model fitted by BisectingKMeansTrainBatchOp.

Parameters

Name Description Type Required? Default Value
predictionCol Column name of prediction. String
predictionDetailCol Column name of prediction result, it will include detailed info. String
reservedCols Names of the columns to be retained in the output table String[] null

Script Example

Code

  1. import numpy as np
  2. import pandas as pd
  3. data = np.array([
  4. [0, "0 0 0"],
  5. [1, "0.1,0.1,0.1"],
  6. [2, "0.2,0.2,0.2"],
  7. [3, "9 9 9"],
  8. [4, "9.1 9.1 9.1"],
  9. [5, "9.2 9.2 9.2"]
  10. ])
  11. df = pd.DataFrame({"id": data[:, 0], "vec": data[:, 1]})
  12. inOp1 = BatchOperator.fromDataframe(df, schemaStr='id int, vec string')
  13. inOp2 = StreamOperator.fromDataframe(df, schemaStr='id int, vec string')
  14. kmeans = BisectingKMeansTrainBatchOp().setVectorCol("vec").setK(2)
  15. predictBatch = BisectingKMeansPredictBatchOp().setPredictionCol("pred")
  16. kmeans.linkFrom(inOp1)
  17. predictBatch.linkFrom(kmeans, inOp1)
  18. [model,predict] = collectToDataframes(kmeans, predictBatch)
  19. print(model)
  20. print(predict)
  21. predictStream = BisectingKMeansPredictStreamOp(kmeans).setPredictionCol("pred")
  22. predictStream.linkFrom(inOp2)
  23. predictStream.print(refreshInterval=-1)
  24. StreamOperator.execute()

Results

Model
  1. rowId model_id model_info
  2. 0 0 {"vectorCol":"\"vec\"","distanceType":"\"EUCLI...
  3. 1 1048576 {"clusterId":1,"size":6,"center":{"data":[4.6,...
  4. 2 2097152 {"clusterId":2,"size":3,"center":{"data":[0.1,...
  5. 3 3145728 {"clusterId":3,"size":3,"center":{"data":[9.1,...
Prediction
  1. rowId id vec pred
  2. 0 0 0 0 0 0
  3. 1 1 0.1,0.1,0.1 0
  4. 2 2 0.2,0.2,0.2 0
  5. 3 3 9 9 9 1
  6. 4 4 9.1 9.1 9.1 1
  7. 5 5 9.2 9.2 9.2 1