Description

Find the closest cluster center for every point.
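
The nearest-center rule can be sketched in plain Python as follows. This is only an illustration, assuming Euclidean distance; the function name `closest_center` and the sample centers are hypothetical and not part of the Alink API.

```python
import math


def closest_center(point, centers):
    """Return (index, distance) of the center nearest to `point`.

    Assumes Euclidean distance; ties go to the lowest index.
    """
    distances = [math.dist(point, c) for c in centers]
    best = min(range(len(centers)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical centers, chosen only for illustration.
centers = [(9.1, 9.1, 9.1), (0.1, 0.1, 0.1)]
cluster_id, distance = closest_center((0.2, 0.2, 0.2), centers)
print(cluster_id, round(distance, 4))
```

The returned index plays the role of the predicted cluster id, and the returned distance is the distance from the point to that center.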

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| predictionDistanceCol | Column name of the distance between each point and its closest cluster center. | String | | |
| predictionCol | Column name of the prediction. | String | | |
| predictionDetailCol | Column name of the prediction result; it includes detailed information. | String | | |
| reservedCols | Names of the columns to be retained in the output table. | String[] | | null |
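
As a rough sketch of how these parameters shape one output row: the helper `build_output_row` below is hypothetical and written in plain Python, not Alink code. It assumes that leaving `reservedCols` unset keeps every input column, which matches the Prediction results in the Script Example, where `id` and `vec` are both retained. `predictionDetailCol` is left out because its contents are not specified here.

```python
def build_output_row(row, cluster_id, distance,
                     prediction_col, prediction_distance_col=None,
                     reserved_cols=None):
    """Hypothetical helper: assemble one output row from one input row."""
    # Assumption: reserved_cols=None keeps every input column.
    kept = list(row) if reserved_cols is None else reserved_cols
    out = {name: row[name] for name in kept}
    out[prediction_col] = cluster_id
    # The distance column is only added when a name is given for it.
    if prediction_distance_col is not None:
        out[prediction_distance_col] = distance
    return out


row = {"id": 3, "vec": "9 9 9"}  # illustrative input row
full = build_output_row(row, 0, 0.1732, "pred")
trimmed = build_output_row(row, 0, 0.1732, "pred", "dist", reserved_cols=["id"])
print(full)
print(trimmed)
```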

Script Example

Code

    # Assumes PyAlink is installed.
    from pyalink.alink import *
    import numpy as np
    import pandas as pd

    # Six 3-D points in two well-separated groups, stored as vector strings.
    data = np.array([
        [0, "0 0 0"],
        [1, "0.1,0.1,0.1"],
        [2, "0.2,0.2,0.2"],
        [3, "9 9 9"],
        [4, "9.1 9.1 9.1"],
        [5, "9.2 9.2 9.2"]
    ])
    df = pd.DataFrame({"id": data[:, 0], "vec": data[:, 1]})
    inOp1 = BatchOperator.fromDataframe(df, schemaStr='id int, vec string')
    inOp2 = StreamOperator.fromDataframe(df, schemaStr='id int, vec string')

    # Train a 2-cluster KMeans model, then predict on the same batch data.
    kmeans = KMeansTrainBatchOp().setVectorCol("vec").setK(2)
    predictBatch = KMeansPredictBatchOp().setPredictionCol("pred")
    kmeans.linkFrom(inOp1)
    predictBatch.linkFrom(kmeans, inOp1)
    [model, predict] = collectToDataframes(kmeans, predictBatch)
    print(model)
    print(predict)

    # Reuse the trained model for stream prediction.
    predictStream = KMeansPredictStreamOp(kmeans).setPredictionCol("pred")
    predictStream.linkFrom(inOp2)
    predictStream.print(refreshInterval=-1)
    StreamOperator.execute()

Results

Model

       model_id  model_info
    0  0         {"vectorCol":"\"vec\"","latitudeCol":null,"lon...
    1  1048576   {"clusterId":0,"weight":6.0,"vec":{"data":[9.0...
    2  2097152   {"clusterId":1,"weight":6.0,"vec":{"data":[0.1...

Prediction

    rowID  id  vec          pred
    0      0   0 0 0        1
    1      1   0.1,0.1,0.1  1
    2      2   0.2,0.2,0.2  1
    3      3   9 9 9        0
    4      4   9.1 9.1 9.1  0
    5      5   9.2 9.2 9.2  0
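
The `pred` column above can be reproduced outside Alink with a short plain-Python check. This is a sketch under two assumptions: the two cluster centers are taken to be the means of the two obvious groups (the model output above is truncated, so the exact stored centers are not shown), and distance is Euclidean. The helper names `parse_vec` and `mean` are illustrative.

```python
import math
import re


def parse_vec(text):
    """Split a vector string on commas and/or whitespace."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip())]


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


rows = ["0 0 0", "0.1,0.1,0.1", "0.2,0.2,0.2",
        "9 9 9", "9.1 9.1 9.1", "9.2 9.2 9.2"]
points = [parse_vec(r) for r in rows]

# Assumed centers: cluster 0 is the mean of the "9" group,
# cluster 1 is the mean of the "0" group.
centers = [mean(points[3:]), mean(points[:3])]

# Assign each point to the index of its nearest center.
pred = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points]
print(pred)  # [1, 1, 1, 0, 0, 0]
```

Both delimiter styles used in the example data (spaces and commas) are handled by the same regular expression, so the mixed input rows parse to the same kind of vector.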