Description

PCA (principal component analysis) is a dimensionality-reduction method that projects feature vectors onto a lower-dimensional space. PcaTrainBatchOp trains a model that can be used for both batch prediction and stream prediction. The calculation is an eigendecomposition of the correlation or covariance matrix.
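To make the eigendecomposition step concrete, here is a minimal NumPy sketch of PCA on the correlation (or sample covariance) matrix. It is an illustration only, not Alink's implementation: the function name `pca_eig` and its `use_corr` flag are hypothetical, and the sketch always centers the data, so its projections are not expected to match the values printed in the Result section.

```python
import numpy as np

# Same toy data as the script example in this document.
data = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.1],
    [0.2, 0.2, 0.8],
    [9.0, 9.5, 9.7],
    [9.1, 9.1, 9.6],
    [9.2, 9.3, 9.9],
])


def pca_eig(x, k, use_corr=True):
    """Project the rows of x onto the top-k eigenvectors of its correlation
    (or sample covariance) matrix. Hypothetical helper, not Alink code."""
    centered = x - x.mean(axis=0)
    if use_corr:
        # The covariance of standardized columns is the correlation matrix.
        centered = centered / x.std(axis=0, ddof=1)
    mat = np.cov(centered, rowvar=False)    # divides by n - 1
    eigvals, eigvecs = np.linalg.eigh(mat)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    components = eigvecs[:, order]
    return centered @ components, eigvals[order]


projected, variances = pca_eig(data, k=2)
print(projected.shape)  # (6, 2)
```

The eigenvalues returned alongside the projection are the variances of the projected columns, which is why keeping the largest k of them preserves the most variance.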

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| k | Number of principal components to keep (the value of K). | Integer | | |
| calculationType | Matrix used for the computation: CORR, COV_SAMPLE, or COVAR_POP. | String | | "CORR" |
| transformType | "SIMPLE" or "SUBMEAN". SIMPLE applies the model to the data as is; SUBMEAN applies it to (data - mean). | String | | "SIMPLE" |
| selectedCols | Names of the columns used for processing. | String[] | | null |
| vectorCol | Name of a vector column. | String | | null |
| withMean | Centers the data with the mean before scaling. | Boolean | | true |
| withStd | Scales the data to unit standard deviation. | Boolean | | true |
| reservedCols | Names of the columns to be retained in the output table. | String[] | | null |
| predictionCol | Name of the prediction column. | String | | |
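The three calculationType values can be illustrated with NumPy, assuming the option names carry their usual statistical meaning (correlation, sample covariance with an n - 1 divisor, population covariance with an n divisor). That mapping is an assumption made for this sketch; the source does not spell out the definitions.

```python
import numpy as np

# Same toy data as the script example in this document.
data = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.1],
    [0.2, 0.2, 0.8],
    [9.0, 9.5, 9.7],
    [9.1, 9.1, 9.6],
    [9.2, 9.3, 9.9],
])

# Assumed correspondence to the calculationType options.
corr = np.corrcoef(data, rowvar=False)           # CORR
cov_sample = np.cov(data, rowvar=False, ddof=1)  # COV_SAMPLE: divide by n - 1
cov_pop = np.cov(data, rowvar=False, ddof=0)     # COVAR_POP: divide by n

n = data.shape[0]
# The two covariance matrices differ only by the factor (n - 1) / n,
# so they share eigenvectors and their eigenvalues differ by that factor.
print(np.allclose(cov_pop, cov_sample * (n - 1) / n))  # True
# A correlation matrix always has ones on its diagonal.
print(np.allclose(np.diag(corr), 1.0))  # True
```

Under these assumptions, the choice between the two covariance options does not change the principal directions, while CORR additionally removes the effect of each column's scale.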

Script Example

Script

import numpy as np
import pandas as pd
from pyalink.alink import *  # provides dataframeToOperator, PCA, StreamOperator

data = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.1],
    [0.2, 0.2, 0.8],
    [9.0, 9.5, 9.7],
    [9.1, 9.1, 9.6],
    [9.2, 9.3, 9.9]
])
df = pd.DataFrame({"x1": data[:, 0], "x2": data[:, 1], "x3": data[:, 2]})

# batch source
inOp = dataframeToOperator(df, schemaStr='x1 double, x2 double, x3 double', op_type='batch')

# keep 2 principal components and write them to the "pred" column
pca = PCA()\
    .setK(2)\
    .setSelectedCols(["x1", "x2", "x3"])\
    .setPredictionCol("pred")

# train
model = pca.fit(inOp)

# batch predict
model.transform(inOp).print()

# stream predict
inStreamOp = dataframeToOperator(df, schemaStr='x1 double, x2 double, x3 double', op_type='stream')
model.transform(inStreamOp).print()
StreamOperator.execute()

Result

| x1 | x2 | x3 | pred |
| --- | --- | --- | --- |
| 9.0 | 9.5 | 9.7 | 3.2280384305400736,1.1516225426477789E-4 |
| 0.2 | 0.2 | 0.8 | 0.13565076707329407,0.09003329494282108 |
| 9.2 | 9.3 | 9.9 | 3.250783163664603,0.0456526246528135 |
| 9.1 | 9.1 | 9.6 | 3.182618319978973,0.027469531992220464 |
| 0.1 | 0.2 | 0.1 | 0.045855205015063565,-0.012182917696915518 |
| 0.0 | 0.0 | 0.0 | 0.0,0.0 |
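Each pred value in the Result listing above holds K = 2 components separated by a comma. If the column reaches downstream code as text in that printed form (an assumption; the actual column type may be a vector object), a small helper can turn it into a numeric array. The name `parse_pred` is hypothetical, and the rows are copied from the listing.

```python
import numpy as np

# A few rows copied from the Result listing (x1, x2, x3, pred).
rows = [
    (9.0, 9.5, 9.7, "3.2280384305400736,1.1516225426477789E-4"),
    (0.2, 0.2, 0.8, "0.13565076707329407,0.09003329494282108"),
    (0.0, 0.0, 0.0, "0.0,0.0"),
]


def parse_pred(text):
    """Split a comma-separated pred string into a float array.
    Hypothetical helper; assumes the printed text format shown above."""
    return np.array([float(v) for v in text.split(",")])


# Stack the parsed vectors into an (n_rows, K) matrix.
components = np.vstack([parse_pred(r[3]) for r in rows])
print(components.shape)  # (3, 2)
```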