Description

StandardScaler transforms a dataset by standardizing each feature: centering it to zero mean (withMean) and/or scaling it to unit standard deviation (withStd).

Parameters

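Name        | Description                                              | Type    | Required? | Default Value
------------|----------------------------------------------------------|---------|-----------|--------------
selectedCol | Name of the selected column used for processing.         | String  |           |
withMean    | Centers the data by subtracting the mean before scaling. | Boolean |           | true
withStd     | Scales the data to unit standard deviation.              | Boolean |           | true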
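
The two Boolean flags can be read as independent switches on the usual standardization formula. The NumPy sketch below is an illustration only, not Alink's implementation: the helper name `standardize` and its arguments are hypothetical, the handling of constant features is an assumption, and the use of the sample standard deviation (`ddof=1`) is an assumption that happens to agree with the figures in the Result section of this page.

```python
import numpy as np

def standardize(x, with_mean=True, with_std=True):
    """Column-wise standardization of a 2-D array (hypothetical helper, not part of Alink)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    if with_mean:
        # Center each feature on zero.
        out = out - x.mean(axis=0)
    if with_std:
        # Assumption: sample standard deviation (ddof=1).
        std = x.std(axis=0, ddof=1)
        # Assumption: constant features are left unscaled to avoid dividing by zero.
        std[std == 0.0] = 1.0
        out = out / std
    return out

features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
centered_only = standardize(features, with_std=False)
scaled_only = standardize(features, with_mean=False)
both = standardize(features)
```

With both flags left at their default of true, each column ends up with zero mean and unit standard deviation; turning one flag off skips the corresponding step and leaves the other unchanged.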

Script Example

Script

    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    # Each row holds a label and a vector serialized as a comma-separated string.
    data = np.array([["a", "10.0, 100"],
                     ["b", "-2.5, 9"],
                     ["c", "100.2, 1"],
                     ["d", "-99.9, 100"],
                     ["a", "1.4, 1"],
                     ["b", "-2.2, 9"],
                     ["c", "100.9, 1"]])
    df = pd.DataFrame({"col": data[:, 0], "vector": data[:, 1]})

    # Build a batch source for training and a stream source for prediction.
    data = dataframeToOperator(df, schemaStr="col string, vector string", op_type="batch")
    dataStream = dataframeToOperator(df, schemaStr="col string, vector string", op_type="stream")

    # Fit the scaler on the batch data, then apply the model to the stream.
    trainOp = VectorStandardScalerTrainBatchOp().setSelectedCol("vector")
    model = trainOp.linkFrom(data)
    VectorStandardScalerPredictStreamOp(model).linkFrom(dataStream).print()
    StreamOperator.execute()

Result

col vector
a -0.07835182408093559,1.4595814453461897
c 1.2269606224811418,-0.6520885789229323
b -0.2549018445693762,-0.4814485769617911
a -0.20280511721213143,-0.6520885789229323
c 1.237090541689495,-0.6520885789229323
b -0.25924323851581327,-0.4814485769617911
d -1.6687491397923802,1.4595814453461897
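
The figures above can be cross-checked without Alink. The sketch below recomputes them with NumPy from the vectors in the script, assuming the scaler subtracts the per-feature mean and divides by the sample standard deviation (`ddof=1`); that choice is an inference from the numbers shown, not something this page states. Rows are printed in input order, whereas the streamed result above arrives in a different order.

```python
import numpy as np

# Labels and vector values copied from the script, one row per record.
labels = ["a", "b", "c", "d", "a", "b", "c"]
vectors = np.array([
    [10.0, 100.0],
    [-2.5, 9.0],
    [100.2, 1.0],
    [-99.9, 100.0],
    [1.4, 1.0],
    [-2.2, 9.0],
    [100.9, 1.0],
])

mean = vectors.mean(axis=0)
std = vectors.std(axis=0, ddof=1)  # assumption: sample standard deviation
scaled = (vectors - mean) / std

# Print each record as "label v1,v2", mirroring the layout of the result.
for label, row in zip(labels, scaled):
    print(label, ",".join(str(v) for v in row))
```

Using the population standard deviation (`ddof=0`) instead yields visibly different values, which is why the sample form is assumed here.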