Description

The transformer standardizes each component of the vector using the following formula:

x_scaled = (x - mean) / sigma, where mean is the mean of the corresponding vector component over all rows of the selected column, and sigma is the standard deviation of that component.
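The formula can be illustrated with a minimal NumPy sketch that applies it to each vector component independently. It assumes sigma is the sample standard deviation (denominator n - 1), which is what the numbers in the Result section below are consistent with. The helper `standard_scale` is hypothetical and is not part of the Alink API.

```python
import numpy as np

def standard_scale(vectors):
    """Hypothetical helper: standardize each component (column) of a 2-D array of vectors."""
    x = np.asarray(vectors, dtype=float)
    mean = x.mean(axis=0)            # per-component mean over all rows
    sigma = x.std(axis=0, ddof=1)    # per-component sample standard deviation (n - 1)
    # Note: this sketch does not guard against a constant component (sigma == 0).
    return (x - mean) / sigma

# Three 2-D vectors: component 0 is [1, 2, 3], component 1 is [10, 10, 40].
# Component 0 has mean 2 and sample standard deviation 1, so it becomes [-1, 0, 1].
print(standard_scale([[1.0, 10.0], [2.0, 10.0], [3.0, 40.0]]))
```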

Parameters

| Name | Description | Type | Required? | Default Value |
|---|---|---|---|---|
| selectedCol | Name of the selected column used for processing | String | Yes | |
| withMean | Whether to center the data by subtracting the mean before scaling | Boolean | No | true |
| withStd | Whether to scale the data to unit standard deviation | Boolean | No | true |
| outputCol | Name of the output column | String | No | null |
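The effect of the two switches can be sketched in NumPy as follows. This assumes that withMean only controls the subtraction of the mean and withStd only controls the division by the (sample) standard deviation, with both statistics computed from the original data; the Alink internals are not described here, and `standard_scale` with its `with_mean` / `with_std` arguments is a hypothetical helper.

```python
import numpy as np

def standard_scale(vectors, with_mean=True, with_std=True):
    """Hypothetical helper mirroring the withMean / withStd switches."""
    x = np.asarray(vectors, dtype=float)
    mean = x.mean(axis=0)            # per-component mean
    sigma = x.std(axis=0, ddof=1)    # per-component sample standard deviation
    if with_mean:
        x = x - mean                 # center each component
    if with_std:
        x = x / sigma                # scale each component to unit standard deviation
    return x

data = [[2.0], [4.0], [6.0]]         # mean 4, sample standard deviation 2
print(standard_scale(data))                    # centered and scaled: [-1, 0, 1]
print(standard_scale(data, with_mean=False))   # scaled only: [1, 2, 3]
print(standard_scale(data, with_std=False))    # centered only: [-2, 0, 2]
```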

Script Example

Script

from pyalink.alink import *
import numpy as np
import pandas as pd

useLocalEnv(1)

# Each row has a key column and a 2-D vector stored as a comma-separated string.
data = np.array([["a", "10.0, 100"],
                 ["b", "-2.5, 9"],
                 ["c", "100.2, 1"],
                 ["d", "-99.9, 100"],
                 ["a", "1.4, 1"],
                 ["b", "-2.2, 9"],
                 ["c", "100.9, 1"]])
df = pd.DataFrame({"col": data[:, 0], "vector": data[:, 1]})
data = dataframeToOperator(df, schemaStr="col string, vector string", op_type="batch")

# Fit the scaler on the "vector" column, then apply it to the same data.
print(VectorStandardScaler().setSelectedCol("vector").fit(data).transform(data).collectToDataframe())

Result

| col | vector |
|---|---|
| a | -0.07835182408093559,1.4595814453461897 |
| c | 1.2269606224811418,-0.6520885789229323 |
| b | -0.2549018445693762,-0.4814485769617911 |
| a | -0.20280511721213143,-0.6520885789229323 |
| c | 1.237090541689495,-0.6520885789229323 |
| b | -0.25924323851581327,-0.4814485769617911 |
| d | -1.6687491397923802,1.4595814453461897 |
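The values above can be cross-checked without Alink using plain NumPy. This sketch parses the vector strings from the script, standardizes each component with the sample standard deviation (ddof=1), and prints the rows in input order. The Result rows appear in a different order from the input, so compare them by key and value rather than by position.

```python
import numpy as np

# Input rows from the script example: (col, vector) with vectors as "x, y" strings.
rows = [("a", "10.0, 100"), ("b", "-2.5, 9"), ("c", "100.2, 1"),
        ("d", "-99.9, 100"), ("a", "1.4, 1"), ("b", "-2.2, 9"), ("c", "100.9, 1")]

# Parse each vector string into floats, giving a (7, 2) array.
x = np.array([[float(v) for v in vec.split(",")] for _, vec in rows])

# Per-component standardization: subtract the mean, divide by the sample std (n - 1).
scaled = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

for (col, _), vec in zip(rows, scaled):
    print(col, ",".join(str(float(v)) for v in vec))
```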