Description
The transformer standard the value of the vector using the following formula:
x_scaled = (x - mean)/sigma, where mean is the mean value of column, sigma is the standard variance.
Parameters
Name | Description | Type | Required? | Default Value |
---|---|---|---|---|
selectedCol | Name of the selected column used for processing | String | ✓ | |
withMean | Centers the data with mean before scaling. | Boolean | true | |
withStd | Scales the data to unit standard deviation. true by default | Boolean | true | |
outputCol | Name of the output column | String | null |
Script Example
Script
data = np.array([["a", "10.0, 100"],\
["b", "-2.5, 9"],\
["c", "100.2, 1"],\
["d", "-99.9, 100"],\
["a", "1.4, 1"],\
["b", "-2.2, 9"],\
["c", "100.9, 1"]])
df = pd.DataFrame({"col" : data[:,0], "vector" : data[:,1]})
data = dataframeToOperator(df, schemaStr="col string, vector string",op_type="batch")
VectorStandardScaler().setSelectedCol("vector").fit(data).transform(data).collectToDataframe()
Result
col1 | vec |
---|---|
a | -0.07835182408093559,1.4595814453461897 |
c | 1.2269606224811418,-0.6520885789229323 |
b | -0.2549018445693762,-0.4814485769617911 |
a | -0.20280511721213143,-0.6520885789229323 |
c | 1.237090541689495,-0.6520885789229323 |
b | -0.25924323851581327,-0.4814485769617911 |
d | -1.6687491397923802,1.4595814453461897 |