Description
VectorAssembler is a transformer that combines a given list of columns (vector or numeric) into a single vector column. It is useful for merging features produced by different feature generators into one feature vector. VectorAssembler accepts two kinds of input columns: any numeric type, and the vector type. In each row, the values of the input columns are concatenated into one vector in the order given by selectedCols.
This operator transforms batch data.
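The row-wise concatenation described above can be sketched in plain Python. This is a hypothetical illustration, not Alink's implementation: `assemble_row` is a made-up helper, dense vectors are modelled as Python sequences, and a NULL input (None) raises an error, mirroring the default handleInvalid value "error" listed in the parameters.

```python
from typing import List, Optional, Sequence, Union

# A cell is either a numeric scalar, a dense vector, or NULL (None).
Cell = Optional[Union[int, float, Sequence[float]]]


def assemble_row(values: Sequence[Cell]) -> List[float]:
    """Concatenate scalars and dense vectors, in column order, into one flat list.

    Hypothetical sketch: NULL inputs raise ValueError, mirroring handleInvalid="error".
    """
    out: List[float] = []
    for i, v in enumerate(values):
        if v is None:
            raise ValueError(f"NULL value in input column {i}")
        if isinstance(v, (int, float)):
            out.append(float(v))  # a numeric column contributes one element
        else:
            out.extend(float(x) for x in v)  # a vector column contributes all of its elements
    return out


if __name__ == "__main__":
    print(assemble_row([1, [3.0, 2.0], 4.5]))
```

Column order matters: swapping the first two inputs would move the scalar after the vector elements in the result.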
Parameters
Name | Description | Type | Required? | Default Value |
---|---|---|---|---|
handleInvalid | How to handle invalid data (NULL values) | String | | "error" |
selectedCols | Names of the input columns to assemble | String[] | ✓ | |
outputCol | Name of the output vector column | String | ✓ | |
reservedCols | Names of the columns to retain in the output table | String[] | | null |
Script Example
Code
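from pyalink.alink import *
import numpy as np
import pandas as pd

useLocalEnv(1)  # run Alink locally with parallelism 1

# c0 holds sparse vectors ("$size$index:value ..."), c1 holds dense vectors ("v1 v2 ...")
data = np.array([["0", "$6$1:2.0 2:3.0 5:4.3", "3.0 2.0 3.0"],
                 ["1", "$8$1:2.0 2:3.0 7:4.3", "3.0 2.0 3.0"],
                 ["2", "$8$1:2.0 2:3.0 7:4.3", "2.0 3.0"]])
df = pd.DataFrame({"id": data[:, 0], "c0": data[:, 1], "c1": data[:, 2]})
source = dataframeToOperator(df, schemaStr="id string, c0 string, c1 string", op_type="batch")

# Concatenate c0 and c1, in that order, into the new vector column "table2vec"
res = VectorAssemblerBatchOp()\
    .setSelectedCols(["c0", "c1"])\
    .setOutputCol("table2vec")
res.linkFrom(source).collectToDataframe()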
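To see what the example computes, the sketch below parses the two string formats used in columns c0 and c1 and concatenates them with index offsets. It assumes c0 uses the sparse form "$size$index:value ..." and c1 the dense form of space-separated values; `parse_vector` and `assemble` are hypothetical helpers, and Alink's own output string (sparse or dense) may be formatted differently.

```python
from typing import Dict, List, Tuple

# A vector as (size, {index: value}); assumed representation for this sketch only.
Vec = Tuple[int, Dict[int, float]]


def parse_vector(s: str) -> Vec:
    """Parse an assumed sparse "$size$i:v ..." or dense "v v v" string."""
    s = s.strip()
    if s.startswith("$"):
        _, size_str, body = s.split("$", 2)  # "", size, "i:v i:v ..."
        entries: Dict[int, float] = {}
        for tok in body.split():
            idx, val = tok.split(":")
            entries[int(idx)] = float(val)
        return int(size_str), entries
    values = [float(x) for x in s.split()]
    return len(values), dict(enumerate(values))


def assemble(vector_strings: List[str]) -> Vec:
    """Concatenate vectors in order, shifting each one's indices by the sizes before it."""
    offset = 0
    merged: Dict[int, float] = {}
    for s in vector_strings:
        size, entries = parse_vector(s)
        for idx, val in entries.items():
            merged[offset + idx] = val
        offset += size
    return offset, merged


if __name__ == "__main__":
    rows = [("$6$1:2.0 2:3.0 5:4.3", "3.0 2.0 3.0"),
            ("$8$1:2.0 2:3.0 7:4.3", "3.0 2.0 3.0"),
            ("$8$1:2.0 2:3.0 7:4.3", "2.0 3.0")]
    for c0, c1 in rows:
        size, entries = assemble([c0, c1])
        print(size, sorted(entries.items()))
```

For row 0, the 6-element sparse vector is followed by the 3-element dense vector, so the assembled vector has size 9 and the dense values land at indices 6, 7 and 8.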