Description

VectorAssembler is a transformer that combines a given list of columns (numeric or vector) into a single vector column. It is useful for merging features produced by different feature generators into one feature vector. Accepted input column types are all numeric types and the vector type. In each row, the values of the input columns are concatenated into a vector in the order in which the columns are specified.

This operator can transform batch data.
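The row-wise concatenation described above can be illustrated with a minimal NumPy sketch. This is not Alink code: the helper `assemble_row` and the column names `age`, `income`, and `embedding` are hypothetical and exist only to show how numeric and vector columns end up in one vector in the specified order.

```python
import numpy as np

def assemble_row(row, selected_cols):
    """Concatenate the selected columns of one row into a single 1-D vector."""
    parts = []
    for name in selected_cols:
        # atleast_1d turns a scalar into a length-1 array and leaves vectors unchanged
        parts.append(np.atleast_1d(np.asarray(row[name], dtype=float)))
    return np.concatenate(parts)

# Hypothetical row with two numeric columns and one vector column
row = {"age": 31, "income": 4200.5, "embedding": [0.1, 0.2, 0.3]}

# The output follows the order of selected_cols, not the order of the row
vec = assemble_row(row, ["age", "embedding", "income"])
print(vec.tolist())  # [31.0, 0.1, 0.2, 0.3, 4200.5]
```

Changing the order of the selected columns changes the position of each value in the output vector, which is why the order matters when the vector is fed to a downstream model.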

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| handleInvalid | Parameter for how to handle invalid data (NULL values) | String | | "error" |
| selectedCols | Names of the columns used for processing | String[] | | |
| outputCol | Name of the output column | String | | |
| reservedCols | Names of the columns to be retained in the output table | String[] | | null |

Script Example

Code

import numpy as np
import pandas as pd
from pyalink.alink import *  # provides VectorAssembler and dataframeToOperator

# Each row holds an id and two vector columns (c0, c1) encoded as strings
data = np.array([
    ["0", "$6$1:2.0 2:3.0 5:4.3", "3.0 2.0 3.0"],
    ["1", "$8$1:2.0 2:3.0 7:4.3", "3.0 2.0 3.0"],
    ["2", "$8$1:2.0 2:3.0 7:4.3", "2.0 3.0"]])
df = pd.DataFrame({"id": data[:, 0], "c0": data[:, 1], "c1": data[:, 2]})
data = dataframeToOperator(df, schemaStr="id string, c0 string, c1 string", op_type="batch")

# Concatenate c0 and c1 into the new vector column "table2vec"
res = VectorAssembler()\
    .setSelectedCols(["c0", "c1"])\
    .setOutputCol("table2vec")
res.transform(data).collectToDataframe()

Results

VectorAssembler - Figure 1
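To make the concatenation of the first example row concrete, here is a plain-Python sketch that does not use Alink. It assumes, based only on the sample data, that a string such as `$6$1:2.0 2:3.0 5:4.3` denotes a sparse vector of size 6 with `index:value` pairs, and that `3.0 2.0 3.0` denotes a dense vector. The helpers `parse_vector` and `assemble` are hypothetical, and the actual output format of the operator may differ.

```python
def parse_vector(text):
    """Parse '$size$i:v i:v' (assumed sparse) or 'v v v' (assumed dense)
    into a (size, {index: value}) pair."""
    text = text.strip()
    if text.startswith("$"):
        # "$6$1:2.0 2:3.0" splits into "", "6", "1:2.0 2:3.0"
        _, size, body = text.split("$", 2)
        entries = {}
        for token in body.split():
            index, value = token.split(":")
            entries[int(index)] = float(value)
        return int(size), entries
    values = [float(v) for v in text.split()]
    return len(values), dict(enumerate(values))

def assemble(*texts):
    """Concatenate vectors in order, shifting indices by the sizes seen so far."""
    total, merged = 0, {}
    for text in texts:
        size, entries = parse_vector(text)
        for index, value in entries.items():
            merged[total + index] = value
        total += size
    return total, merged

# Columns c0 and c1 of the first row in the script example
size, entries = assemble("$6$1:2.0 2:3.0 5:4.3", "3.0 2.0 3.0")
print(size)     # 9
print(entries)  # {1: 2.0, 2: 3.0, 5: 4.3, 6: 3.0, 7: 2.0, 8: 3.0}
```

Under these assumptions the assembled vector has size 9: the three dense values from `c1` are appended after the six positions of `c0`, so they occupy indices 6 to 8.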