Description

VectorAssembler is a transformer that combines a given list of columns (vector or numeric) into a single vector column. It is useful for combining features generated by different feature transformers into a single feature vector, in order to train ML models such as logistic regression and decision trees. VectorAssembler accepts input columns of any numeric type and of vector type. In each row, the values of the input columns are concatenated into a vector in the specified order.
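The row-wise concatenation described above can be illustrated in plain Python. This is only a sketch of the idea, not Alink's implementation; the helper name `assemble` and the sample row are made up for illustration.

```python
from numbers import Number

def assemble(values):
    """Flatten a row of scalars and vectors into one list, preserving order."""
    out = []
    for v in values:
        if isinstance(v, Number):
            out.append(float(v))              # a numeric column contributes one entry
        else:
            out.extend(float(x) for x in v)   # a vector column contributes all its entries
    return out

# One row: a numeric column, a vector column, then another numeric column.
row = [1.5, [2.0, 3.0], 4]
features = assemble(row)
print(features)  # [1.5, 2.0, 3.0, 4.0]
```

The order of the result follows the order in which the columns are listed, so listing the same columns in a different order yields a different feature vector.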

This operator can transform stream data.

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| handleInvalid | Parameter for how to handle invalid data (NULL values) | String | | "error" |
| selectedCols | Names of the columns used for processing | String[] | | |
| outputCol | Name of the output column | String | | |
| reservedCols | Names of the columns to be retained in the output table | String[] | | null |
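To make the roles of these parameters concrete, here is a minimal plain-Python sketch that applies them to a single row stored as a dictionary. It is not Alink code. The function `assemble_row` is hypothetical, only the default "error" policy is modeled, and treating `reserved_cols=None` as "keep every input column" is an assumption rather than something the table states.

```python
def assemble_row(row, selected_cols, output_col,
                 reserved_cols=None, handle_invalid="error"):
    """Hypothetical single-row illustration of the four parameters."""
    if handle_invalid != "error":
        # Other policies are not described in the table, so they are not modeled.
        raise NotImplementedError("only the default 'error' policy is sketched")
    vector = []
    for name in selected_cols:
        value = row[name]
        if value is None:
            raise ValueError(f"NULL value in column {name!r}")
        if isinstance(value, (int, float)):
            vector.append(float(value))
        else:
            vector.extend(float(x) for x in value)
    # Assumption: None means every input column is retained.
    kept = list(row.keys()) if reserved_cols is None else reserved_cols
    result = {name: row[name] for name in kept}
    result[output_col] = vector
    return result

row = {"id": "0", "c0": [2.0, 3.0], "c1": 4.3}
out = assemble_row(row, ["c0", "c1"], "vec", reserved_cols=["id"])
print(out)  # {'id': '0', 'vec': [2.0, 3.0, 4.3]}
```

With the "error" policy, a row containing a NULL in any selected column raises an exception instead of producing an output row.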

Script Example

Code

    import numpy as np
    import pandas as pd
    from pyalink.alink import *  # PyAlink operators and helpers

    # Sample rows: an id plus two vector columns stored as strings
    data = np.array([["0", "$6$1:2.0 2:3.0 5:4.3", "3.0 2.0 3.0"],
                     ["1", "$8$1:2.0 2:3.0 7:4.3", "3.0 2.0 3.0"],
                     ["2", "$8$1:2.0 2:3.0 7:4.3", "2.0 3.0"]])
    df = pd.DataFrame({"id": data[:, 0], "c0": data[:, 1], "c1": data[:, 2]})
    data = dataframeToOperator(df, schemaStr="id string, c0 string, c1 string", op_type="stream")
    # Assemble columns c0 and c1 into a single vector column named "table2vec"
    res = VectorAssemblerStreamOp()\
        .setSelectedCols(["c0", "c1"])\
        .setOutputCol("table2vec")
    res.linkFrom(data).print()
    # Start the streaming job
    StreamOperator.execute()
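The sample data mixes two vector string forms. The sketch below shows, in plain Python, which values the first row would contribute once `c0` and `c1` are concatenated. It assumes that `$6$1:2.0 2:3.0 5:4.3` denotes a sparse vector of size 6 with zero-based `index:value` pairs and that `3.0 2.0 3.0` is a dense vector. That reading is an assumption, `parse_vector` is a hypothetical helper, and the way Alink actually prints the assembled vector may differ.

```python
def parse_vector(text):
    """Parse an assumed vector string into a dense list of floats.

    Assumed formats: "$size$i:v i:v ..." (sparse) or "v v v" (dense).
    """
    text = text.strip()
    if text.startswith("$"):
        _, size, body = text.split("$", 2)
        dense = [0.0] * int(size)
        for item in body.split():
            index, value = item.split(":")
            dense[int(index)] = float(value)
        return dense
    return [float(v) for v in text.split()]

c0 = parse_vector("$6$1:2.0 2:3.0 5:4.3")
c1 = parse_vector("3.0 2.0 3.0")
assembled = c0 + c1  # concatenate in the order given by selectedCols
print(assembled)  # [0.0, 2.0, 3.0, 0.0, 0.0, 4.3, 3.0, 2.0, 3.0]
```

Under this reading, the assembled vector for the first row has length 9: six entries from `c0` followed by three from `c1`.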

Results

VectorAssembler(stream) - Figure 1