Description

VectorSlicer is a transformer that takes a feature vector and outputs a new feature vector with a sub-array of the original features. It is useful for extracting features from a vector column.

Parameters

Name Description Type Required? Default Value
indices indices of a vector to be sliced int[] null
selectedCol Name of the selected column used for processing String
outputCol Name of the output column String null
reservedCols Names of the columns to be retained in the output table String[] null

Script Example

Script

  1. data = np.array([["1:3,2:4,4:7", 1],
  2. ["0:3,5:5", 3],
  3. ["2:4,4:5", 4]])
  4. df = pd.DataFrame({"vec" : data[:,0], "id" : data[:,1]})
  5. data = dataframeToOperator(df, schemaStr="vec string, id bigint",op_type="batch")
  6. vecSlice = VectorSliceBatchOp().setSelectedCol("vec").setOutputCol("vec_slice").setIndices([1,2,3])
  7. vecSlice.linkFrom(data).collectToDataframe()

Result

vec id vec_slice
1:3,2:4,4:7 1 $3$0:3.0 1:4.0
0:3,5:5 3 $3$
2:4,4:5 4 $3$1:4.0