Description

Transforms a vector column into separate columns, one for each field named in the schema string.
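To make the conversion concrete, here is a minimal pure-Python sketch of the idea. It is not Alink's implementation. The helper names `parse_vector` and `vector_to_columns` are hypothetical, the sparse layout `$size$index:value ...` is inferred from the sample data on this page, and the space-separated dense layout is an assumption.

```python
def parse_vector(text):
    """Parse a vector string into a dense list of floats.

    Assumed formats:
      sparse: "$<size>$<index>:<value> <index>:<value> ..."  (as in the sample data)
      dense:  "<value> <value> ..."                          (assumption)
    """
    text = text.strip()
    if text.startswith("$"):
        # "$3$0:1.0 1:2.0" splits into ['', '3', '0:1.0 1:2.0']
        _, size, body = text.split("$", 2)
        values = [0.0] * int(size)
        for entry in body.split():
            index, value = entry.split(":")
            values[int(index)] = float(value)
        return values
    return [float(v) for v in text.split()]


def vector_to_columns(text, names):
    """Map the leading vector entries onto the given column names."""
    values = parse_vector(text)
    return {name: values[i] for i, name in enumerate(names)}


print(vector_to_columns("$3$0:1.0 1:2.0", ["f0", "f1"]))
```

In this sketch the sparse vector has size 3, but only the first two entries are mapped because only two column names are supplied.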

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| handleInvalid | Strategy to handle unseen token | String | | "ERROR" |
| reservedCols | Names of the columns to be retained in the output table | String[] | | null |
| schemaStr | Formatted schema | String | Yes | |
| vectorCol | Name of a vector column | String | Yes | |
| lazyPrintTransformDataEnabled | Enable lazyPrint of the transformed data | Boolean | | false |
| lazyPrintTransformDataTitle | Title of the transformed data in lazyPrint | String | | null |
| lazyPrintTransformDataNum | Number of transformed data rows shown in lazyPrint | Integer | | -1 |
| lazyPrintTransformStatEnabled | Enable lazyPrint of the transform statistics | Boolean | | false |
| lazyPrintTransformStatTitle | Title of the transform statistics in lazyPrint | String | | null |
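The interaction between `schemaStr` and `reservedCols` can be sketched in plain Python. This is an illustration under stated assumptions, not Alink's code. The helpers `parse_schema` and `output_columns` are hypothetical. The sketch assumes that a null `reservedCols` keeps every input column, and that an input column is dropped when a schema column has the same name.

```python
def parse_schema(schema_str):
    """Split "name type, name type" into a list of (name, type) pairs."""
    fields = []
    for part in schema_str.split(","):
        name, type_name = part.split()
        fields.append((name, type_name.lower()))
    return fields


def output_columns(input_cols, schema_str, reserved_cols=None):
    """Return the output column names: reserved columns, then schema columns.

    Assumptions: reserved_cols=None keeps all input columns, and an input
    column is dropped when a schema column carries the same name.
    """
    new_names = [name for name, _ in parse_schema(schema_str)]
    kept = input_cols if reserved_cols is None else reserved_cols
    return [c for c in kept if c not in new_names] + new_names


input_cols = ["row", "json", "vec", "kv", "csv", "f0", "f1"]
print(output_columns(input_cols, "f0 double, f1 double", ["row"]))
```

With `reservedCols` set to `["row"]`, the sketch yields the three columns `row`, `f0`, and `f1`.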

Script Example

Code

from pyalink.alink import *
import pandas as pd

# Build the DataFrame from row lists so f0 and f1 stay numeric instead of being coerced to strings.
df = pd.DataFrame([
    ['1', '{"f0":"1.0","f1":"2.0"}', '$3$0:1.0 1:2.0', 'f0:1.0,f1:2.0', '1.0,2.0', 1.0, 2.0],
    ['2', '{"f0":"4.0","f1":"8.0"}', '$3$0:4.0 1:8.0', 'f0:4.0,f1:8.0', '4.0,8.0', 4.0, 8.0]
], columns=["row", "json", "vec", "kv", "csv", "f0", "f1"])
data = dataframeToOperator(df, schemaStr="row string, json string, vec string, kv string, csv string, f0 double, f1 double", op_type="batch")

# Split the "vec" column into f0 and f1, keeping only "row" from the input.
op = VectorToColumns()\
    .setVectorCol("vec")\
    .setReservedCols(["row"])\
    .setSchemaStr("f0 double, f1 double")\
    .transform(data)
op.print()

Results

| row | f0 | f1 |
| --- | --- | --- |
| 1 | 1.0 | 2.0 |
| 2 | 4.0 | 8.0 |