Description

Converts all words in the selected column to lowercase and removes extra whitespace.
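As a rough illustration of the behavior described above, the following plain-Python sketch lowercases a string and collapses runs of whitespace. The function name `normalize_text` is hypothetical, and this is not the operator's actual implementation; it only approximates the described effect on a single string.

```python
def normalize_text(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces,
    # so leading, trailing, and repeated spaces all disappear after the join.
    return " ".join(text.lower().split())


print(normalize_text("  That is   an English Book! "))
```

Punctuation is left untouched here, which matches the sample results in this page where "book!" and "math?" keep their trailing marks.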

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| selectedCol | Name of the selected column used for processing | String | Yes | |
| outputCol | Name of the output column | String | | null |
| reservedCols | Names of the columns to be retained in the output table | String[] | | null |
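The interplay of the outputCol and reservedCols parameters can be sketched in plain Python over a list of dict rows. This is a hedged model, not the operator's code: the helper `apply_to_rows` is hypothetical, and it assumes that a null outputCol writes the result back into selectedCol and that a null reservedCols retains every input column.

```python
def apply_to_rows(rows, selected_col, output_col=None, reserved_cols=None):
    """Mimic the assumed column handling on a list of dict rows."""
    # Assumption: a null outputCol means the result replaces selectedCol.
    target = output_col if output_col is not None else selected_col
    result = []
    for row in rows:
        # Assumption: a null reservedCols means all input columns are retained.
        keep = list(row) if reserved_cols is None else reserved_cols
        out = {name: row[name] for name in keep}
        # Lowercase and collapse whitespace, then store under the target column.
        out[target] = " ".join(row[selected_col].lower().split())
        result.append(out)
    return result


rows = [{"id": 0, "text": "That is an English Book!"}]
print(apply_to_rows(rows, "text"))
print(apply_to_rows(rows, "text", output_col="tokens", reserved_cols=["id"]))
```

With the defaults, the processed text overwrites the "text" column, which is consistent with the sample results in this page. The second call models writing to a separate column while retaining only "id".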

Script Example

Code

    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    # np.array stores the mixed values as strings; schemaStr declares the column types.
    data = np.array([
        [0, 'That is an English Book!'],
        [1, 'Do you like math?'],
        [2, 'Have a good day!']
    ])
    df = pd.DataFrame({"id": data[:, 0], "text": data[:, 1]})

    # Batch mode: process the "text" column and collect the result as a DataFrame.
    inOp1 = dataframeToOperator(df, schemaStr='id long, text string', op_type='batch')
    op = TokenizerBatchOp().setSelectedCol("text")
    print(BatchOperator.collectToDataframe(op.linkFrom(inOp1)))

    # Stream mode: same operation; rows are printed once the stream job executes.
    inOp2 = dataframeToOperator(df, schemaStr='id long, text string', op_type='stream')
    op = TokenizerStreamOp().setSelectedCol("text")
    op.linkFrom(inOp2).print()
    StreamOperator.execute()

Results

       id                      text
    0   1         do you like math?
    1   0  that is an english book!
    2   2          have a good day!