Description

Binarize a continuous variable using a threshold.

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| threshold | Binarization threshold. Values greater than or equal to the threshold are set to 1.0; all other values are set to 0.0. | Double | | 0.0 |
| selectedCol | Name of the selected column used for processing | String | | |
| outputCol | Name of the output column | String | | null |
| reservedCols | Names of the columns to be retained in the output table | String[] | | null |
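
The threshold rule can be checked outside Alink with a few lines of NumPy. This is only a sketch of the comparison described for the `threshold` parameter, not Alink's implementation; the helper name `binarize` is hypothetical and does not exist in the library.

```python
import numpy as np

def binarize(values, threshold=0.0):
    """Hypothetical helper: 1.0 where value >= threshold, else 0.0."""
    arr = np.asarray(values, dtype=float)
    # The comparison is inclusive, following the parameter description.
    return np.where(arr >= threshold, 1.0, 0.0)

# A value equal to the threshold (2.0) falls on the 1.0 side.
result = binarize([1.1, 2.0, 2.2], threshold=2.0)
print(result)
```

The boundary value is included on purpose: it is the one case where an inclusive and a strict comparison would give different results.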

Script Example

Code

  # -*- coding=UTF-8 -*-
  import numpy as np
  import pandas as pd
  from pyalink.alink import *  # provides BatchOperator and Binarizer

  # Sample rows; the column types are declared via schemaStr below.
  data = np.array([
      [1.1, True, "2", "A"],
      [1.1, False, "2", "B"],
      [1.1, True, "1", "B"],
      [2.2, True, "1", "A"]
  ])
  df = pd.DataFrame({"double": data[:, 0], "bool": data[:, 1], "number": data[:, 2], "str": data[:, 3]})
  inOp = BatchOperator.fromDataframe(df, schemaStr='double double, bool boolean, number int, str string')

  # Binarize the "double" column with a threshold of 2.0 and print the result.
  binarizer = Binarizer().setSelectedCol("double").setThreshold(2.0)
  binarizer.transform(inOp).print()

Results

Output Data
  rowID  double  bool   number  str
  0      0.0     True   2       A
  1      0.0     False  2       B
  2      0.0     True   1       B
  3      1.0     True   1       A
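
The output data can be cross-checked with plain pandas, without an Alink environment. The sketch below is an approximation, not the library's code: `binarize_column` is a hypothetical helper, and it assumes that a null `outputCol` means the selected column is overwritten in place, which is what the output data shows.

```python
import pandas as pd

def binarize_column(df, selected_col, threshold=0.0, output_col=None):
    """Hypothetical helper mimicking the example with pandas."""
    # Assumption: with no output column given, the selected column is overwritten.
    target = output_col if output_col is not None else selected_col
    out = df.copy()
    out[target] = (out[selected_col] >= threshold).astype(float)
    return out

df = pd.DataFrame({
    "double": [1.1, 1.1, 1.1, 2.2],
    "bool": [True, False, True, True],
    "number": [2, 2, 1, 1],
    "str": ["A", "B", "B", "A"],
})

# Same settings as the script example: column "double", threshold 2.0.
result = binarize_column(df, "double", threshold=2.0)
print(result)
```

Passing a name such as `output_col="flag"` (an illustrative name) writes the result to a new column and leaves `double` unchanged.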