Binarizer

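  Binarization is the process of converting numerical features into binary (0/1) features. The threshold parameter sets the cutoff used for binarization:
features with values greater than the threshold are binarized to 1.0, and all other values (including those equal to the threshold) are binarized to 0.0. The following is a usage example.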

    import org.apache.spark.ml.feature.Binarizer

    // Assumes an existing SparkSession named `spark` (as in spark-shell).
    val data = Array((0, 0.1), (1, 0.8), (2, 0.2))
    val dataFrame = spark.createDataFrame(data).toDF("label", "feature")

    // Values in "feature" greater than 0.5 become 1.0, all others 0.0.
    val binarizer: Binarizer = new Binarizer()
      .setInputCol("feature")
      .setOutputCol("binarized_feature")
      .setThreshold(0.5)

    val binarizedDataFrame = binarizer.transform(dataFrame)
    val binarizedFeatures = binarizedDataFrame.select("binarized_feature")
    binarizedFeatures.collect().foreach(println)
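
To make the thresholding rule concrete without a Spark session, here is a minimal plain-Scala sketch of the same comparison. The object `BinarizeSketch` and its `binarize` function are hypothetical names introduced only for illustration; they are not part of Spark's API and do not show how `Binarizer` is implemented internally.

```scala
// Hypothetical helper (not part of Spark): mirrors the rule
// "greater than the threshold -> 1.0, otherwise -> 0.0".
object BinarizeSketch {
  def binarize(value: Double, threshold: Double): Double =
    if (value > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    val threshold = 0.5
    // Same feature values as the DataFrame example above.
    val features = Seq(0.1, 0.8, 0.2)
    val binarized = features.map(binarize(_, threshold))
    println(binarized)                 // List(0.0, 1.0, 0.0)

    // The comparison is strict: a value equal to the threshold maps to 0.0.
    println(binarize(0.5, threshold))  // 0.0
  }
}
```

Only the second feature (0.8) exceeds the threshold of 0.5, so it is the only one mapped to 1.0.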