Description

Naive Bayes Text Classifier.

Both the multinomial and the Bernoulli Naive Bayes text models are supported. Naive Bayes is a probabilistic learning method. Feature values in the training table must be nonnegative.

Parameters

Name | Description | Type | Required? | Default Value
modelType | Model type: "Multinomial" or "Bernoulli" | String | | "Multinomial"
labelCol | Name of the label column in the input table | String | |
weightCol | Name of the column indicating weight | String | | null
vectorCol | Name of a vector column | String | |
smoothing | The smoothing factor | Double | | 1.0
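
To make the roles of the nonnegative features and the smoothing factor concrete, here is a minimal NumPy sketch of a multinomial Naive Bayes classifier. It is an illustration under stated assumptions, not Alink's implementation: the helper names `fit_multinomial_nb` and `predict_multinomial_nb` are hypothetical, the `smoothing` argument is assumed to act as an additive (Laplace-style) constant on the per-class feature totals, and row weights (`weightCol`) are ignored.

```python
import numpy as np

def fit_multinomial_nb(X, y, smoothing=1.0):
    """Estimate log priors and smoothed log feature probabilities per class.

    Hypothetical helper for illustration; not part of Alink.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if (X < 0).any():
        raise ValueError("feature values must be nonnegative")
    classes = np.unique(y)
    log_prior = np.empty(len(classes))
    log_prob = np.empty((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        rows = X[y == c]
        # Class prior: fraction of training rows with this label.
        log_prior[i] = np.log(rows.shape[0] / X.shape[0])
        # Additive smoothing keeps unseen features from getting probability 0.
        counts = rows.sum(axis=0) + smoothing
        log_prob[i] = np.log(counts / counts.sum())
    return classes, log_prior, log_prob

def predict_multinomial_nb(model, X):
    """Pick the class with the highest log posterior score for each row."""
    classes, log_prior, log_prob = model
    scores = np.asarray(X, dtype=float) @ log_prob.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Small dense data set: four rows labelled "1", three labelled "0".
X = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1], [1, 0, 1, 1],
     [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
y = ["1", "1", "1", "1", "0", "0", "0"]

model = fit_multinomial_nb(X, y, smoothing=1.0)
pred = predict_multinomial_nb(model, X)
print(list(pred))
```

With `smoothing=0`, a feature that never occurs in a class would get probability zero and a log-probability of negative infinity, which is why a positive smoothing value is the usual choice.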

Script Example

Script

    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    # Each row: sparse vector string, dense vector string, label
    data = np.array([
        ["$31$0:1.0 1:1.0 2:1.0 30:1.0", "1.0 1.0 1.0 1.0", '1'],
        ["$31$0:1.0 1:1.0 2:0.0 30:1.0", "1.0 1.0 0.0 1.0", '1'],
        ["$31$0:1.0 1:0.0 2:1.0 30:1.0", "1.0 0.0 1.0 1.0", '1'],
        ["$31$0:1.0 1:0.0 2:1.0 30:1.0", "1.0 0.0 1.0 1.0", '1'],
        ["$31$0:0.0 1:1.0 2:1.0 30:0.0", "0.0 1.0 1.0 0.0", '0'],
        ["$31$0:0.0 1:1.0 2:1.0 30:0.0", "0.0 1.0 1.0 0.0", '0'],
        ["$31$0:0.0 1:1.0 2:1.0 30:0.0", "0.0 1.0 1.0 0.0", '0']])
    dataSchema = ["sv", "dv", "label"]
    df = pd.DataFrame({"sv": data[:, 0], "dv": data[:, 1], "label": data[:, 2]})
    batchData = dataframeToOperator(df, schemaStr='sv string, dv string, label string', op_type='batch')
    # Train on the sparse vector column "sv"
    ns = NaiveBayesTextTrainBatchOp().setVectorCol("sv").setLabelCol("label")
    model = batchData.link(ns)
    # Predict on the training data, keeping "sv" and "label" in the output
    predictor = NaiveBayesTextPredictBatchOp().setVectorCol("sv").setReservedCols(["sv", "label"]).setPredictionCol("pred")
    predictor.linkFrom(model, batchData).print()

Result

sv | label | pred
"$31$0:1.0 1:1.0 2:1.0 30:1.0" | 1 | 1
"$31$0:1.0 1:1.0 2:0.0 30:1.0" | 1 | 1
"$31$0:1.0 1:0.0 2:1.0 30:1.0" | 1 | 1
"$31$0:1.0 1:0.0 2:1.0 30:1.0" | 1 | 1
"$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0
"$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0
"$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0
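
The `sv` values in this example appear to follow a sparse-vector string layout of the form `$size$index:value index:value ...`, where `31` would be the vector length and each pair sets one entry. Assuming that reading of the format, the following sketch expands such a string into a dense NumPy array. `parse_sparse_vector` is a hypothetical helper written for this illustration, not a PyAlink function.

```python
import numpy as np

def parse_sparse_vector(text):
    """Expand a '$size$i:v i:v ...' string into a dense 1-D array.

    Hypothetical helper; the format is assumed from the example data.
    """
    text = text.strip()
    size = None
    if text.startswith("$"):
        # The leading '$size$' header gives the vector length.
        size_str, _, text = text[1:].partition("$")
        size = int(size_str)
    pairs = [item.split(":") for item in text.split()]
    indices = np.asarray([int(i) for i, _ in pairs], dtype=int)
    values = np.asarray([float(v) for _, v in pairs], dtype=float)
    if size is None:
        # Without a header, fall back to the largest index seen.
        size = int(indices.max()) + 1 if len(indices) else 0
    dense = np.zeros(size)
    dense[indices] = values
    return dense

vec = parse_sparse_vector("$31$0:1.0 1:1.0 2:0.0 30:1.0")
print(vec.shape, vec[[0, 1, 2, 30]])
```

Entries not listed in the string stay at zero, so the example row has length 31 with nonzero values only at positions 0, 1, and 30.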