Description

Naive Bayes Classifier.

Both the multinomial and the Bernoulli Naive Bayes models are supported. Naive Bayes is a probabilistic learning method. Feature values in the training table must be nonnegative.

For details of the algorithm, see: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
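The two model types differ in what they count. As a minimal sketch following the formulation in the reference above (not necessarily Alink's internal implementation), the multinomial model estimates a feature's conditional probability from summed feature values, while the Bernoulli model estimates it from the number of rows in which the feature is present. The function names and the `smoothing` argument are illustrative only.

```python
def multinomial_estimates(rows, smoothing=1.0):
    """Smoothed P(feature | class) from the summed feature values of one class."""
    if any(v < 0 for row in rows for v in row):
        raise ValueError("feature values must be nonnegative")
    n_features = len(rows[0])
    totals = [sum(row[j] for row in rows) for j in range(n_features)]
    denom = sum(totals) + smoothing * n_features
    return [(t + smoothing) / denom for t in totals]


def bernoulli_estimates(rows, smoothing=1.0):
    """Smoothed P(feature present | class) from the row counts of one class."""
    n_features = len(rows[0])
    present = [sum(1 for row in rows if row[j] > 0) for j in range(n_features)]
    denom = len(rows) + 2.0 * smoothing
    return [(p + smoothing) / denom for p in present]


# Rows belonging to a single class; the third feature never occurs in it.
class_rows = [[2.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(multinomial_estimates(class_rows))
print(bernoulli_estimates(class_rows))
```

With `smoothing=0.0` the multinomial estimate for the third feature would be exactly zero, which would rule out the class for any row containing that feature. The smoothing factor prevents this.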

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| modelType | Model type: "Multinomial" or "Bernoulli" | String | | "Multinomial" |
| featureCols | Names of the feature columns used for training in the input table | String[] | | null |
| labelCol | Name of the label column in the input table | String | | |
| weightCol | Name of the weight column | String | | null |
| vectorCol | Name of the vector column | String | | null |
| smoothing | Smoothing factor | Double | | 1.0 |
| predictionCol | Name of the prediction column | String | | |
| predictionDetailCol | Name of the column holding detailed prediction information | String | | |
| reservedCols | Names of the columns to be retained in the output table | String[] | | null |
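The parameter list does not say what the prediction detail column contains. One plausible form, assumed here purely for illustration, is the normalized posterior probability of each class. The sketch below turns unnormalized per-class log scores into probabilities; `posterior_from_log_scores` and the JSON layout are hypothetical and are not Alink's documented output format.

```python
import json
import math


def posterior_from_log_scores(log_scores):
    """Normalize per-class log scores into probabilities that sum to 1."""
    # Subtract the largest score first so that exp() cannot underflow to 0
    # for every class at once (the log-sum-exp trick).
    peak = max(log_scores.values())
    shifted = {label: math.exp(s - peak) for label, s in log_scores.items()}
    total = sum(shifted.values())
    return {label: v / total for label, v in shifted.items()}


# Hypothetical unnormalized scores log(prior * likelihood) for two classes.
scores = {"0": math.log(0.001), "1": math.log(0.003)}
detail = posterior_from_log_scores(scores)
print(json.dumps(detail))
```

The class with the highest probability in such a detail record is the one that would appear in the prediction column.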

Script Example

Script

    import numpy as np
    import pandas as pd
    from pyalink.alink import *  # PyAlink; provides dataframeToOperator and NaiveBayes

    # Each row holds four feature values followed by the label.
    data = np.array([
        [1.0, 1.0, 0.0, 1.0, 1],
        [1.0, 0.0, 1.0, 1.0, 1],
        [1.0, 0.0, 1.0, 1.0, 1],
        [0.0, 1.0, 1.0, 0.0, 0],
        [0.0, 1.0, 1.0, 0.0, 0],
        [0.0, 1.0, 1.0, 0.0, 0],
        [0.0, 1.0, 1.0, 0.0, 0],
        [1.0, 1.0, 1.0, 1.0, 1],
        [0.0, 1.0, 1.0, 0.0, 0]])
    df = pd.DataFrame({"f0": data[:, 0],
                       "f1": data[:, 1],
                       "f2": data[:, 2],
                       "f3": data[:, 3],
                       "label": data[:, 4]})
    df["label"] = df["label"].astype('int')

    # Load the data into a batch operator.
    batchData = dataframeToOperator(df, schemaStr='f0 double, f1 double, f2 double, f3 double, label int', op_type='batch')

    # Train on the four feature columns, then predict on the training data.
    colnames = ["f0", "f1", "f2", "f3"]
    ns = NaiveBayes().setFeatureCols(colnames).setLabelCol("label").setPredictionCol("pred")
    model = ns.fit(batchData)
    model.transform(batchData).print()

Result

f0 f1 f2 f3 label pred
1.0 1.0 0.0 1.0 1 1
1.0 0.0 1.0 1.0 1 1
1.0 0.0 1.0 1.0 1 1
0.0 1.0 1.0 0.0 0 0
0.0 1.0 1.0 0.0 0 0
0.0 1.0 1.0 0.0 0 0
0.0 1.0 1.0 0.0 0 0
1.0 1.0 1.0 1.0 1 1
0.0 1.0 1.0 0.0 0 0
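
The predictions in the result table can be cross-checked without Alink. The following is a minimal pure-Python sketch of multinomial Naive Bayes with additive smoothing, assuming the standard textbook formulation and a smoothing factor of 1.0. Alink's implementation may differ in detail, and `fit_multinomial_nb` and `predict` are illustrative names, not part of the library.

```python
import math
from collections import defaultdict


def fit_multinomial_nb(X, y, smoothing=1.0):
    """Return {label: (log prior, list of log P(feature | label))}."""
    n_features = len(X[0])
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(int)
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    model = {}
    for label, totals in sums.items():
        denom = sum(totals) + smoothing * n_features
        log_probs = [math.log((t + smoothing) / denom) for t in totals]
        model[label] = (math.log(counts[label] / len(y)), log_probs)
    return model


def predict(model, row):
    """Pick the label with the highest log posterior score."""
    def score(label):
        log_prior, log_probs = model[label]
        return log_prior + sum(v * lp for v, lp in zip(row, log_probs))
    return max(model, key=score)


# Same feature rows and labels as in the script example.
X = [[1.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 1.0],
     [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
y = [1, 1, 1, 0, 0, 0, 0, 1, 0]

model = fit_multinomial_nb(X, y)
preds = [predict(model, row) for row in X]
print(preds)
```

On this data the sketch assigns every row its training label, which agrees with the pred column of the result.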