Description

Naive Bayes Text Predictor.

This operator predicts with a multinomial Naive Bayes text model, a probabilistic learning method. Feature values in the training table must be nonnegative.

For details of the algorithm, see: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
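To make the description concrete, here is a minimal NumPy sketch of multinomial Naive Bayes with additive smoothing. It is an illustration only, not Alink's implementation: the function names and the `alpha` smoothing parameter are assumptions made for this example. It also shows why feature values must be nonnegative, since they are treated as counts.

```python
import numpy as np

def train_multinomial_nb(X, y, alpha=1.0):
    """Fit class log-priors and per-feature log-likelihoods (add-alpha smoothing)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if (X < 0).any():
        # Features act as counts, so negative values have no meaning here.
        raise ValueError("multinomial Naive Bayes needs nonnegative feature values")
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Per-class feature totals, smoothed so unseen features keep nonzero probability.
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict_multinomial_nb(model, X):
    """Return the class with the highest log-posterior score for each row."""
    classes, log_prior, log_likelihood = model
    scores = np.asarray(X, dtype=float) @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Toy data: four features per row, two classes.
X = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1], [1, 0, 1, 1],
     [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
y = ["1", "1", "1", "1", "0", "0", "0"]

model = train_multinomial_nb(X, y)
predictions = predict_multinomial_nb(model, X)
print(predictions)
```

On this toy data the sketch recovers the training labels; a real model would of course be evaluated on held-out rows.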

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| vectorCol | Name of the vector column | String | | |
| predictionCol | Name of the prediction column | String | | |
| predictionDetailCol | Name of the column that holds detailed prediction information | String | | |
| reservedCols | Names of the columns to be retained in the output table | String[] | | null |

Script Example

Script

    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    # Assumes a PyAlink environment is already set up (for example via useLocalEnv).
    # Each row: sparse vector string, dense vector string, label.
    data = np.array([
        ["$31$0:1.0 1:1.0 2:1.0 30:1.0", "1.0 1.0 1.0 1.0", "1"],
        ["$31$0:1.0 1:1.0 2:0.0 30:1.0", "1.0 1.0 0.0 1.0", "1"],
        ["$31$0:1.0 1:0.0 2:1.0 30:1.0", "1.0 0.0 1.0 1.0", "1"],
        ["$31$0:1.0 1:0.0 2:1.0 30:1.0", "1.0 0.0 1.0 1.0", "1"],
        ["$31$0:0.0 1:1.0 2:1.0 30:0.0", "0.0 1.0 1.0 0.0", "0"],
        ["$31$0:0.0 1:1.0 2:1.0 30:0.0", "0.0 1.0 1.0 0.0", "0"],
        ["$31$0:0.0 1:1.0 2:1.0 30:0.0", "0.0 1.0 1.0 0.0", "0"]])
    df = pd.DataFrame({"sv": data[:, 0], "dv": data[:, 1], "label": data[:, 2]})
    schema = "sv string, dv string, label string"
    batchData = dataframeToOperator(df, schemaStr=schema, op_type="batch")
    streamData = dataframeToOperator(df, schemaStr=schema, op_type="stream")

    # Train on the batch source, using the sparse vector column as features.
    ns = NaiveBayesTextTrainBatchOp().setVectorCol("sv").setLabelCol("label")
    model = batchData.link(ns)

    # Apply the trained model to the stream source and print the predictions.
    predictor = NaiveBayesTextPredictStreamOp(model).setVectorCol("sv").setReservedCols(["sv", "label"]).setPredictionCol("pred")
    predictor.linkFrom(streamData).print()
    StreamOperator.execute()

Result

| sv | label | pred |
| --- | --- | --- |
| "$31$0:1.0 1:1.0 2:1.0 30:1.0" | 1 | 1 |
| "$31$0:1.0 1:1.0 2:0.0 30:1.0" | 1 | 1 |
| "$31$0:1.0 1:0.0 2:1.0 30:1.0" | 1 | 1 |
| "$31$0:1.0 1:0.0 2:1.0 30:1.0" | 1 | 1 |
| "$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0 |
| "$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0 |
| "$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0 |
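The `sv` values in the example appear to encode a sparse vector as `$size$index:value index:value ...`, so `$31$0:1.0 30:1.0` would be a 31-dimensional vector with entries at positions 0 and 30. Assuming that reading of the format, the following plain-Python sketch shows how such a string maps to a dense vector. The helpers `parse_sparse_vector` and `to_dense` are hypothetical names for illustration and are not part of the Alink API.

```python
def parse_sparse_vector(text):
    """Parse '$size$idx:val idx:val ...' into (size, {index: value})."""
    if not text.startswith("$"):
        raise ValueError("expected a leading '$size$' header")
    # Split off the size header; the rest is a space-separated entry list.
    size_part, _, body = text[1:].partition("$")
    size = int(size_part)
    entries = {}
    for token in body.split():
        index, value = token.split(":")
        entries[int(index)] = float(value)
    return size, entries

def to_dense(size, entries):
    """Expand the sparse entries into a dense list; missing positions are 0.0."""
    dense = [0.0] * size
    for index, value in entries.items():
        dense[index] = value
    return dense

size, entries = parse_sparse_vector("$31$0:1.0 1:1.0 2:0.0 30:1.0")
dense = to_dense(size, entries)
print(size, entries)
```

Note that the `sv` and `dv` columns in the script are separate encodings of the feature values: the sparse strings are 31-dimensional, while the dense strings list only four values.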