Description
Naive Bayes Text Classifier.
We support the multinomial Naive Bayes Text and multinomial Naive Bayes Text model, a probabilistic learning method. Here, feature values of train table must be nonnegative.
Details info of the algorithm: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
Parameters
Name | Description | Type | Required? | Default Value |
---|---|---|---|---|
modelType | model type : Multinomial or Bernoulli. | String | “Multinomial” | |
labelCol | Name of the label column in the input table | String | ✓ | |
weightCol | Name of the column indicating weight | String | null | |
vectorCol | Name of a vector column | String | ✓ | |
smoothing | the smoothing factor | Double | 1.0 | |
vectorCol | Name of a vector column | String | ✓ | |
predictionCol | Column name of prediction. | String | ✓ | |
predictionDetailCol | Column name of prediction result, it will include detailed info. | String | ||
reservedCols | Names of the columns to be retained in the output table | String[] | null |
Script Example
Script
data = np.array([
["$31$0:1.0 1:1.0 2:1.0 30:1.0","1.0 1.0 1.0 1.0", '1'],
["$31$0:1.0 1:1.0 2:0.0 30:1.0","1.0 1.0 0.0 1.0", '1'],
["$31$0:1.0 1:0.0 2:1.0 30:1.0","1.0 0.0 1.0 1.0", '1'],
["$31$0:1.0 1:0.0 2:1.0 30:1.0","1.0 0.0 1.0 1.0", '1'],
["$31$0:0.0 1:1.0 2:1.0 30:0.0","0.0 1.0 1.0 0.0", '0'],
["$31$0:0.0 1:1.0 2:1.0 30:0.0","0.0 1.0 1.0 0.0", '0'],
["$31$0:0.0 1:1.0 2:1.0 30:0.0","0.0 1.0 1.0 0.0", '0']])
dataSchema = ["sv", "dv", "label"]
df = pd.DataFrame({"sv": data[:, 0], "dv": data[:, 1], "label": data[:, 2]})
batchData = dataframeToOperator(df, schemaStr='sv string, dv string, label string', op_type='batch')
model = NaiveBayesTextClassifier().setVectorCol("sv").setLabelCol("label").setReservedCols(["sv", "label"]).setPredictionCol("pred")
model.fit(batchData).transform(batchData).print()
运行结果
sv | label | pred |
---|---|---|
“$31$0:1.0 1:1.0 2:1.0 30:1.0” | 1 | 1 |
“$31$0:1.0 1:1.0 2:0.0 30:1.0” | 1 | 1 |
“$31$0:1.0 1:0.0 2:1.0 30:1.0” | 1 | 1 |
“$31$0:1.0 1:0.0 2:1.0 30:1.0” | 1 | 1 |
“$31$0:0.0 1:1.0 2:1.0 30:0.0” | 0 | 0 |
“$31$0:0.0 1:1.0 2:1.0 30:0.0” | 0 | 0 |
“$31$0:0.0 1:1.0 2:1.0 30:0.0” | 0 | 0 |