python-course.eu

25. Naive Bayes Classifier with Scikit

By Bernd Klein. Last modified: 17 Feb 2022.

We have written Naive Bayes Classifiers from scratch in our previous chapter of our tutorial. In this part of the tutorial on Machine Learning with Python, we want to show you how to use ready-made classifiers. The module Scikit provides naive Bayes classifiers "off the rack".

off the rack classifiers

Our first example uses the "iris dataset" contained in the model to train and test the classifier

# Gaussian Naive Bayes
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
# load the iris datasets
dataset = datasets.load_iris()
# fit a Naive Bayes model to the data
model = GaussianNB()

model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

OUTPUT:

GaussianNB()
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       0.94      0.94      0.94        50
          2       0.94      0.94      0.94        50

avg / total       0.96      0.96      0.96       150

[[50  0  0]
 [ 0 47  3]
 [ 0  3 47]]

We use our person data from the previous chapter of our tutorial to train another classifier in the next example:

import numpy as np

def prepare_person_dataset(fname):
    genders = ["male", "female"]
    persons = []
    with open(fname) as fh:
        for line in fh:
            persons.append(line.strip().split())

    firstnames = []
    dataset = []  # weight and height


    for person in persons:
        firstnames.append( (person[0], person[4]) )
        height_weight = (float(person[2]), float(person[3]))
        dataset.append( (height_weight, person[4]))
    return dataset

learnset = prepare_person_dataset("data/person_data.txt")
testset = prepare_person_dataset("data/person_data_testset.txt")
print(learnset)

OUTPUT:

[((184.0, 73.0), 'male'), ((149.0, 52.0), 'female'), ((174.0, 63.0), 'female'), ((175.0, 67.0), 'male'), ((183.0, 81.0), 'female'), ((187.0, 60.0), 'male'), ((192.0, 96.0), 'male'), ((204.0, 91.0), 'male'), ((180.0, 66.0), 'male'), ((184.0, 52.0), 'male'), ((174.0, 53.0), 'male'), ((177.0, 91.0), 'male'), ((138.0, 37.0), 'female'), ((200.0, 82.0), 'male'), ((193.0, 79.0), 'male'), ((189.0, 79.0), 'male'), ((145.0, 59.0), 'female'), ((188.0, 53.0), 'male'), ((187.0, 81.0), 'male'), ((187.0, 99.0), 'male'), ((190.0, 81.0), 'male'), ((161.0, 48.0), 'female'), ((179.0, 75.0), 'female'), ((180.0, 67.0), 'male'), ((155.0, 48.0), 'male'), ((201.0, 122.0), 'male'), ((162.0, 62.0), 'female'), ((148.0, 49.0), 'female'), ((171.0, 50.0), 'male'), ((196.0, 86.0), 'female'), ((163.0, 46.0), 'female'), ((159.0, 37.0), 'female'), ((163.0, 53.0), 'male'), ((150.0, 39.0), 'female'), ((170.0, 56.0), 'female'), ((191.0, 55.0), 'male'), ((175.0, 37.0), 'male'), ((169.0, 78.0), 'female'), ((167.0, 59.0), 'female'), ((170.0, 78.0), 'male'), ((178.0, 79.0), 'male'), ((168.0, 71.0), 'female'), ((170.0, 37.0), 'female'), ((167.0, 58.0), 'female'), ((152.0, 43.0), 'female'), ((191.0, 81.0), 'male'), ((155.0, 48.0), 'female'), ((176.0, 61.0), 'male'), ((151.0, 41.0), 'female'), ((166.0, 59.0), 'female'), ((168.0, 46.0), 'male'), ((165.0, 65.0), 'female'), ((169.0, 67.0), 'male'), ((158.0, 43.0), 'female'), ((173.0, 61.0), 'male'), ((180.0, 74.0), 'male'), ((212.0, 59.0), 'male'), ((152.0, 62.0), 'female'), ((189.0, 67.0), 'male'), ((159.0, 56.0), 'female'), ((163.0, 58.0), 'female'), ((174.0, 45.0), 'female'), ((174.0, 69.0), 'male'), ((167.0, 47.0), 'male'), ((131.0, 37.0), 'female'), ((154.0, 74.0), 'female'), ((159.0, 59.0), 'female'), ((159.0, 58.0), 'female'), ((177.0, 83.0), 'female'), ((193.0, 96.0), 'male'), ((180.0, 83.0), 'female'), ((164.0, 54.0), 'male'), ((164.0, 64.0), 'female'), ((171.0, 52.0), 'male'), ((163.0, 41.0), 'female'), ((165.0, 30.0), 'male'), ((161.0, 61.0), 'female'), ((198.0, 75.0), 'male'), ((183.0, 70.0), 'female'), ((185.0, 71.0), 'male'), ((175.0, 58.0), 'male'), ((195.0, 89.0), 'male'), ((170.0, 66.0), 'female'), ((167.0, 61.0), 'female'), ((166.0, 65.0), 'female'), ((180.0, 88.0), 'female'), ((164.0, 55.0), 'male'), ((161.0, 53.0), 'female'), ((187.0, 76.0), 'male'), ((170.0, 63.0), 'female'), ((192.0, 101.0), 'male'), ((175.0, 56.0), 'male'), ((190.0, 100.0), 'male'), ((164.0, 63.0), 'male'), ((172.0, 61.0), 'female'), ((168.0, 69.0), 'female'), ((156.0, 51.0), 'female'), ((167.0, 40.0), 'female'), ((161.0, 18.0), 'male'), ((167.0, 56.0), 'female')]
# Gaussian Naive Bayes
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
#print(dataset.data, dataset.target)
w, l = zip(*learnset)
w = np.array(w)
l = np.array(l)


model.fit(w, l)
print(model)

w, l = zip(*testset)
w = np.array(w)
l = np.array(l)
predicted = model.predict(w)
print(predicted)
print(l)
# summarize the fit of the model
print(metrics.classification_report(l, predicted))
print(metrics.confusion_matrix(l, predicted))

OUTPUT:

GaussianNB()
['female' 'male' 'male' 'female' 'female' 'male' 'female' 'female' 'female'
 'female' 'female' 'female' 'female' 'female' 'male' 'female' 'male'
 'female' 'female' 'female' 'male' 'female' 'female' 'male' 'male' 'female'
 'female' 'male' 'male' 'male' 'female' 'female' 'male' 'male' 'male'
 'female' 'female' 'male' 'female' 'male' 'male' 'female' 'female' 'male'
 'female' 'male' 'male' 'female' 'male' 'female' 'female' 'female' 'male'
 'female' 'female' 'male' 'female' 'female' 'male' 'female' 'female' 'male'
 'female' 'female' 'female' 'female' 'male' 'female' 'female' 'female'
 'female' 'female' 'male' 'male' 'female' 'female' 'male' 'male' 'female'
 'female' 'male' 'male' 'female' 'male' 'male' 'male' 'female' 'male'
 'female' 'female' 'male' 'male' 'female' 'male' 'female' 'female' 'female'
 'male' 'female' 'male']
['female' 'male' 'male' 'female' 'female' 'male' 'male' 'male' 'female'
 'female' 'female' 'female' 'female' 'female' 'male' 'male' 'male' 'female'
 'female' 'female' 'male' 'female' 'female' 'male' 'male' 'female' 'male'
 'female' 'male' 'female' 'male' 'male' 'male' 'male' 'female' 'female'
 'female' 'male' 'male' 'female' 'male' 'female' 'male' 'male' 'female'
 'male' 'female' 'male' 'female' 'female' 'female' 'male' 'male' 'male'
 'male' 'male' 'female' 'male' 'male' 'female' 'female' 'female' 'male'
 'female' 'male' 'female' 'male' 'female' 'male' 'female' 'female' 'female'
 'male' 'male' 'male' 'female' 'male' 'male' 'female' 'female' 'male'
 'male' 'female' 'female' 'male' 'male' 'female' 'male' 'female' 'male'
 'male' 'female' 'female' 'male' 'male' 'female' 'female' 'male' 'female'
 'female']
             precision    recall  f1-score   support

     female       0.68      0.80      0.73        50
       male       0.76      0.62      0.68        50

avg / total       0.72      0.71      0.71       100

[[40 10]
 [19 31]]

Live Python training

instructor-led training course

Enjoying this page? We offer live Python training courses covering the content of this site.

See: Live Python courses overview

Upcoming online Courses

Python Text Processing Course

17 Oct 2022 to 21 Oct 2022

Enrol here