Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable. Given a class variable y and a dependent feature vector x_1 through x_n, Bayes’ theorem states the following relationship:

    P(y | x_1, ..., x_n) = P(x_1, ..., x_n | y) P(y) / P(x_1, ..., x_n)

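As a quick numeric check, Bayes’ theorem can be applied to a toy spam-filter setting. The prior and likelihood values below are invented for illustration only:

```python
# Hypothetical probabilities, chosen only to illustrate Bayes' theorem.
p_spam = 0.3              # prior P(spam)
p_word_given_spam = 0.6   # likelihood P(word | spam)
p_word_given_ham = 0.1    # likelihood P(word | ham)

# Total probability of seeing the word: P(word) = sum over classes.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior via Bayes' theorem: P(spam | word) = P(word | spam) P(spam) / P(word).
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # posterior is about 0.72
```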
Despite their seemingly oversimplified assumptions, naive Bayes classifiers have performed well in a variety of real-world applications, most notably document classification and spam filtering. They require only a modest quantity of training data to estimate the necessary parameters. Compared to more sophisticated algorithms, naive Bayes learners and classifiers can be exceedingly fast. Because the class-conditional feature distributions are decoupled, each distribution can be estimated independently as a one-dimensional distribution. This, in turn, helps alleviate problems caused by the curse of dimensionality.
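The decoupling described above can be made concrete with a minimal from-scratch sketch of a Gaussian naive Bayes classifier (illustrative only, not scikit-learn’s implementation): each feature’s class-conditional distribution is fit separately as a one-dimensional Gaussian.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """For each class, store the prior and per-feature 1-D Gaussian parameters."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (
            len(Xc) / len(X),        # class prior P(y = c)
            Xc.mean(axis=0),         # per-feature means (each fit independently)
            Xc.var(axis=0) + 1e-9,   # per-feature variances, smoothed for stability
        )
    return params

def predict_gaussian_nb(params, X):
    """Pick the class maximizing log P(y) + sum_i log N(x_i; mu_i, var_i)."""
    preds = []
    for x in X:
        scores = {}
        for c, (prior, mu, var) in params.items():
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            scores[c] = np.log(prior) + log_lik
        preds.append(max(scores, key=scores.get))
    return np.array(preds)

# Tiny, well-separated toy data.
X = np.array([[1.0, 2.0], [1.2, 1.9], [5.0, 6.0], [5.1, 6.2]])
y = np.array([0, 0, 1, 1])
print(predict_gaussian_nb(fit_gaussian_nb(X, y), X))
```

Because each feature is summarized by just a mean and variance per class, the number of parameters grows linearly with the number of features rather than exponentially.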

  • from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
    gnb = GaussianNB()
    # Fit on the training split, then predict the held-out half.
    y_pred = gnb.fit(X_train, y_train).predict(X_test)
    print("Number of mislabeled points out of a total %d points : %d"
         % (X_test.shape[0], (y_test != y_pred).sum()))


Full Code of Naive Bayes Model Exercise

See the full code below for Naive Bayes modelling:
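The original dataset link is not preserved here, so the following is a minimal end-to-end sketch that uses the scikit-learn iris data as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load data (iris used as a stand-in for the original dataset).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a Gaussian Naive Bayes model and evaluate on the held-out split.
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```

The `random_state` and `test_size` values here are arbitrary choices; any split would do for demonstration purposes.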