Naive Bayes

Naive Bayes is a family of probabilistic algorithms based on Bayes' Theorem, used for classification tasks in machine learning and statistics. It assumes independence among the features given the class label, which is why it is termed "naive." Despite this assumption, Naive Bayes classifiers often perform well in practice, especially for text classification tasks like spam detection and sentiment analysis.

Bayes' Theorem

Bayes' Theorem describes the probability of an event based on prior knowledge of conditions that may be related to it. Mathematically:

$$ P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)} $$
  • \( P(C|X) \): Posterior probability of class \( C \) given feature set \( X \).
  • \( P(X|C) \): Likelihood of feature set \( X \) given class \( C \).
  • \( P(C) \): Prior probability of class \( C \).
  • \( P(X) \): Marginal probability of feature set \( X \).
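
To make the formula concrete, the short sketch below plugs invented numbers into a spam-filter scenario; the priors and likelihoods are purely illustrative, not taken from any dataset:

# Illustrative (made-up) numbers for a spam-filter scenario
p_spam = 0.2           # P(C): prior probability an email is spam
p_word_spam = 0.4      # P(X|C): probability "offer" appears in a spam email
p_word_ham = 0.05      # P(X|not C): probability "offer" appears in a non-spam email

# P(X): marginal probability of the word, via the law of total probability
p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)

# P(C|X): posterior probability the email is spam given it contains "offer"
p_spam_given_word = p_word_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.667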

Naive Bayes Classifier

Under the naive independence assumption, the likelihood factors into a product of per-feature probabilities:

$$ P(X|C) = P(x_1 | C) \cdot P(x_2 | C) \cdots P(x_n | C) $$

$$ P(C|X) \propto P(C) \cdot P(x_1 | C) \cdot P(x_2 | C) \cdots P(x_n | C) $$

To classify a new observation, compute \( P(C|X) \) for each class and assign the class with the highest probability.
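
As a minimal sketch of this decision rule (the class priors and per-feature likelihoods below are invented for illustration, assuming binary features), the unnormalized posterior for each class is the prior times the product of the feature likelihoods, and the predicted class is the argmax:

# Hypothetical priors and per-feature likelihoods for two classes, three binary features
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": [0.7, 0.1, 0.5],  # P(x_i = 1 | spam) for features x_1..x_3
    "ham":  [0.1, 0.4, 0.6],  # P(x_i = 1 | ham)
}
x = [1, 0, 1]  # observed feature vector

scores = {}
for c in priors:
    score = priors[c]
    for xi, p in zip(x, likelihoods[c]):
        # Use P(x_i = 1 | C) when the feature is present, 1 - P(x_i = 1 | C) otherwise
        score *= p if xi == 1 else 1 - p
    scores[c] = score

# Assign the class with the highest (unnormalized) posterior
print(max(scores, key=scores.get))  # prints "spam" for these numbers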

Types of Naive Bayes

  • Gaussian Naive Bayes: Assumes features follow a normal (Gaussian) distribution. Useful for continuous data.
  • Multinomial Naive Bayes: Suitable for text data where features are word frequencies (see the sketch after this list).
  • Bernoulli Naive Bayes: Works with binary/boolean data, indicating the presence or absence of features.
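
As a small sketch of the text-classification case (the tiny corpus and labels below are invented for illustration), Multinomial Naive Bayes can be paired with scikit-learn's CountVectorizer so that word frequencies become the features:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative corpus with invented labels (1 = spam, 0 = not spam)
texts = ["free offer click now", "meeting agenda attached",
         "claim your free prize", "project status update"]
labels = [1, 0, 1, 0]

# Turn each document into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X_counts, labels)

print(model.predict(vectorizer.transform(["free prize offer"])))  # likely [1]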

Code Example

# Import libraries
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets

# Load dataset
dataset = datasets.load_iris()
X, y = dataset.data, dataset.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create and train the model
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Evaluate accuracy
y_predict = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_predict)
print("Accuracy:", accuracy)
Accuracy: 0.98

To classify a new instance:

result = classifier.predict([[5, 2, 1, 4]])
print(dataset.target_names[result])
# Output: ['virginica']
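
If the per-class posterior probabilities are of interest rather than just the label, GaussianNB also exposes predict_proba; a brief follow-up using the classifier trained above and the same illustrative measurements:

# Posterior probability of each iris class for the same measurements
probabilities = classifier.predict_proba([[5, 2, 1, 4]])
print(dict(zip(dataset.target_names, probabilities[0])))  # probability per class name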

Advantages and Limitations

Advantages:
  • Simplicity: Easy to implement and understand.
  • Efficiency: Handles large datasets effectively.
  • Works Well with High-Dimensional Data: Ideal for text classification tasks.
  • Handles Missing Data: Performs well even with missing feature values.

Limitations:
  • Independence Assumption: Assumes features are independent, which may not hold in practice.
  • Zero Probability Problem: Assigns zero probability to unseen features; can be mitigated with Laplace smoothing (illustrated below).
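
To illustrate the Laplace-smoothing point (the tiny count matrix below is invented for illustration), scikit-learn's MultinomialNB applies add-one smoothing by default through its alpha parameter, so a word that never appeared for a class still receives a small nonzero probability:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Illustrative word-count matrix: each class has a word it never saw in training
X = np.array([[3, 0, 1],   # class 0: the middle word never appears
              [0, 2, 2]])  # class 1: the first word never appears
y = np.array([0, 1])

# alpha=1.0 is Laplace (add-one) smoothing, the scikit-learn default
model = MultinomialNB(alpha=1.0)
model.fit(X, y)

# Smoothed per-class word probabilities: no entry is exactly zero
print(np.exp(model.feature_log_prob_))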