
Week 6 Prep: Classification

February 16, 2025 · 2 min read

High-Level Overview

  1. Naive Bayes
    • A probabilistic classifier based on Bayes’ Theorem
    • Assumes features are independent (naive)
    • Works well for text classification (e.g. spam detection, sentiment analysis)
    • Fast and efficient, even with larger datasets
  2. K-Nearest Neighbors (KNN)
    • A non-parametric, instance-based learning algorithm
    • Classifies a data point based on the majority class of its k-nearest neighbors
    • No training phase; all computation happens at prediction time
    • Very simple but computationally expensive for larger datasets
  3. Support Vector Machine (SVM)
    • Finds the optimal hyperplane that maximizes the margin between classes
    • Can handle non-linear data using the kernel trick
    • Works well for high-dimensional spaces
    • Computationally expensive for larger datasets
  4. Random Forest
    • An ensemble method that combines multiple decision trees
    • Reduces overfitting compared to a single decision tree
    • Works well with both categorical and numeric data
    • Less interpretable compared to individual decision trees
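
Of the four, KNN is the easiest to sketch from scratch. A minimal pure-Python version (toy data and function names are my own, just for illustration) shows the "no training phase" idea: the model is literally the stored dataset, and all the work happens when you ask for a prediction.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote of its k nearest training points.
    `train` is a list of (features, label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: two well-separated clusters
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))  # near cluster A -> "A"
```

Note that every prediction scans (and sorts) the whole training set, which is exactly why KNN gets expensive on large datasets.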

Deeper Explanation of Naive Bayes

Bayes' Theorem

Naive Bayes is based on Bayes’ Theorem, which states:

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$$

Where:

  • $P(A|B)$ is the posterior probability (i.e. the probability of class $A$ given feature $B$)
  • $P(B|A)$ is the likelihood (i.e. the probability of feature $B$ given class $A$)
  • $P(A)$ is the prior probability (i.e. the probability of class $A$ occurring)
  • $P(B)$ is the evidence (i.e. the probability of feature $B$ occurring)
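
To make the four terms concrete, here is a worked spam-filter example with made-up numbers: $A$ = "email is spam", $B$ = "email contains the word 'free'".

```python
# Assumed numbers, chosen only to illustrate the formula:
p_spam = 0.4              # prior P(A)
p_free_given_spam = 0.5   # likelihood P(B|A)
p_free_given_ham = 0.1    # P(B|not A)

# Evidence P(B), via the law of total probability
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(A|B): probability the email is spam given it contains "free"
posterior = p_free_given_spam * p_spam / p_free
print(round(posterior, 3))  # 0.769
```

Seeing the word "free" moves the spam probability from the 40% prior up to about 77%.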

Classification Using Naive Bayes

For a given input with multiple features $X=(x_1, x_2, ..., x_n)$, the probability of it belonging to class $C_k$ is:

$$P(C_k|X) = \frac{P(X|C_k) \times P(C_k)}{P(X)}$$

Using the naive assumption that features are conditionally independent, the likelihood simplifies to:

$$P(X|C_k) = P(x_1|C_k)\,P(x_2|C_k)\,...\,P(x_n|C_k)$$

To classify, we choose the class $C_k$ that maximizes:

$$P(C_k)\prod_{i=1}^{n}P(x_i|C_k)$$
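
The classification rule above can be sketched in a few lines of pure Python. This is a minimal sketch under two standard assumptions not spelled out in the derivation: add-one (Laplace) smoothing so unseen words don't zero out the product, and log probabilities so the product of many small terms doesn't underflow. The toy corpus and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts (for priors),
    per-class word counts, and the vocabulary."""
    priors = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return priors, word_counts, vocab

def classify(tokens, priors, word_counts, vocab):
    """Pick the class maximizing log P(C_k) + sum_i log P(x_i|C_k),
    with add-one smoothing on each conditional probability."""
    total_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for label, count in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / total_docs)  # log prior
        for tok in tokens:                    # log likelihoods
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("buy cheap pills now".split(), "spam"),
        ("cheap meds free offer".split(), "spam"),
        ("meeting agenda for monday".split(), "ham"),
        ("lunch on monday".split(), "ham")]
model = train_nb(docs)
print(classify("cheap offer now".split(), *model))  # "spam"
```

Maximizing the log of $P(C_k)\prod_i P(x_i|C_k)$ picks the same class as maximizing the product itself, since log is monotonic.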

Pros and Cons of Different Classification Models

| Algorithm | Pros | Cons | Best Use Cases |
| --- | --- | --- | --- |
| Naive Bayes | Fast, works well with high-dimensional data, handles missing values | Assumes independence of features, not good for complex decision boundaries | Text classification, spam filtering |
| KNN | Simple, no training phase, works well with non-linear data | Slow for large datasets, memory-intensive, sensitive to irrelevant features | Small datasets, recommendation systems |
| SVM | Handles high-dimensional data well, effective for complex classification tasks | Computationally expensive, hard to interpret | Image classification, bioinformatics |
| Random Forest | Reduces overfitting, handles mixed data types, robust | Less interpretable, slower training | General-purpose classification, fraud detection |