
Week 6 Prep: Classification

February 16, 2025 · 2 min read

High-Level Overview

  1. Naive Bayes
    • A probabilistic classifier based on Bayes’ Theorem
    • Assumes features are independent (naive)
    • Works well for text classification (e.g. spam detection, sentiment analysis)
    • Fast and efficient, even with larger datasets
  2. K-Nearest Neighbors (KNN)
    • A non-parametric, instance-based learning algorithm
    • Classifies a data point based on the majority class of its k-nearest neighbors
    • No training phase; all computation happens at prediction time
    • Very simple but computationally expensive for larger datasets
  3. Support Vector Machine (SVM)
    • Finds the optimal hyperplane that maximizes the margin between classes
    • Can handle non-linear data using the kernel trick
    • Works well for high-dimensional spaces
    • Computationally expensive for larger datasets
  4. Random Forest
    • An ensemble method that combines multiple decision trees
    • Reduces overfitting compared to a single decision tree
    • Works well with both categorical and numeric data
    • Less interpretable compared to individual decision trees
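
Of the four, KNN is the easiest to sketch from scratch. A minimal pure-Python version (toy data and function names are my own, just for illustration) shows the "no training phase" idea: the model is literally the stored dataset, and all the work happens when you ask for a prediction.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote of its k nearest training points.
    `train` is a list of (features, label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: two well-separated clusters
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))  # near cluster A -> "A"
```

Note that every prediction scans (and sorts) the whole training set, which is exactly why KNN gets expensive on large datasets.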

Deeper Explanation of Naive Bayes

Bayes' Theorem

Naive Bayes is based on Bayes’ Theorem, which states:

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$$

Where:

  • $P(A|B)$ is the posterior probability (i.e. the probability of class $A$ given feature $B$)
  • $P(B|A)$ is the likelihood (i.e. the probability of feature $B$ given class $A$)
  • $P(A)$ is the prior probability (i.e. the probability of class $A$ occurring)
  • $P(B)$ is the evidence (i.e. the probability of feature $B$ occurring)
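
To make the four terms concrete, here is a worked spam-filter example with made-up numbers: $A$ = "email is spam", $B$ = "email contains the word 'free'".

```python
# Assumed numbers, chosen only to illustrate the formula:
p_spam = 0.4              # prior P(A)
p_free_given_spam = 0.5   # likelihood P(B|A)
p_free_given_ham = 0.1    # P(B|not A)

# Evidence P(B), via the law of total probability
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(A|B): probability the email is spam given it contains "free"
posterior = p_free_given_spam * p_spam / p_free
print(round(posterior, 3))  # 0.769
```

Seeing the word "free" moves the spam probability from the 40% prior up to about 77%.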

Classification Using Naive Bayes

For a given input with multiple features $X=(x_1, x_2, ..., x_n)$, the probability of it belonging to class $C_k$ is:

$$P(C_k|X) = \frac{P(X|C_k) \times P(C_k)}{P(X)}$$

Using the naive assumption that features are conditionally independent, the likelihood simplifies to:

$$P(X|C_k) = P(x_1|C_k)\,P(x_2|C_k)\,...\,P(x_n|C_k)$$

To classify, we choose the class $C_k$ that maximizes:

$$P(C_k)\prod_{i=1}^{n}P(x_i|C_k)$$
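
The classification rule above can be sketched in a few lines of pure Python. This is a minimal sketch under two standard assumptions not spelled out in the derivation: add-one (Laplace) smoothing so unseen words don't zero out the product, and log probabilities so the product of many small terms doesn't underflow. The toy corpus and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts (for priors),
    per-class word counts, and the vocabulary."""
    priors = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return priors, word_counts, vocab

def classify(tokens, priors, word_counts, vocab):
    """Pick the class maximizing log P(C_k) + sum_i log P(x_i|C_k),
    with add-one smoothing on each conditional probability."""
    total_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for label, count in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / total_docs)  # log prior
        for tok in tokens:                    # log likelihoods
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("buy cheap pills now".split(), "spam"),
        ("cheap meds free offer".split(), "spam"),
        ("meeting agenda for monday".split(), "ham"),
        ("lunch on monday".split(), "ham")]
model = train_nb(docs)
print(classify("cheap offer now".split(), *model))  # "spam"
```

Maximizing the log of $P(C_k)\prod_i P(x_i|C_k)$ picks the same class as maximizing the product itself, since log is monotonic.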

Pros and Cons of Different Classification Models

| Algorithm | Pros | Cons | Best Use Cases |
| --- | --- | --- | --- |
| Naive Bayes | Fast, works well with high-dimensional data, handles missing values | Assumes independence of features, not good for complex decision boundaries | Text classification, spam filtering |
| KNN | Simple, no training phase, works well with non-linear data | Slow for large datasets, memory-intensive, sensitive to irrelevant features | Small datasets, recommendation systems |
| SVM | Handles high-dimensional data well, effective for complex classification tasks | Computationally expensive, hard to interpret | Image classification, bioinformatics |
| Random Forest | Reduces overfitting, handles mixed data types, robust | Less interpretable, slower training | General-purpose classification, fraud detection |