Week 4 Prep: Classification & Decision Trees
February 2, 2025 · 2 min
Machine Learning & Classification
- Machine learning is a field of computer science focused on enabling computers to learn patterns and make decisions based on data, rather than following specific preprogrammed instructions.
- Classification is a task that involves using machine learning algorithms to assign a class label to examples from a problem domain. A common example is marking emails as spam or not-spam based on their content.
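To make the spam example concrete, here is a minimal sketch that assumes a tiny, made-up set of labeled emails. The `train` and `classify` helpers are hypothetical names chosen for illustration, and the word-overlap scoring is a deliberately simple stand-in for a real spam filter: the label is learned from examples rather than from hand-written rules.

```python
from collections import Counter

# Hypothetical toy training data: (email text, label) pairs made up for illustration.
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting notes for project two", "not-spam"),
    ("lunch tomorrow with the project team", "not-spam"),
]

def train(examples):
    """Count how often each word appears in each class."""
    word_counts = {}
    for text, label in examples:
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts

def classify(word_counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)

model = train(TRAINING_EMAILS)
print(classify(model, "claim your free prize"))        # spam
print(classify(model, "notes from the team meeting"))  # not-spam
```

With only four training emails the model is easy to fool, which is one reason the steps in the next section spend so much effort on data.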
Machine Learning Steps
- Data Collection - involves gathering raw data from sources like sensors, databases, user-generated content, or analytics.
- Data Preprocessing - involves cleaning and preparing the raw data, such as handling missing values, normalizing data ranges, and converting data to correct formats.
- Data Splitting - involves dividing the dataset into subsets, typically training, validation, and test sets.
- Model Selection - involves choosing the most appropriate algorithm, like decision trees, logistic regression, or other classifiers.
- Model Training - involves feeding training data into the selected model so it can learn the patterns and relationships.
- Model Evaluation - involves testing the model on validation or test sets to assess its performance using different metrics.
- Model Tuning - involves adjusting the model’s hyperparameters to improve its performance.
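The steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the data is synthetic (two made-up clusters of 1-D points), the model is a small k-nearest-neighbors classifier, and the names `knn_predict` and `accuracy` are hypothetical helpers rather than any library's API.

```python
import random

# Step 1 (collection): hypothetical 1-D toy data, made up for illustration.
# Each example is (feature value, class label); small values tend to be class 0.
random.seed(0)
data = [(random.gauss(2.0, 1.0), 0) for _ in range(50)]
data += [(random.gauss(6.0, 1.0), 1) for _ in range(50)]

# Step 2 (preprocessing): rescale the feature to the range [0, 1].
lo = min(x for x, _ in data)
hi = max(x for x, _ in data)
data = [((x - lo) / (hi - lo), y) for x, y in data]

# Step 3 (splitting): shuffle, then 60% train / 20% validation / 20% test.
random.shuffle(data)
train, val, test = data[:60], data[60:80], data[80:]

# Step 4 (model selection): a k-nearest-neighbors classifier.
def knn_predict(train_set, x, k):
    """Predict class 1 if more than half of the k nearest labels are 1."""
    neighbors = sorted(train_set, key=lambda ex: abs(ex[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if votes * 2 > k else 0

def accuracy(train_set, eval_set, k):
    """Fraction of eval_set examples that knn_predict labels correctly."""
    correct = sum(knn_predict(train_set, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)

# Steps 5-7 (training, evaluation, tuning): KNN "training" just stores the data,
# so tuning here means trying several odd k values and keeping the one that
# scores best on the validation set.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, val, k))
print("best k:", best_k)
print("test accuracy:", accuracy(train, test, best_k))
```

The sketch follows the order of the list, so it rescales before splitting. In practice the scaling range is usually computed from the training set alone so that no information from the validation or test sets leaks into preprocessing.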
Evaluating a Classification Model
Some common metrics for evaluating classification models include:
- Confusion Matrix - summarizes the performance of a classification model as a matrix of counts of true positives, true negatives, false positives, and false negatives.
- F1 Score - the harmonic mean of precision and recall. Precision is the ratio of correctly predicted positive observations to the total predicted positives, while recall is the ratio of correctly predicted positive observations to all actual positives.
- Receiver Operating Characteristic Curve (ROC Curve) - plots the true positive rate (sensitivity) against the false positive rate at different classification thresholds.
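The metrics above can be computed by hand for a binary problem. The following is a minimal sketch, assuming labels are 0 or 1 with 1 as the positive class; the labels and scores are made up for illustration, and `confusion_counts`, `f1_score`, and `roc_points` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def roc_points(y_true, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold.

    Assumes y_true contains at least one positive and one negative example.
    """
    points = []
    for threshold in thresholds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, tn, fp, fn = confusion_counts(y_true, y_pred)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

# Hypothetical labels and model scores, made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_counts(y_true, y_pred))             # (3, 3, 1, 1)
print(f1_score(y_true, y_pred))                     # 0.75
print(roc_points(y_true, scores, [0.0, 0.5, 1.1]))  # [(1.0, 1.0), (0.25, 0.75), (0.0, 0.0)]
```

Lowering the threshold moves a point toward the top right of the ROC plot (everything predicted positive), and raising it moves the point toward the bottom left (nothing predicted positive).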
Examples of Classification Algorithms
- K-Nearest Neighbors (KNN) - KNN is a simple, instance-based learning algorithm where the class of a new data point is determined by the majority class among its ‘k’ closest points in the training data.
- Decision Tree - A decision tree is a flowchart-like model that makes decisions by splitting data into branches based on feature values, with each branch representing a decision rule.
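Both algorithms can be sketched in a few lines. The example below is illustrative only: the data is made up, `knn_predict`, `gini`, and `best_split` are hypothetical helper names, Euclidean distance is an assumed choice for "closest", and Gini impurity is one common way to score a split (the description above does not name a criterion). The tree part finds only the first decision rule, not a full tree.

```python
import math
from collections import Counter

def knn_predict(rows, labels, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean)."""
    by_distance = sorted(zip(rows, labels), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def gini(labels):
    """Gini impurity: 0.0 when all labels match, higher when classes are mixed."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels):
    """Find the (impurity, feature index, threshold) with the lowest weighted impurity."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send examples down both branches
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or weighted < best[0]:
                best = (weighted, feature, threshold)
    return best

# Hypothetical toy data: [hours studied, hours slept] -> "pass" / "fail".
rows = [[1, 8], [2, 5], [3, 7], [6, 6], [7, 4], [8, 8]]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

print(knn_predict(rows, labels, [2.5, 6]))  # fail
print(knn_predict(rows, labels, [7, 6]))    # pass

impurity, feature, threshold = best_split(rows, labels)
print(f"split on feature {feature} at <= {threshold} (impurity {impurity})")
```

On this toy data the best rule is "hours studied <= 3", which separates the two classes perfectly. A full decision tree would repeat the same search inside each branch until the branches are pure or a depth limit is reached.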
Further Reading
- Machine Learning: Machine Learning
- Classification: Statistical Classification
- Data Collection: Data Collection
- Data Preprocessing: Data Pre-processing
- Model Selection: Model Selection
- Model Training: Supervised Learning
- Model Tuning: Hyperparameter Optimization
- Confusion Matrix: Confusion Matrix
- F1 Score: F1 Score
- ROC Curve: Receiver Operating Characteristic
- Decision Trees: Decision Tree
- K-Nearest Neighbors (KNN): K-Nearest Neighbors Algorithm