Intro to Data Mining: Project 3
March 30, 2025
Predicting song popularity using regression techniques with feature engineering and model optimization.
Project Overview
This project explores regression analysis by predicting song popularity based on audio features. It demonstrates how to build predictive models for continuous target variables.
Problem Statement
Using audio features from Spotify's API (such as danceability, energy, loudness, tempo, and valence), predict a song's popularity score, which Spotify reports on a 0 to 100 scale. This is a practical application of regression in the music industry.
Dataset Features
Audio Characteristics
- Danceability: How suitable a track is for dancing
- Energy: Perceptual measure of intensity and activity
- Loudness: Overall volume in decibels
- Speechiness: Presence of spoken words
- Acousticness: Confidence measure of whether a track is acoustic
- Instrumentalness: Predicts whether a track contains no vocals
- Tempo: Beats per minute (BPM)
- Valence: Musical positiveness
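The features listed above can be held in a small record type so that implausible values are caught before modeling. The sketch below is illustrative only: the `AudioFeatures` class and `out_of_range` helper are hypothetical names, and the value ranges are assumptions based on Spotify's published feature descriptions (most features in 0 to 1, loudness typically between about -60 and 0 dB). A flagged value deserves inspection; it is not necessarily an error.

```python
from dataclasses import dataclass

# Assumed to lie in [0, 1] per Spotify's feature descriptions.
UNIT_INTERVAL_FEATURES = (
    "danceability", "energy", "speechiness",
    "acousticness", "instrumentalness", "valence",
)


@dataclass
class AudioFeatures:  # hypothetical container, not taken from the project
    danceability: float
    energy: float
    loudness: float  # decibels, typically about -60 to 0
    speechiness: float
    acousticness: float
    instrumentalness: float
    tempo: float  # beats per minute
    valence: float


def out_of_range(track):
    """Return the names of features whose values look implausible."""
    problems = [
        name for name in UNIT_INTERVAL_FEATURES
        if not 0.0 <= getattr(track, name) <= 1.0
    ]
    if not -60.0 <= track.loudness <= 0.0:
        problems.append("loudness")
    if track.tempo <= 0.0:
        problems.append("tempo")
    return problems
```

Calling `out_of_range` on each row before training gives a quick list of tracks to review or drop.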
Methodology
Feature Engineering
💡 Creating meaningful features is often more important than choosing the right algorithm.
- Polynomial features for capturing non-linear relationships
- Interaction terms between related features
- Feature scaling and normalization
- Handling outliers and skewed distributions
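The feature engineering steps listed above might look roughly like the following in plain Python. This is a minimal sketch, not the project's actual code: the function names, the derived column names, and the choice of which features to square, multiply, or log-transform are all assumptions made for illustration.

```python
import math
import statistics


def standardize(values):
    """Scale a column to zero mean and unit variance (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)  # constant column carries no signal
    return [(v - mean) / std for v in values]


def clip(values, low, high):
    """Limit outliers by clamping each value into [low, high]."""
    return [min(max(v, low), high) for v in values]


def engineer(row):
    """Return a copy of row with illustrative derived features added."""
    out = dict(row)
    # Polynomial term: lets a linear model fit a curved energy effect.
    out["energy_sq"] = row["energy"] ** 2
    # Interaction term: effect of danceability may depend on valence.
    out["dance_x_valence"] = row["danceability"] * row["valence"]
    # log1p compresses a right-skewed feature and is defined at zero.
    out["log_speechiness"] = math.log1p(row["speechiness"])
    return out
```

In practice the scaling statistics (mean and standard deviation) should be computed on the training split only and then reused on validation data, so that no information leaks from held-out tracks.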
Models Implemented
- Linear Regression: Baseline model
- Ridge Regression: L2 regularization
- Lasso Regression: L1 regularization with feature selection
- Polynomial Regression: Linear regression on polynomial feature expansions to capture non-linear patterns
- Random Forest Regressor: Ensemble of decision trees whose predictions are averaged
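To make the difference between the L2 and L1 penalties concrete, the sketch below fits a single centered feature with no intercept, where each model has a closed-form slope. It is a toy illustration under stated assumptions, not the project's implementation: the function names, the data, and the `alpha` values are invented, and the objective each function minimizes is given in its docstring.

```python
import math


def _sums(x, y):
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy, sxx


def ols_slope(x, y):
    """Minimizes sum((y - w*x)^2)."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx


def ridge_slope(x, y, alpha):
    """Minimizes sum((y - w*x)^2) + alpha * w^2."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + alpha)  # shrinks toward zero, never exactly zero


def lasso_slope(x, y, alpha):
    """Minimizes sum((y - w*x)^2) / (2n) + alpha * |w|."""
    sxy, sxx = _sums(x, y)
    n = len(x)
    rho, z = sxy / n, sxx / n
    # Soft-thresholding: correlations weaker than alpha are set to zero.
    shrunk = max(abs(rho) - alpha, 0.0)
    return math.copysign(shrunk, rho) / z


x = [-2.0, -1.0, 0.0, 1.0, 2.0]            # centered toy feature
strong = [-6.0, -3.0, 0.0, 3.0, 6.0]       # target strongly tied to x
weak = [-0.2, -0.1, 0.0, 0.1, 0.2]         # target barely tied to x

for name, y in (("strong", strong), ("weak", weak)):
    print(name, ols_slope(x, y), ridge_slope(x, y, 10.0), lasso_slope(x, y, 0.5))
```

On the weak relationship, ridge returns a smaller but non-zero slope while lasso returns exactly zero, which is why lasso doubles as a feature selector.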
Evaluation Metrics
- Mean Squared Error (MSE): Average squared prediction error
- Root Mean Squared Error (RMSE): Square root of MSE, expressed in the same units as the popularity score
- Mean Absolute Error (MAE): Average absolute prediction error
- R² Score: Proportion of variance explained
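The four metrics above can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the returned dictionary keys are illustrative choices, not names from the project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R² for paired targets and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)        # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when every target is identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Toy popularity scores, invented for illustration.
metrics = regression_metrics([50.0, 60.0, 70.0], [52.0, 58.0, 73.0])
print(metrics)
```

Because MSE squares each error, a single badly mispredicted track moves MSE and RMSE far more than it moves MAE, so reporting both helps reveal whether errors are concentrated in a few outliers.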
Results & Analysis
The project includes:
- Comparison of model performance
- Feature importance analysis
- Residual plots and diagnostics
- Predictions vs. actual values visualization
- Discussion of model limitations
Key Insights
- Which audio features most influence popularity
- Trade-offs between model complexity and interpretability
- Impact of regularization on model performance
- Importance of proper validation strategies
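One common validation strategy is k-fold cross-validation, where every track is used for validation exactly once. The sketch below shows only the index bookkeeping, assuming a hypothetical helper named `k_fold_indices`; whether the project used k-fold or a single hold-out split is not stated in the write-up.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducible folds

    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


for train, val in k_fold_indices(n_samples=10, k=5):
    print(len(train), len(val))
```

Averaging a metric such as RMSE across the folds gives a steadier estimate than a single train/test split, which matters when comparing models whose scores are close.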
Applications
Regression techniques are used in:
- Price prediction (real estate, stocks)
- Sales forecasting
- Risk assessment
- Demand prediction
- Performance optimization