I built a machine learning model to answer a simple question:
Can you evaluate a song’s commercial potential before release?
Every day, independent artists upload thousands of tracks to streaming platforms without any objective feedback.
Marketing budgets get spent on songs that struggle to gain traction, while some strong records never get the attention they deserve.
I wanted to test whether data can reduce that uncertainty.
So I built Hit Predictor, a predictive model that translates audio features into a structured signal (huggingface.co/spaces/Prifea…).
This model is a Random Forest Classifier trained on Spotify data.
The dataset originally contained over 114,000 tracks.
During EDA, I removed approximately 1,000 sleep tracks (0.88% of the data) because they consist of ambient sounds like white noise and ocean waves and do not reflect musical structure.
This reduced the dataset to 112,999 tracks.
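A minimal sketch of that filtering step, on a toy table. The column names (`track_name`, `track_genre`) and the idea that sleep tracks are identified by a `sleep` genre label are assumptions for illustration, not details confirmed above.

```python
import pandas as pd

# Toy stand-in for the tracks table; column names are assumptions.
tracks = pd.DataFrame(
    {
        "track_name": ["Song A", "Ocean Waves", "Song B", "White Noise"],
        "track_genre": ["pop", "sleep", "rock", "sleep"],
    }
)

# Flag rows labelled as sleep tracks, then keep everything else.
is_sleep = tracks["track_genre"] == "sleep"
removed_share = is_sleep.mean() * 100  # percentage of rows removed
music_only = tracks.loc[~is_sleep].reset_index(drop=True)

print(f"Removed {is_sleep.sum()} rows ({removed_share:.2f}% of the data)")
print(f"{len(music_only)} tracks remain")
```

Reporting the removed share alongside the filter makes it easy to confirm that the cut is small relative to the full dataset.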
While exploring the data, a conversation with a friend led to an important observation.
The same song appeared multiple times.
And what looked like a data quality issue turned out to be structural.
Songs are indexed by genre, so a single track can appear across multiple genre labels.
Validation showed 32,601 duplicate track–artist combinations (29% of the dataset).
This creates two realities.
➛ For analysis, duplicates enable genre-level comparisons.
➛ For modelling, they introduce bias and a risk of overfitting, since the same track can land in both training and test data.
So I split the approach.
I kept the full dataset for exploration, and created a deduplicated version for training by retaining one instance per track–artist combination.
After deduplication, the model was trained on 80,398 unique tracks.
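The split described above can be sketched with pandas, again on toy data. The column names and the choice to keep the first occurrence are assumptions; the post only says one instance per track–artist combination was retained.

```python
import pandas as pd

# Toy data: "Song A" by "Artist X" is listed under two genres.
tracks = pd.DataFrame(
    {
        "track_name": ["Song A", "Song A", "Song B", "Song A"],
        "artists": ["Artist X", "Artist X", "Artist Y", "Artist Z"],
        "track_genre": ["pop", "dance", "rock", "pop"],
    }
)

key = ["track_name", "artists"]  # assumed identity of a track

# Rows that repeat an earlier track-artist combination.
n_duplicates = int(tracks.duplicated(subset=key).sum())

# The full table stays available for genre-level exploration;
# the training table keeps one row per track-artist combination.
training = tracks.drop_duplicates(subset=key, keep="first").reset_index(drop=True)

print(f"{n_duplicates} duplicate rows, {len(training)} unique tracks for training")
```

The same bookkeeping matches the figures in the post: 112,999 tracks minus 32,601 duplicates leaves 80,398 unique tracks.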
The model uses 10 key audio features, including danceability, energy, loudness, tempo, and speechiness, to classify songs into four tiers:
Viral ➛ Hit ➛ Mid ➛ Flop
After hyperparameter tuning and resolving deployment pipeline issues, the model achieved 69% accuracy.
The goal is not to predict hits perfectly, but to introduce a data-driven checkpoint before decisions are made.
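For readers who want to see the shape of such a classifier, here is a minimal scikit-learn sketch. The data is random and synthetic, the hyperparameters are placeholders, and the 80/20 split is an assumption, so the accuracy it prints says nothing about the real model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
TIERS = ["Flop", "Mid", "Hit", "Viral"]

# Synthetic stand-in: 400 tracks x 10 numeric features, random tier labels.
X = rng.random((400, 10))
y = rng.choice(TIERS, size=400)

# Hold out 20% of the rows, keeping the tier proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Placeholder hyperparameters, not the tuned values from the project.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

With four tiers, random guessing lands near 25% on balanced classes, which is a useful baseline to keep in mind when reading a 69% figure.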
A few things stood out during the process.
➛ Based on feature importance, genre remains the strongest predictor of performance.
➛ Audio features alone are not enough. Artist reach, marketing, timing, and distribution all play a significant role in how a track performs.
So the model works best as a decision-support tool, not a replacement for human judgment.
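A feature-importance ranking like the one mentioned above can be read from a fitted forest through `feature_importances_`. In this sketch the feature names and the integer `genre_code` encoding are hypothetical, and the synthetic labels are deliberately derived from the genre column, so that column should dominate the ranking by construction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
names = ["genre_code", "energy", "danceability"]  # hypothetical feature names

# Synthetic data: the tier label is a direct function of the genre code.
genre_code = rng.integers(0, 4, size=300)
X = np.column_stack([genre_code, rng.random(300), rng.random(300)])
y = np.array(["Flop", "Mid", "Hit", "Viral"])[genre_code]

model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

# Pair each feature with its impurity-based importance, highest first.
ranking = sorted(
    zip(names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances sum to 1 across features, so each value reads as a share of the forest's total split quality.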
From a build perspective, this was an end-to-end pipeline.
⭐︎ EDA and data hygiene using Pandas and NumPy.
⭐︎ Visualization and feature analysis with Matplotlib and Seaborn.
⭐︎ Feature engineering and scaling with Scikit-learn.
⭐︎ Model training and tuning using Random Forest classification.
⭐︎ Backend and API development with Flask.
⭐︎ Deployment on Hugging Face Spaces, including handling large files and environment setup.
You can input a track’s features and get a prediction with probability scores.
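As a rough illustration of what "a prediction with probability scores" looks like in code, here is a sketch built on `predict_proba`. The helper `predict_tiers` is hypothetical, the model is trained on random data, only the five features named in the post are used, and the input values are placeholders on a 0 to 1 scale rather than real loudness or tempo units.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Only the five features named in the post; the full list is not given there.
FEATURES = ["danceability", "energy", "loudness", "tempo", "speechiness"]

# Throwaway model fitted on random data so the example runs on its own.
rng = np.random.default_rng(0)
X = rng.random((200, len(FEATURES)))
y = rng.choice(["Flop", "Mid", "Hit", "Viral"], size=200)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)


def predict_tiers(model, track):
    """Return {tier: probability} for one track given as a feature dict."""
    row = np.array([[track[name] for name in FEATURES]])
    probabilities = model.predict_proba(row)[0]
    return {str(tier): float(p) for tier, p in zip(model.classes_, probabilities)}


# Placeholder feature values for a single track.
track = {
    "danceability": 0.7,
    "energy": 0.8,
    "loudness": 0.5,
    "tempo": 0.4,
    "speechiness": 0.1,
}
scores = predict_tiers(model, track)
best = max(scores, key=scores.get)

print(f"Most likely tier: {best}")
for tier, p in scores.items():
    print(f"  {tier}: {p:.2f}")
```

Returning the full probability spread, not just the top tier, fits the decision-support framing: a track split evenly between Mid and Hit reads very differently from a confident Hit.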
If you are an artist, manager, or just curious about how your favorite songs might perform, I have shared the link in the first comment.
I would be interested to see what results you get.