Predicting the popularity of music tracks on Spotify based on numerical metrics
DOI: https://doi.org/10.34185/1562-9945-6-155-2024-07
Keywords: intelligent data analysis, classification, KNeighbors, Decision Tree, Random Forest, Extreme Gradient Boosting.
Abstract
In today's world, music plays an important role in the lives of millions of people, and music streaming platforms such as Spotify have become an integral part of modern culture. The popularity of music tracks matters greatly to the music industry, affecting artists' incomes and trends in the music world. Predicting track popularity is therefore an important task that can help artists, producers, and platforms better understand listener preferences and optimize their strategies. In this work, a data warehouse of music tracks from the Spotify platform was developed on the basis of a physical database model, with its functionality implemented in SQL scripts. The database is accessed through software that implements ETL (extract, transform, load) processes and intelligent analysis of the selected data. The software classifies tracks by popularity level (0: not popular at all; 1: medium popularity; 2: hit) using numerical track metrics such as acousticness, tempo, valence, and liveness. SQLite serves as the database management system, and the application is implemented in Python. Several machine learning models are used to predict track popularity: KNeighbors, Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost). The data mining software provides efficient track classification and graphical output, allowing users to interpret forecasting results easily. Libraries used in the work: pandas, numpy, seaborn, matplotlib, tabulate, xgboost, scipy, sqlite3. The overall analysis showed that the XGBoost and Random Forest models are the most effective for predicting the popularity of music tracks: they achieve high accuracy and remain robust to changes in the feature set, which makes them suitable for real-world use.
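The load step into SQLite and the three-level popularity label described in the abstract can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions, not the authors' schema or scripts: the `tracks` table layout, the sample rows, and the cut-offs of 40 and 70 on Spotify's 0-100 popularity score are all hypothetical, since the abstract does not say how the classes were defined.

```python
import sqlite3

# Hypothetical extracted rows:
# (track_id, name, popularity, acousticness, tempo, valence, liveness)
RAW_TRACKS = [
    ("t1", "Track A", 85, 0.12, 124.0, 0.71, 0.09),
    ("t2", "Track B", 55, 0.40, 98.5, 0.45, 0.22),
    ("t3", "Track C", 12, 0.91, 70.2, 0.20, 0.35),
]

# Hypothetical class boundaries on the 0-100 popularity score.
MEDIUM_MIN = 40  # 40..69 -> class 1 (medium popularity)
HIT_MIN = 70     # 70..100 -> class 2 (hit); below 40 -> class 0


def load_tracks(conn: sqlite3.Connection, rows) -> None:
    """Create the (hypothetical) tracks table and load the rows into it."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tracks (
            track_id     TEXT PRIMARY KEY,
            name         TEXT NOT NULL,
            popularity   INTEGER CHECK (popularity BETWEEN 0 AND 100),
            acousticness REAL,
            tempo        REAL,
            valence      REAL,
            liveness     REAL
        )
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO tracks VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def labelled_features(conn: sqlite3.Connection):
    """Select numeric metrics and derive the 0/1/2 popularity class in SQL."""
    query = """
        SELECT acousticness, tempo, valence, liveness,
               CASE
                   WHEN popularity >= ? THEN 2
                   WHEN popularity >= ? THEN 1
                   ELSE 0
               END AS popularity_class
        FROM tracks
        ORDER BY track_id
    """
    return conn.execute(query, (HIT_MIN, MEDIUM_MIN)).fetchall()


conn = sqlite3.connect(":memory:")
load_tracks(conn, RAW_TRACKS)
for row in labelled_features(conn):
    print(row)  # last field is the popularity class
```

Deriving the label inside the SQL query keeps the class definition in one place, so changing the thresholds does not require reloading the data.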
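KNeighbors is one of the models the abstract compares. As a dependency-free illustration of what a k-nearest-neighbours classifier does with numeric track metrics, here is a from-scratch sketch; it is not scikit-learn's `KNeighborsClassifier` and not the authors' code. The feature rows, labels, and `k=3` are hypothetical. Features are z-score standardized on the training rows first, because tempo (tens to hundreds of BPM) would otherwise dominate 0-1 metrics such as acousticness in the Euclidean distance.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy data: [acousticness, tempo, valence, liveness] per track.
TRAIN_X = [
    [0.10, 125.0, 0.80, 0.08],
    [0.15, 118.0, 0.75, 0.12],
    [0.45, 100.0, 0.50, 0.20],
    [0.50, 95.0, 0.45, 0.25],
    [0.90, 70.0, 0.20, 0.40],
    [0.85, 75.0, 0.25, 0.35],
]
# Popularity classes as in the abstract: 0 not popular, 1 medium, 2 hit.
TRAIN_Y = [2, 2, 1, 1, 0, 0]


def fit_scaler(rows):
    """Per-column (mean, population std) computed on the training rows."""
    params = []
    for column in zip(*rows):
        sd = pstdev(column)
        params.append((mean(column), sd if sd > 0 else 1.0))  # guard constant columns
    return params


def scale(row, params):
    """Z-score one row using the training-set parameters."""
    return [(x - mu) / sd for x, (mu, sd) in zip(row, params)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows nearest to `query`."""
    params = fit_scaler(train_x)
    q = scale(query, params)
    neighbours = sorted(
        (math.dist(scale(row, params), q), label)
        for row, label in zip(train_x, train_y)
    )
    # Counter keeps first-seen order and most_common() lists equal counts in
    # that order, so a tied vote goes to the class of the nearest neighbour.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN_X, TRAIN_Y, [0.12, 122.0, 0.78, 0.10]))  # -> 2
print(knn_predict(TRAIN_X, TRAIN_Y, [0.88, 72.0, 0.22, 0.38]))  # -> 0
```

With scikit-learn, the same steps would typically be expressed as a `Pipeline` of `StandardScaler` followed by `KNeighborsClassifier(n_neighbors=3)`.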
License
Copyright (c) 2025 System technologies

This work is licensed under a Creative Commons Attribution 4.0 International License.