Predicting the popularity of music tracks on Spotify based on numerical metrics

Bur A.O.; Likhouzova T.A.; Oliinyk Y.O.

doi:10.34185/1562-9945-6-155-2024-07

Keywords:

intelligent data analysis, classification, KNeighbors, Decision Tree, Random Forest, Extreme Gradient Boosting.

Abstract

Music plays an important role in the lives of millions of people, and music streaming platforms such as Spotify have become an integral part of modern culture. Track popularity matters greatly to the music industry because it affects artists' incomes and shapes trends in the music world. Predicting track popularity is therefore an important task that can help artists, producers, and platforms better understand listener preferences and optimize their strategies. In this work, a data store for music tracks on the Spotify platform was developed from a physical database model, with its functionality implemented in SQL scripts. The database is accessed through software that implements ETL processes and intelligent analysis of the selected data. The software classifies tracks by popularity level (0: not popular at all, 1: medium popularity, 2: hit) using numerical track metrics such as acousticness, tempo, valence, and liveness. SQLite serves as the database management system, and the application is written in Python. Several machine learning models are used to predict track popularity: KNeighbors, Decision Tree, Random Forest, and Extreme Gradient Boosting. The data mining software classifies tracks efficiently and displays the results graphically, so that users can easily interpret the forecasts. Libraries used in the work: pandas, numpy, seaborn, matplotlib, tabulate, xgboost, scipy, sqlite3. The overall analysis showed that the XGBoost and Random Forest models are the most effective for predicting the popularity of music tracks. They demonstrate high accuracy and robustness to changes in the attribute set, which makes them suitable for use in real-world conditions.
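The pipeline outlined in the abstract (load track metrics into SQLite, derive the three popularity classes, then classify by numerical features) can be illustrated with a minimal sketch. This is not the authors' implementation. The table name `tracks`, the sample rows, the class thresholds (below 30, 30 to 69, 70 and above on an assumed 0-100 popularity score), and the helper names are all assumptions made for illustration. The classifier is a hand-written k-nearest-neighbours vote that uses only the standard library, standing in for the KNeighbors model named in the abstract.

```python
import math
import sqlite3
from collections import Counter

# Hypothetical sample rows: (acousticness, tempo, valence, liveness, popularity 0-100).
ROWS = [
    (0.90, 80.0, 0.20, 0.10, 12),
    (0.85, 76.0, 0.25, 0.12, 18),
    (0.50, 110.0, 0.50, 0.20, 45),
    (0.45, 115.0, 0.55, 0.22, 52),
    (0.10, 128.0, 0.85, 0.30, 81),
    (0.12, 125.0, 0.80, 0.28, 90),
]


def load_labelled_tracks(rows):
    """ETL step: load rows into an in-memory SQLite table and label classes in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks (acousticness REAL, tempo REAL, valence REAL, "
        "liveness REAL, popularity INTEGER)"
    )
    conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?, ?)", rows)
    # Assumed thresholds: 0 = not popular, 1 = medium popularity, 2 = hit.
    query = """
        SELECT acousticness, tempo, valence, liveness,
               CASE WHEN popularity < 30 THEN 0
                    WHEN popularity < 70 THEN 1
                    ELSE 2 END AS popularity_class
        FROM tracks
        ORDER BY rowid
    """
    data = conn.execute(query).fetchall()
    conn.close()
    features = [row[:4] for row in data]
    labels = [row[4] for row in data]
    return features, labels


def fit_scaler(features):
    """Return a min-max scaling function so that tempo does not dominate distances."""
    columns = list(zip(*features))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return lambda x: tuple((v - lo) / s for v, lo, s in zip(x, lows, spans))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    features, labels = load_labelled_tracks(ROWS)
    scale = fit_scaler(features)
    train = [scale(f) for f in features]
    new_track = (0.11, 126.0, 0.82, 0.29)  # hypothetical unseen track
    print("predicted class:", knn_predict(train, labels, scale(new_track)))
```

In practice the hand-written vote would likely be replaced by library classifiers such as those listed in the abstract, evaluated on a held-out split. The sketch only shows how the SQL labelling and the feature-based classification fit together.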

© 2025 System technologies. All Rights Reserved.