Intelligent audio file classification system

Authors

DOI:

https://doi.org/10.34185/1562-9945-5-162-2026-24

Keywords:

artificial intelligence, neural networks, machine learning, intelligent system, solving classification task, recognition quality metrics, model parameters optimization, ensemble voting methods, audio features, musical genres

Abstract

The continuously increasing volume of music compositions highlights the need for effective organization to ensure convenient user access. Genre classification is one of the common approaches: it lets listeners select tracks according to their individual preferences and receive automated recommendations for new content. A review of previous studies showed that different types of artificial neural network architectures are used for classification, trained on various audio features and their combinations. With mel-frequency cepstral coefficients (MFCCs) as input data, convolutional neural networks (CNN) outperformed multilayer perceptrons (MLP) and recurrent neural networks (RNN). A comparison of the Sigmoid, Tanh, ReLU, Leaky ReLU and ELU activation functions on MLP, RNN and CNN showed that Leaky ReLU performed best, because it passes a small, scaled gradient for negative inputs, whereas ReLU sets that gradient to zero. A combined CNN+RNN architecture yielded strong results with MFCCs and spectrograms as input data. This study aims to analyze the effectiveness of combining various types of audio features for genre classification with a multilayer perceptron, and to explore methods for improving classification accuracy. Both time-domain and frequency-domain audio features were examined. The neural network was trained and tested on the GTZAN dataset, which contains 1000 audio samples, each 30 seconds long, with 100 samples for each of 10 genres. The hyperparameters were optimized automatically with the Optuna framework, which performs a guided search using the Tree-structured Parzen Estimator (TPE). Furthermore, a post-processing mechanism based on hard voting, soft voting, and the Borda count method was introduced. Experimental results demonstrate that the proposed ensemble approach significantly improves classification accuracy compared to the baseline model.
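The three post-processing rules named in the abstract can be illustrated with a short sketch. It assumes that each ensemble member outputs a class-probability vector for the same audio clip; the abstract does not state how the members are obtained or how ties are broken, so the function names (`hard_vote`, `soft_vote`, `borda_vote`) and the lowest-index tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value; ties go to the lowest index (assumption)."""
    return max(range(len(values)), key=values.__getitem__)


def hard_vote(probs):
    """Majority vote over each member's top-1 class."""
    votes = Counter(argmax(p) for p in probs)
    best = max(votes.values())
    # Tie between classes with equal vote counts: pick the lowest index (assumption).
    return min(c for c, v in votes.items() if v == best)


def soft_vote(probs):
    """Average the members' probability vectors, then take the top class."""
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return argmax(mean)


def borda_vote(probs):
    """Borda count: each member ranks all classes; rank r earns (n_classes - 1 - r) points."""
    n_classes = len(probs[0])
    scores = [0] * n_classes
    for p in probs:
        # sorted() is stable, so equally probable classes keep index order.
        ranking = sorted(range(n_classes), key=lambda c: p[c], reverse=True)
        for rank, c in enumerate(ranking):
            scores[c] += n_classes - 1 - rank
    return argmax(scores)


if __name__ == "__main__":
    # Hypothetical outputs of three ensemble members for one clip, three genres.
    members = [
        [0.40, 0.35, 0.25],
        [0.40, 0.35, 0.25],
        [0.05, 0.90, 0.05],
    ]
    print(hard_vote(members))   # 0: two of three members pick class 0
    print(soft_vote(members))   # 1: the third member's high confidence dominates the mean
    print(borda_vote(members))  # 0: ranks ignore confidence magnitude
```

The toy input shows why the three rules can disagree. Hard voting counts only each member's top choice, soft voting weighs confidence, and the Borda count uses each member's full ranking while ignoring how far apart the probabilities are.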

References

Jain, S., Yadav, S., Prabir, P., & Sundar, S. (2021, June 6). Music information retrieval and classification using deep learning. International Research Journal of Engineering and Technology (IRJET), Vol. 08, P. 1059-1066.

Hu, Y., & Mogos, G. (2022, February). Music genres classification by deep learning. Indonesian Journal of Electrical Engineering and Computer Science, Vol. 25, No. 2, P. 1186-1198. DOI: https://doi.org/10.11591/ijeecs.v25.i2.pp1186-1198

Umale, A., Mehul, Bhandw, P., Bagwan, S., & Patil, S. M. (2023, May 13). Music genre classification. International Journal of Advanced Research in Science, Communication and Technology (IJARSCT), Vol. 3, P. 414-425.

Ashraf, M., Abid, F., Din, I.U., Rasheed, J., Yesiltepe, M., Yeo, S.F., & Ersoy, M.T. (2023). A Hybrid CNN and RNN variant model for music classification. Applied Sciences, 13(3), 1476. DOI: https://doi.org/10.3390/app13031476

Tkalychenko, S.V. (2023). Shtuchni neyronni merezhi [Artificial neural networks]. Kryvyi Rih: Derzhavnyi universytet ekonomiky i tekhnolohii [in Ukrainian].

Subbotin, S.O. (2020). Neyronni merezhi: teoriia ta praktyka [Neural networks: theory and practice]. Zhytomyr [in Ukrainian].

Nielsen, M. (2013). Neural networks and deep learning. URL: http://neuralnetworksanddeeplearning.com

Tzanetakis, G., & Cook, P. (2002, July). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 5. URL: https://www.cs.cmu.edu/~gtzan/work/pubs/tsap02gtzan.pdf

Zheng, J., & Oussalah, M. (2006). Automatic system for music genre classification. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=342e83b5272b701b225b289e817bb8d92db0fad2

Simic, M., & Aibin, M. (2025, February 28). Hard vs. soft voting classifiers. URL: https://www.baeldung.com/cs/hard-vs-soft-voting-classifiers

Drotar, P., Gazda, M., & Vokorokos, L. (2019). Ensemble feature selection using election methods and ranker clustering. Information Sciences, Vol. 480, P. 365-380.

Published

2026-03-03