Intelligent audio file classification system
DOI: https://doi.org/10.34185/1562-9945-5-162-2026-24
Keywords: artificial intelligence, neural networks, machine learning, intelligent system, solving classification task, recognition quality metrics, model parameters optimization, ensemble voting methods, audio features, musical genres
Abstract
The continuously growing volume of music compositions highlights the need for effective organization to ensure convenient user access. Genre classification is one common approach: it lets listeners select tracks according to their individual preferences and receive automated recommendations for new content. A review of previous studies showed that different artificial neural network architectures are used for classification, trained on various audio features and their combinations. With mel-frequency cepstral coefficients (MFCCs) as input data, convolutional neural networks (CNN) outperformed multilayer perceptrons (MLP) and recurrent neural networks (RNN). An evaluation of the Sigmoid, Tanh, ReLU, Leaky ReLU, and ELU activation functions on MLP, RNN, and CNN showed that Leaky ReLU achieved the best performance, which is attributed to its ability to keep a small scaled gradient for negative inputs, unlike ReLU, which zeros it out. A combined CNN+RNN architecture achieved strong results using MFCCs and spectrograms as input data. This study aims to analyze the effectiveness of combining various types of audio features for genre classification using a multilayer perceptron, and to explore methods for improving classification accuracy. Time-domain and frequency-domain audio features were examined. The neural network was trained and tested on the GTZAN dataset, which contains 1000 audio samples of 30 seconds each, 100 samples for each of 10 genres. The hyperparameters were optimized automatically using the Optuna framework, which applies intelligent search based on the Tree-structured Parzen Estimator. Furthermore, a post-processing mechanism based on hard voting, soft voting, and the Borda count method was introduced. Experimental results demonstrate that the proposed ensemble approach significantly enhances classification accuracy compared to the baseline model.
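The three post-processing rules named in the abstract can be illustrated with a minimal sketch. The abstract does not state what is being combined, so this example assumes that each track yields several per-class probability vectors (for instance, one per audio segment or one per model). The function names, the tie-breaking choices, and the toy data in segment_probs are hypothetical and are not taken from the paper.

```python
from collections import Counter

def hard_vote(prob_rows):
    """Each row votes for its most probable class; the most frequent class wins."""
    votes = [max(range(len(row)), key=row.__getitem__) for row in prob_rows]
    counts = Counter(votes)
    best = max(counts.values())
    # Tie-break (an assumption): lowest class index among the most-voted classes.
    return min(c for c, n in counts.items() if n == best)

def soft_vote(prob_rows):
    """Sum the class probabilities across rows and pick the largest total."""
    n_classes = len(prob_rows[0])
    totals = [sum(row[c] for row in prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)

def borda_count(prob_rows):
    """Each row ranks the classes; the lowest-ranked class gets 0 points,
    the next one 1 point, and so on. The class with the most points wins."""
    n_classes = len(prob_rows[0])
    points = [0] * n_classes
    for row in prob_rows:
        ranked = sorted(range(n_classes), key=row.__getitem__)  # ascending probability
        for score, cls in enumerate(ranked):
            points[cls] += score
    return max(range(n_classes), key=points.__getitem__)

# Toy data (hypothetical): three probability vectors over three classes.
segment_probs = [
    [0.50, 0.45, 0.05],
    [0.50, 0.45, 0.05],
    [0.10, 0.85, 0.05],
]

print(hard_vote(segment_probs))    # class 0: two of three rows prefer it
print(soft_vote(segment_probs))    # class 1: one very confident row dominates the sum
print(borda_count(segment_probs))  # class 0: 5 points versus 4 for class 1
```

On this toy input the rules disagree: hard voting and the Borda count use only the per-row ordering of classes, while soft voting also weighs how confident each row is. This is one reason to compare all three on held-out data.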
References
Jain, S., Yadav, S., Prabir, P., & Sundar, S. (2021, June 6). Music information retrieval and classification using deep learning. International Research Journal of Engineering and Technology (IRJET), Vol. 08, P. 1059-1066.
Hu, Y., & Mogos, G. (2022, February). Music genres classification by deep learning. Indonesian Journal of Electrical Engineering and Computer Science, Vol. 25, No. 2, P. 1186-1198. DOI: https://doi.org/10.11591/ijeecs.v25.i2.pp1186-1198
Umale, A., Mehul, Bhandw, P., Bagwan, S., & Patil, S. M. (2023, May 13). Music genre classification. International Journal of Advanced Research in Science, Communication and Technology (IJARSCT), Vol. 3, P. 414-425.
Ashraf, M., Abid, F., Din, I.U., Rasheed, J., Yesiltepe, M., Yeo, S.F., & Ersoy, M.T. (2023). A Hybrid CNN and RNN variant model for music classification. Applied Sciences, 13(3), 1476. DOI: https://doi.org/10.3390/app13031476
Tkalychenko, S.V. (2023). Shtuchni neyronni merezhi [Artificial neural networks]. Kryvyi Rih: Derzhavnyi universytet ekonomiky i tekhnolohii [in Ukrainian].
Subbotin, S.O. (2020). Neyronni merezhi: teoriia ta praktyka [Neural networks: theory and practice]. Zhytomyr [in Ukrainian].
Nielsen, M. (2013). Neural networks and deep learning. URL: http://neuralnetworksanddeeplearning.com
Tzanetakis, G., & Cook, P. (2002, July). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 5. URL: https://www.cs.cmu.edu/~gtzan/work/pubs/tsap02gtzan.pdf
Zheng, J., & Oussalah, M. (2006). Automatic system for music genre classification. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=342e83b5272b701b225b289e817bb8d92db0fad2
Simic, M., & Aibin, M. (2025, February 28). Hard vs. soft voting classifiers. URL: https://www.baeldung.com/cs/hard-vs-soft-voting-classifiers
Drotar, P., Gazda, M., & Vokorokos, L. (2019). Ensemble feature selection using election methods and ranker clustering. Information Sciences, Vol. 480, P. 365-380.
Copyright (c) 2026 System technologies

This work is licensed under a Creative Commons Attribution 4.0 International License.









