Application of machine learning methods for analysis of Twitter messages
DOI:
https://doi.org/10.34185/1562-9945-6-155-2024-05
Keywords:
text classification, natural language processing, social networks, cyberbullying, binary classification, model evaluation, word2vec, Bag-of-Words, TF-IDF.
Abstract
The paper investigates binary classification of text messages according to whether they contain bullying. Bullying on the Internet, particularly on social networks, is a serious threat to users' mental health: aggressive, offensive, or humiliating messages can cause stress, anxiety, depression, or other mental disorders. Identifying and preventing cyberbullying is therefore a priority for organizations that develop communication platforms. A dataset of Twitter messages was prepared and preprocessed, including cleaning, tokenization, and lemmatization. Three input representations for the classification models were created: a Bag-of-Words matrix, a TF-IDF matrix, and a word2vec matrix. Models based on five machine learning methods (logistic regression, k-nearest neighbors, random forest, support vector machine, and naive Bayes classifier) were built and tested on each of these input sets. A comparative analysis of the test results identified logistic regression on Bag-of-Words input as the most effective model for binary classification of the messages in the selected dataset. The results can be used to develop systems that automatically detect signs of cyberbullying in social network users' messages, so that appropriate measures can be taken promptly.
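As an illustration only, the following standard-library Python sketch walks through the stages the abstract lists on a handful of invented tweets: cleaning and tokenization, Bag-of-Words and TF-IDF vectors, and a logistic regression classifier trained on the Bag-of-Words input and scored with F1. It is not the authors' code. Lemmatization and the word2vec representation are left out, and the regex cleaning rules, the smoothed IDF formula, the gradient-descent trainer with its hyperparameters (`lr`, `epochs`), the choice of F1 as the metric, and all helper names are assumptions made for this example. The abstract does not state which evaluation metrics the study used.

```python
import math
import re
from collections import Counter

def preprocess(text):
    """Lowercase and tokenize on whitespace (lemmatization omitted in this sketch)."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+|#", " ", text)  # drop URLs, @mentions and '#'
    text = re.sub(r"[^a-z\s]", " ", text)  # keep ASCII letters only
    return text.split()

def build_vocab(docs):
    """Map each distinct training token to a column index (sorted for determinism)."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}

def bag_of_words(doc, vocab):
    """Raw term counts; tokens not seen during training are ignored."""
    vec = [0.0] * len(vocab)
    for tok in doc:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def idf_weights(docs, vocab):
    """Smoothed IDF, one common variant: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = [0.0] * len(vocab)
    for tok, i in vocab.items():
        idf[i] = math.log((1 + n) / (1 + df[tok])) + 1.0
    return idf

def tf_idf(doc, vocab, idf):
    """Term counts reweighted by IDF, then L2-normalized."""
    vec = [c * w for c, w in zip(bag_of_words(doc, vocab), idf)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss, without regularization."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

def f1_score(y_true, y_pred):
    """F1 for the positive ("bullying") class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

# Invented toy examples (1 = bullying, 0 = not); NOT taken from the study's dataset.
train_texts = [
    "you are so stupid and ugly",
    "nobody likes you, loser",
    "go away you pathetic idiot",
    "great game last night, congrats!",
    "thanks for sharing this article",
    "have a lovely weekend everyone",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["You are a stupid loser!!", "Thanks, have a great weekend :)"]
test_labels = [1, 0]

train_docs = [preprocess(t) for t in train_texts]
vocab = build_vocab(train_docs)
idf = idf_weights(train_docs, vocab)
tfidf_train = [tf_idf(d, vocab, idf) for d in train_docs]  # alternative input, unused below

# Logistic regression on Bag-of-Words input, the combination the abstract reports as best.
w, b = train_logreg([bag_of_words(d, vocab) for d in train_docs], train_labels)
preds = [predict(w, b, bag_of_words(preprocess(t), vocab)) for t in test_texts]
print("predictions:", preds, "F1:", f1_score(test_labels, preds))
```

On real data one would normally use library implementations (for example scikit-learn's `CountVectorizer`, `TfidfVectorizer` and `LogisticRegression`) and evaluate on a held-out split of the dataset rather than on two hand-written sentences. The pure-Python version is only meant to make each stage explicit.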
License
Copyright (c) 2025 System technologies

This work is licensed under a Creative Commons Attribution 4.0 International License.