Review of methods for semantic text classification
DOI: https://doi.org/10.34185/1562-9945-5-154-2024-13

Keywords: Text classification, Naive Bayes, Logistic Regression, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Transformers, Tone Analysis, Natural Language Processing

Abstract
Recent advancements in text classification have focused on the application of machine learning and deep learning techniques. Traditional methods such as Naive Bayes, Logistic Regression, and Support Vector Machines (SVM) have been widely utilized due to their efficiency and simplicity. However, the advent of deep learning has introduced more complex models like Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), which can automatically extract features and detect intricate patterns in textual data. Additionally, transformer-based models such as BERT have set new benchmarks in text classification tasks. Despite their high accuracy, these models require substantial computational resources and are not always practical for every application. The ongoing research aims to balance accuracy and computational efficiency.

Purpose of Research. The primary objective of this study is to review and compare various methods for automated text classification based on sentiment analysis. This research aims to evaluate the prediction accuracy of different models, including traditional machine learning algorithms and modern deep learning approaches, and to provide insights into their practical applications and limitations.

Presentation of the Main Research Material. This study utilizes the "IMDB Dataset of 50K Movie Reviews" to train and test various text classification models. The dataset comprises movie reviews and their associated sentiment labels, either positive or negative. The research employs several preprocessing steps. For feature extraction, methods such as Bag-of-Words (BoW), TF-IDF (Term Frequency-Inverse Document Frequency), and Word2Vec are used. These features are then fed into various classifiers: Naive Bayes, Support Vector Machines (SVM), Logistic Regression, and deep learning models (illustrative code sketches are given below).

Conclusions. The comparative analysis reveals that while traditional machine learning methods like Naive Bayes, SVM, and Logistic Regression are efficient and easy to implement, deep learning models offer superior accuracy by capturing more complex patterns in the data. However, the computational demands of deep learning models, particularly transformers, limit their applicability in resource-constrained environments. Future research should focus on optimizing these models to balance accuracy and computational efficiency, making advanced text classification accessible for a broader range of applications.
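To make the classical pipeline concrete, the following sketch trains the three traditional baselines on TF-IDF features with scikit-learn (the library the references point to). It assumes the Kaggle CSV has been saved locally as "IMDB Dataset.csv" with "review" and "sentiment" columns; the file name, column names, and hyperparameters are illustrative and not taken from the article's published source code.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the dataset (assumed layout: "review" text, "sentiment" label).
df = pd.read_csv("IMDB Dataset.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.2, random_state=42)

# TF-IDF maps each review to a sparse vector of weighted term frequencies;
# swapping in CountVectorizer gives the plain Bag-of-Words variant.
vectorizer = TfidfVectorizer(stop_words="english", max_features=50000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Fit and score each classical baseline on the held-out split.
for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("Linear SVM", LinearSVC()),
                  ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_train_vec, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test_vec)))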
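For the Word2Vec features, one common approach, assumed here since the article links only the original Google implementation, is to train embeddings with the gensim library and represent each review as the average of its word vectors:

import numpy as np
import pandas as pd
from gensim.models import Word2Vec

df = pd.read_csv("IMDB Dataset.csv")  # same assumed CSV as above
tokenized = [review.lower().split() for review in df["review"]]

# Train 100-dimensional embeddings on the review corpus (illustrative settings).
w2v = Word2Vec(sentences=tokenized, vector_size=100, window=5,
               min_count=2, workers=4)

def review_vector(tokens, model):
    # Average the vectors of in-vocabulary tokens; zero vector if none remain.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

features = np.vstack([review_vector(t, w2v) for t in tokenized])

The resulting dense feature matrix can then be passed to the same classifiers as above in place of the sparse TF-IDF vectors.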
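The transformer-based approach can be sketched with the HuggingFace Transformers pipeline API cited in the references; the DistilBERT checkpoint named below is a publicly available sentiment model used purely as an example, not necessarily the one evaluated in the article:

from transformers import pipeline

# Load a pre-trained sentiment classifier (downloaded on first use).
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

print(classifier("One of the best films I have seen in years."))
# Expected output shape: [{'label': 'POSITIVE', 'score': ...}]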
References
Source Code for the Article. URL: https://github.com/w3t4nu5/NLP-Article
IMDB Dataset of 50K Movie Reviews. URL: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
HuggingFace: Transformers. URL: https://huggingface.co/docs/transformers/index
Stopwords [NLP, Python]. URL: https://medium.com/@yashj302/stopwords-nlp-python-4aa57dc492af
Pavliuk, D. I., Baibuz, O. H., and Honcharova, Y. S. "Text Preparation for Natural Language Processing." XIX International Scientific and Practical Conference "Creative Business Management and Implementation of New Ideas", 14-17 May 2024, Tallinn, Estonia, pp. 223-225.
Feature extraction. URL: https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction
Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space. 2013. URL: https://arxiv.org/pdf/1301.3781
MultinomialNB. URL: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
Support Vector Machines. URL: https://scikit-learn.org/stable/modules/svm.html
LogisticRegression. URL: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Elastic Net Regression — Combined Features of L1 and L2 regularization. URL: https://medium.com/@abhishekjainindore24/elastic-net-regression-combined-features-of-l1-and-l2-regularization-6181a660c3a5
Google Code: word2vec. URL: https://code.google.com/archive/p/word2vec/
Natural Language Processing in TensorFlow. URL: https://www.coursera.org/learn/natural-language-processing-tensorflow/home/week/1
License
Copyright (c) 2024 System technologies

This work is licensed under a Creative Commons Attribution 4.0 International License.