Software for analyzing text information from Telegram

Authors

  • Makarov Illia
  • Likhouzova Tetiana

DOI:

https://doi.org/10.34185/1562-9945-5-154-2024-05

Keywords:

analysis of text information, machine learning models for text analysis, BERT, classification.

Abstract

The rapid growth of the Internet has sharply increased the amount of available information, so applications that make this information easier to work with are increasingly relevant. Systems for aggregating and classifying text from various sources, including Telegram channels, deserve particular attention. Global trends point to a growing need for better text-processing tools, which stimulates research and the development of new technologies. The importance of such systems is confirmed by active work in this field at IT companies and universities around the world. The most active area of research is the use of machine learning models for text analysis, which opens new opportunities to make data processing more efficient.

Many existing solutions for text analysis face challenges with scalability and with adapting to different types of data. This work approaches the development of such software from a different angle, focusing on the flexibility of the system and its openness to the community. The application supports a limited set of built-in machine learning models optimized for different text classification tasks, and lets users integrate their own models to suit their needs. This approach provides a foundation for a wide range of applications and also promotes community development and innovation by drawing on collective intelligence.

The proposed software is a web application for analyzing text information from Telegram. Its possible areas of application span a wide range of industries, from digital marketing and social research to news analysis and scientific research.
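The combination of built-in models and user-supplied models described in the abstract can be illustrated with a small registry. This is only a sketch under stated assumptions, not the authors' implementation: the names `ModelRegistry`, `keyword_classifier`, `builtin-keyword`, and `user-length` are hypothetical, and a toy keyword rule stands in for a real model such as BERT.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a classifier maps one message text to a label.
Classifier = Callable[[str], str]


class ModelRegistry:
    """Stores built-in and user-supplied classifiers under string names."""

    def __init__(self) -> None:
        self._models: Dict[str, Classifier] = {}

    def register(self, name: str, model: Classifier) -> None:
        # Refuse to silently overwrite an existing model.
        if name in self._models:
            raise ValueError(f"model {name!r} is already registered")
        self._models[name] = model

    def names(self) -> List[str]:
        return sorted(self._models)

    def classify(self, name: str, messages: List[str]) -> List[str]:
        # Apply the selected model to every message.
        model = self._models[name]
        return [model(text) for text in messages]


def keyword_classifier(text: str) -> str:
    """Toy stand-in for a built-in model: a simple keyword rule."""
    lowered = text.lower()
    if "sale" in lowered or "discount" in lowered:
        return "marketing"
    return "other"


registry = ModelRegistry()
registry.register("builtin-keyword", keyword_classifier)
# A user-supplied model is registered the same way as a built-in one.
registry.register("user-length", lambda text: "long" if len(text) > 40 else "short")

labels = registry.classify("builtin-keyword", ["Big sale today", "Weather update"])
```

Because every model sits behind the same callable interface in this sketch, the rest of the application would not need to distinguish a built-in classifier from one contributed by a user.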

Published

2024-10-03