SOME ASPECTS OF TEXT DATA STREAM ANALYSIS

  • Ю.О. Олійник

Abstract

This article presents an approach to detecting anomalies in text data streams. It proposes using data preprocessing (normalization, tokenization, and noise reduction) together with text abstracting for anomaly detection. The method comprises a preprocessing stage and an abstracting stage. The abstracting method is built on a combination of the LSA and TextRank methods. The anomaly detection method is based on the Isolation Forest method and a data stream model. Processing of Ukrainian and Russian text is supported. The processing speeds of the original and the abstracted data streams are compared.
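
The preprocessing stage named in the abstract (normalization, tokenization, noise reduction) might look roughly like the minimal Python sketch below. The function names, the regular expressions, and the choice of URLs and HTML tags as "noise" are illustrative assumptions, not the article's implementation. The abstracting and anomaly detection stages are not shown.

```python
import re
import unicodedata

# Assumed noise patterns (hypothetical; the article does not list its own).
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")

# A token is a run of word characters that are neither digits nor underscores,
# optionally joined by an apostrophe or hyphen, as in Ukrainian "комп'ютер".
# Python's str patterns are Unicode-aware, so Cyrillic is covered.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’\-][^\W\d_]+)*")


def reduce_noise(text):
    """Replace URLs and HTML tags with spaces."""
    text = URL_RE.sub(" ", text)
    return TAG_RE.sub(" ", text)


def normalize(text):
    """Apply Unicode NFC normalization and lowercase the text."""
    return unicodedata.normalize("NFC", text).lower()


def tokenize(text):
    """Split text into word tokens, dropping punctuation and digits."""
    return TOKEN_RE.findall(text)


def preprocess(text):
    """Run noise reduction, normalization, and tokenization in that order."""
    return tokenize(normalize(reduce_noise(text)))
```

In this sketch noise reduction runs first, so that fragments of URLs and markup never reach the tokenizer. A real pipeline for Ukrainian and Russian would likely add language-specific steps such as stop-word removal or lemmatization.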

Published
2020-03-25
Section
Articles