SOME ASPECTS OF TEXT DATA STREAM ANALYSIS

Authors

  • Ю.О. Олійник

Keywords:

ANOMALY, ISOLATION FOREST, TEXT MINING, TEXT ABSTRACTING, SEMANTIC ANALYSIS

Abstract

The article presents an approach to detecting anomalies in text data streams. It proposes applying data preprocessing (normalization, tokenization, and noise reduction) and text abstracting before anomaly detection. The method consists of a preprocessing stage and an abstracting stage. The abstracting method is built on a combination of the LSA and TextRank methods. Anomaly detection is based on the Isolation Forest method and a data stream model. Processing of Ukrainian- and Russian-language texts is supported. The processing speed of the original data stream is compared with that of the abstracted stream.
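The abstract names Isolation Forest as the anomaly detector but does not specify which features are extracted from the texts or how the stream model is built. The sketch below, in plain standard-library Python, is therefore only an illustration under stated assumptions. The preprocessing is a simple stand-in (lowercasing, a Latin/Cyrillic token regex, dropping tokens of one or two characters as crude noise reduction). The three numeric features are hypothetical choices: token count, mean token length, and share of unique tokens. The LSA and TextRank abstracting stage is not reproduced. The detector follows the usual Isolation Forest scheme: random axis-parallel splits, path length averaged over trees, and the score s = 2^(-E(h)/c(ψ)), where c is the average unsuccessful-search path length of a binary search tree. The names `preprocess`, `features`, and `SimpleIsolationForest` are hypothetical.

```python
import math
import random
import re

# Hypothetical preprocessing: lowercase normalization, tokenization that keeps
# Latin and Cyrillic letters (including Ukrainian і, ї, є, ґ and Russian ё),
# and dropping tokens of one or two characters as crude noise reduction.
TOKEN_RE = re.compile(r"[a-zа-яёіїєґ']+")


def preprocess(text):
    return [t for t in TOKEN_RE.findall(text.lower()) if len(t) > 2]


def features(text):
    """Hypothetical features: token count, mean token length, unique-token share."""
    tokens = preprocess(text)
    if not tokens:
        return (0.0, 0.0, 0.0)
    n = len(tokens)
    return (float(n), sum(map(len, tokens)) / n, len(set(tokens)) / n)


def c(n):
    """Average path length of an unsuccessful search in a BST with n items."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only dimensions that still vary inside this node can be split.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:  # all remaining points are identical
        return {"size": len(points)}
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)  # random split value inside the node's range
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    if "dim" not in node:  # external node: add the expected remaining depth
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class SimpleIsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, points):
        if len(points) < 2:
            raise ValueError("need at least two points")
        rng = random.Random(self.seed)
        psi = min(self.sample_size, len(points))
        max_depth = math.ceil(math.log2(psi))  # height limit for each tree
        self.trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        self.c_psi = c(psi)
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values close to 1 suggest an anomaly."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / self.c_psi)
```

Because the forest is unsupervised, it would be fitted on a batch of messages that may already contain anomalies, and then each message would be scored. Applying it to a stream, for example by refitting on a sliding window of recent messages, is an assumption here and is not the author's described data stream model.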

Published

2020-03-25

Section

Articles