Development of an automated system for clustering text documents

I. Ponomarev

doi:10.34185/1562-9945-1-138-2022-10

Keywords:

clustering, text mining, TF-IDF, HDBSCAN, tokenization, lemmatization, stop words, Python

Abstract

Grouping texts by similarity of content is a common task in many fields. Text document clustering is used to categorize documents automatically, filter email, group web pages in search engines, and so on. Automating this process can significantly reduce the time spent on the task.
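
The abstract does not describe the implementation, but the keywords name the usual preprocessing steps: tokenization, stop-word removal, and TF-IDF weighting. The following is a minimal sketch of those steps in plain Python, not the author's system. The stop-word list, the function names, and the sample documents are all hypothetical, and lemmatization and the HDBSCAN clustering step are left out.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and", "to"}


def tokenize(text):
    """Lowercase the text, extract alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents: two about cats, one about finance.
docs = [
    "The cat sat on the mat",
    "A cat and a kitten sat on a mat",
    "Stock markets fell sharply on Monday",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # documents sharing vocabulary
sim_02 = cosine(vectors[0], vectors[2])  # documents with no shared terms
```

In this sketch the two cat documents get a positive similarity, while the first and third share no terms after stop-word removal and so get a similarity of zero. A density-based algorithm such as HDBSCAN would then group documents using distances derived from vectors like these.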

© 2025 System technologies. All Rights Reserved.