Research of the efficiency of distributed algorithms for machine learning

Authors

  • Ekaterina Ostrovskaya
  • Ivan Stovpchenko
  • Vladislav Anischenko

DOI:

https://doi.org/10.34185/1562-9945-1-132-2021-14

Keywords:

algorithm, Apache Mahout, k-means, fuzzy k-means / c-means, fuzzy clustering, machine learning, Hadoop

Abstract

This paper discusses the storage, processing and analysis of large volumes of data, as well as machine learning algorithms that process large, often unstructured data sets and extract the required information from them.
The work studies the effectiveness of distributed machine learning algorithms implemented in the Apache Mahout project.
The effectiveness of two machine learning algorithms implemented in Apache Mahout was analyzed: the k-Means clustering method and the fuzzy k-Means / c-Means method.
Both clustering methods were tested on the same data sets.
The clustering accuracy of each method is examined, and comparative diagrams of the results are presented.

Published

2021-03-01