Research of the efficiency of distributed algorithms for machine learning
Keywords: algorithm, Apache Mahout, k-means, fuzzy k-means / c-means, fuzzy clustering, machine learning, Hadoop
Abstract
This paper discusses the storage, processing, and analysis of large volumes of data, and the machine learning algorithms used to process large, often unstructured data sets and extract the required information from them.
The work studies the efficiency of the distributed machine learning algorithms implemented in the Apache Mahout project.
It analyzes two of these algorithms, the k-Means clustering method and the fuzzy k-Means / c-Means method.
Both clustering methods were tested on the same data sets.
The clustering accuracy of each method is examined, and comparative diagrams of the results are presented.