Research of descriptors for digit recognition of MNIST dataset
DOI: https://doi.org/10.34185/1562-9945-2-127-2020-04
Keywords: recognition, MNIST handwritten digits, descriptors, Hu moments, histograms, Python, Scikit-Learn, k-means method
Abstract
The work is dedicated to solving the task of digit recognition.
Twenty years of research show that the problem of digit recognition, although well studied, is still of considerable interest.
The MNIST set has become a standard test set and is used by many authors to test recognition algorithms. The work “MNIST. Who is the best in MNIST?” contains a table of handwritten digit recognition results obtained by different algorithms, which are combined into groups.
The best recognition results have an error of less than 1% and are obtained using neural networks. Successful recognition algorithms, including deep learning, are opaque to the user and difficult to describe. That is why descriptor-based recognition algorithms remain relevant.
The goal of the work is to study the influence of descriptors and to reduce their number for recognition of handwritten digits from the MNIST set.
For recognition of the MNIST digits, a set of 12 descriptors was chosen: seven Hu moments, En (Euler's number), Ex (filling coefficient), Ec (eccentricity), and Yn, Xn (coordinates of the center of gravity). Statistical characteristics and histograms of these descriptors were determined for the MNIST training and test sets. Based on this analysis, a number of assumptions were made.
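As an illustration of the moment-based part of this descriptor set, the following minimal sketch computes the center of gravity and the seven Hu moments of a grayscale image with NumPy. The function name `centroid_and_hu` and the toy rectangle image are assumptions made for the example, not the authors' code; Euler's number, the filling coefficient and eccentricity are not covered here.

```python
import numpy as np


def centroid_and_hu(image):
    """Return the centroid (xc, yc) and the seven Hu moments of a 2-D array."""
    img = np.asarray(image, dtype=float)
    ys, xs = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    xc = (xs * img).sum() / m00
    yc = (ys * img).sum() / m00

    def eta(p, q):
        # Central moment mu_pq, normalized to be scale invariant.
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    hu = np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])
    return (xc, yc), hu


# Toy 28x28 image (same size as an MNIST digit) holding a filled rectangle.
img = np.zeros((28, 28))
img[5:15, 8:12] = 1.0
shifted = np.roll(img, shift=(6, 9), axis=(0, 1))  # same shape, moved

(xc, yc), hu = centroid_and_hu(img)
_, hu_shifted = centroid_and_hu(shifted)
print("centroid:", xc, yc)
print("Hu moments:", hu)
```

Because Hu moments are invariant to translation, the shifted copy yields the same seven values; position is carried only by the centroid coordinates.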
Digits were recognized with a classifier based on the k-means method with n_neighbors = 10 from the Scikit-Learn library for Python. Preliminary analysis of the descriptors gave reason to assume that the fifth, sixth and seventh Hu moments do not contribute to the recognition of test-set digits by the k-means method. This assumption was confirmed by experiments, which also showed that eccentricity should be excluded from the set of descriptors.
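The classification step might look roughly like the sketch below. It rests on several assumptions: the parameter name n_neighbors matches Scikit-Learn's `KNeighborsClassifier`, so that estimator is used here; the 8x8 digits bundled with Scikit-Learn stand in for MNIST; raw pixels stand in for the descriptor vectors; and the feature scaling step is an addition that the abstract does not mention.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): small bundled digits set, not MNIST descriptors.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling is an assumption: features of very different magnitude
# would otherwise dominate the distance computation.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=10))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.4f}")
```

With real data, `X` would be the matrix of descriptor values, one row per digit image and one column per descriptor.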
Thus, for recognition of handwritten digits by the k-means method with n_neighbors = 10, it is advisable to use 8 descriptors instead of 12, excluding the fifth, sixth and seventh Hu moments and eccentricity. Recognition accuracy was 78.58% with the reduced set, compared to 78.14% with all 12 descriptors.
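A comparison of the full and reduced descriptor sets can be sketched as a column selection followed by two fits. Everything concrete here is an assumption: the column order follows the list in the abstract, the short names in `NAMES` are illustrative, the data is a synthetic placeholder from `make_classification`, and `KNeighborsClassifier` is assumed because of the n_neighbors parameter. The printed scores say nothing about MNIST.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed column order of the 12 descriptors (hypothetical short names).
NAMES = ["Hu1", "Hu2", "Hu3", "Hu4", "Hu5", "Hu6", "Hu7",
         "En", "Ex", "Ec", "Yn", "Xn"]
DROPPED = {"Hu5", "Hu6", "Hu7", "Ec"}
keep = [i for i, name in enumerate(NAMES) if name not in DROPPED]

# Synthetic placeholder for the real 12-column descriptor matrix.
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=8, n_redundant=0,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)


def knn_accuracy(columns):
    """Fit on the chosen columns only and return test accuracy."""
    clf = KNeighborsClassifier(n_neighbors=10)
    clf.fit(X_train[:, columns], y_train)
    return clf.score(X_test[:, columns], y_test)


acc_all = knn_accuracy(list(range(len(NAMES))))
acc_reduced = knn_accuracy(keep)
print(f"12 descriptors: {acc_all:.4f}, 8 descriptors: {acc_reduced:.4f}")
```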
References
Shlezinger M., Glavach V. Desyat lektsiy po statisticheskomu i strukturnomu raspoznavaniyu. - K.: Naukova dumka, 2004. - 545 s.
Barsegyan A. A., Kupriyanov M. S., Stepanenko V. V., Holod I. I. Metodyi i modeli analiza dannyih: OLAP i Data Mining. - SPb.: BHV-Peterburg, 2004. - 336 s.
Plas Dzh. Vander. Python dlya slozhnyih zadach: nauka o dannyih i mashinnoe obuchenie. - SPb.: Piter, 2018. - 576 s.
Neyronnyie seti s samoorganizatsiey v zadachah klassifikatsii i obrabotki izobrazheniy / G. A. Ososkov, S. G. Dmitrievskiy, A. V. Stadnik // Iskusstv. intellekt. - 2004. - No. 3. - S. 574-586. - Bibliogr.: 6 nazv. - rus.
Dorosh N. L., Hrapach Yu. A. Programmnoe sredstvo dlya raspoznavaniya tsifr na izobrazheniyah // Materialyi mezhdunarodnoy nauchnoy konferentsii «Intellektualnyie sistemyi prinyatiya resheniy i problemyi vyichislitelnogo intellekta» (ISDMCI’2012). - Herson: HNTU, 2012. - S. 353-355.
MNIST. Who is the best in MNIST? [Electronic resource]. - URL: https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html (accessed 18.10.2019).
Raspoznavanie rukopisnyih tsifr s ispolzovaniem svertochnyih neyronnyih setey v Python s Keras [Electronic resource]. - URL: https://www.machinelearningmastery.ru/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/ (accessed 18.10.2019).
Zheron, Orelen. Prikladnoe mashinnoe obuchenie s pomoschyu Scikit-Learn i TensorFlow: kontseptsii, instrumentyi i tehniki dlya sozdaniya intellektualnyih sistem / Per. s angl. - SPb.: OOO "Alfa-kniga", 2018. - 688 s.
The MNIST database of handwritten digits [Electronic resource]. - URL: http://yann.lecun.com/exdb/mnist/ (accessed 18.10.2019).
Gonsales R., Vuds R., Eddins S. Tsifrovaya obrabotka izobrazheniy v srede MatLab. - M.: Tehnosfera, 2006. - 616 s.
Hu_moments_in_Python [Electronic resource]. - URL: https://github.com/adailtonjn68/hu_moments_in_python/blob/master/hu_moments.py (accessed 24.11.2019).
Yane B. Tsifrovaya obrabotka izobrazheniy. - M.: Tehnosfera, 2007. - 584 s.