Performance study of volume normalization methods

Authors

  • Kateryna Ostrovska
  • Roman Beday

DOI:

https://doi.org/10.34185/1562-9945-3-128-2020-15

Keywords:

normalization, methods, benchmark, launcher, dump, data processing, performance, volume data

Abstract

Data normalization is widely used across science and technology, not only in information technology. Medicine, geodesy, radio engineering, soil science, and many other fields use it to present data more conveniently and to simplify subsequent analysis.
As in any area, problems remain. One is the normalization of voluminous data, a potential intermediate state between simple data and Big Data: the volume must already be taken into account, but Machine Learning solutions are not yet needed. A further question is how to normalize data types defined according to OOP rules, since object fields can also be voluminous.
The second problem, which accompanies any normalization task, is the lack of a specialized library.
This kind of data arises, for example, in medicine, when laboratory test results must be normalized to a range convenient for research and/or practical application, and the data are numerous and consist of large numerical values.
The problem is also relevant wherever time-based statistics are used, for example statistics on an application's operation collected in milliseconds over various periods.
To understand the dependencies in the operation of such an application, the statistics must be normalized to a single interval so that accurate conclusions can be drawn.
This work addresses the issues above by studying the performance of volume normalization methods.
It belongs to the field of post-processing of experimental and statistical data and concerns converting an input data set into an output data set within a specific interval (normalization).
Current normalization methods were studied with the aim of applying them to numerical data while preserving the ratios between values. A library was developed that implements the methods meeting this criterion and supports normalization and visualization of the output.
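
As an illustration of converting a data set to a specific interval, the following is a minimal sketch of min-max scaling, one common normalization method that keeps the relative spacing between values. It is not taken from the library described in this work; the function name `min_max_normalize`, its parameters, and the sample timings are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float],
                      lo: float = 0.0,
                      hi: float = 1.0) -> List[float]:
    """Linearly map values onto the interval [lo, hi] (min-max scaling).

    Hypothetical helper for illustration; not the API of the library
    described in the abstract.
    """
    if not values:
        return []
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # All inputs are equal; map them to the lower bound by convention.
        return [float(lo) for _ in values]
    # Differences between inputs stay proportional after scaling.
    return [lo + (v - v_min) / span * (hi - lo) for v in values]


# Made-up application timings in milliseconds, scaled to [0, 1].
timings_ms = [1200.0, 1500.0, 1800.0, 2400.0]
print(min_max_normalize(timings_ms))
```

For voluminous inputs, the same mapping can be applied in a streaming fashion once the minimum and maximum are known, so the whole data set does not have to be held in memory at once.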

References

Mayer-Shenberger V. Bolshie dannyie. Revolyutsiya, kotoraya izmenit to, kak myi zhivYom, rabotaem i myislim / Mayer-Shenberger V., Kuker K. — M.: Mann, Ivanov, Ferber, 2014. — 240 s.

Stivens R. Algoritmyi. Teoriya i prakticheskoe primenenie. – Moskva: Izdatelstvo «E», 2016. –544 s.

Kreyszig E. Advanced Engineering Mathematics. – Wiley, 1979. – 880 s.

William H. Greene Econometric analysis. - New York: Pearson Education, Inc., 2003. - 1026 s.

Benchmarki [Elektronniy dokument]. - (https://wikipedia.org/wiki/Test_proizvoditelnosti).

DispersIya, standartne vIdhilennya I koefItsIEnt varIatsIYi [Elektronniy dokument]. - (https://statanaliz.info/metody/opisanie-dannyx/11-dispersiya-standartnoe-otklonenie-koeffitsient-variatsii).

Oglyad metodIv poperednoYi obrobki danih [Elektronniy dokument]. - (http://www.math.spbu.ua/SD_AIS/documents/2019-12-341/2019-12-b-17.pdf).

Obrobka ob'Emnih danih, Gwyddion [Elektronniy dokument]. - (http://gwyddion.net/documentation/user-guide-ua/volume-data-processing.html).

Ob'EmnI danI [Elektronniy dokument]. - (http://www.teamnet.ua/gruppa-teamnet/issledovanie-i-razrobotka/opros-mnenij-obemnye-dannye/).

Published

2020-03-16