Application of clustering to improve the accuracy of linear approximations

Authors

  • Sulema Yevgenia
  • Penia Oleksandr

DOI:

https://doi.org/10.34185/1562-9945-6-143-2022-01

Keywords:

digital twins, temporal multimodal data, data analysis.

Abstract

The paper presents an approach to increasing the accuracy of modelling a researched object from a temporal multimodal data set by combining linear approximations with clustering. The proposed approach can be applied to creating digital twins of a researched object. The overall purpose of the study is to create a digital twin of the researched object based on a set of temporal multimodal data with previously unknown relationships, which will allow predictions with greater accuracy than a single linear approximation. The input data set is assumed to be complete and synchronized. This paper focuses on the use of clustering to analyse the sets of temporal multimodal data that characterize the researched object. The paper presents a method for dividing the data space into intervals in which linear approximations will be more accurate. The division is obtained by clustering based on the values of data points and their statistical characteristics, for independent variables that show a nonlinear relationship with the dependent variable. As a result, the accuracy of models that use a linear approximation for a given value has increased: the mean squared error, used as the accuracy metric, decreased by 11 percent. At the same time, the linear models benefit from parameter-estimation algorithms that are less prone to overfitting and more numerically stable. However, the proposed method is more computationally expensive because of the need to perform clustering, calculate intermediary approximations and store more models that describe the system. With more data, modalities and variations in the behaviour of the system, the number of models can grow considerably, which can lead to some reduction in performance and accuracy.
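
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data (a noisy quadratic), the one-dimensional k-means routine, the choice of k = 4 and all function names are assumptions made only for illustration. The sketch fits one line to the whole data set, then fits a separate line inside each cluster of the independent variable, and compares the in-sample mean squared error of the two models. It does not reproduce the 11 percent figure reported in the paper.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one-dimensional data."""
    if not xs:  # empty cluster: no data, return a zero model
        return 0.0, 0.0
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x identical: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's k-means on scalar values (hypothetical helper)."""
    lo, hi = min(xs), max(xs)
    # Evenly spaced initial centroids keep the sketch deterministic.
    centroids = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centroids, x)].append(x)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def mse(pred, ys):
    """Mean squared error between predictions and observations."""
    return mean((p - y) ** 2 for p, y in zip(pred, ys))


# Synthetic data (assumption): a nonlinear dependence with small noise.
random.seed(0)
xs = [random.uniform(-3.0, 3.0) for _ in range(300)]
ys = [x * x + random.gauss(0.0, 0.1) for x in xs]

# Baseline: a single linear approximation over the whole data space.
a, b = fit_line(xs, ys)
mse_single = mse([a * x + b for x in xs], ys)

# Clustered variant: split the data space, then fit one line per cluster.
k = 4
centroids = kmeans_1d(xs, k)
labels = [nearest(centroids, x) for x in xs]
models = []
for c in range(k):
    cx = [x for x, lab in zip(xs, labels) if lab == c]
    cy = [y for y, lab in zip(ys, labels) if lab == c]
    models.append(fit_line(cx, cy))

pred = [models[lab][0] * x + models[lab][1] for x, lab in zip(xs, labels)]
mse_clustered = mse(pred, ys)

print(f"single line MSE:    {mse_single:.4f}")
print(f"clustered lines MSE: {mse_clustered:.4f}")
```

To predict for a new value, the sketch would first assign it to the nearest centroid and then apply that cluster's line, which is why the clustered variant has to store k models instead of one. The comparison here is in-sample only; a held-out split would be needed to judge predictive accuracy.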

Published

2022-12-30