Models and methods of training neural networks with differentiable activation functions

Authors

  • Dmytro Zelentsov
  • Taras Shaptala

DOI:

https://doi.org/10.34185/1562-9945-6-143-2022-05

Keywords:

neural networks, neural network training problem, multidimensional optimization, vector of varying parameters, gradient methods.

Abstract

Analysis of the literature makes it clear that the problem of improving the performance and accelerating the training of artificial neural networks (ANNs) is highly relevant, as ANNs are applied in more and more industries every day. Many concepts for finding more advantageous activation functions have been proposed, but allowing their behavior to change as a result of training is a fresh look at the problem. The aim of the study is to find new models of optimization problems for the formulated problem and effective methods for their implementation, which would improve the quality of ANN training, in particular by overcoming the problem of local minima.

A study of models and methods for training neural networks using an extended vector of varying parameters is conducted. The training problem is formulated as a continuous multidimensional unconstrained optimization problem. The extended vector of varying parameters includes some parameters of the activation functions in addition to the weight coefficients. The introduction of additional varying parameters does not change the architecture of the neural network, but it makes the backpropagation method inapplicable. A number of gradient methods have been used to solve the optimization problems. Different formulations of the optimization problems and methods for their solution have been investigated against accuracy and efficiency criteria.

The analysis of the results of numerical experiments leads to the conclusion that it is expedient to extend the vector of varying parameters in problems of training ANNs with continuous and differentiable activation functions. Despite the increase in the dimensionality of the optimization problem, the efficiency of the new formulation is higher than that of the generalized one. According to the authors, this is because a significant share of the computational costs in the generalized formulation falls on attempts to leave the neighborhood of local minima, whereas increasing the dimensionality of the solution space allows this to be done at much lower cost.

References

Haykin S. Neural Networks: A Comprehensive Foundation. – Prentice Hall, 1999. – 842 p.

Russell S., Norvig P. Artificial Intelligence: A Modern Approach, Fourth Edition. – London: Pearson, 2020. – 1136 p.

Freund Y., Haussler D. Unsupervised learning of distributions on binary vectors using two-layer networks // Advances in Neural Information Processing Systems 4: Conference and Workshop on Neural Information Processing Systems. – 1992. – P. 912–919.

Kaelbling L.P., Littman M.L., Moore A.W. Reinforcement Learning: A Survey // Journal of Artificial Intelligence Research. – 1996. – Vol. 4. – P. 237–285.

Zelentsov D.G., Korotka L.I. The use of neural networks in solving problems of the durability of corroding structures // Bulletin of the Kremenchug National University named after M. Ostrogradskyi. – Kremenchug: KrNU, 2011. – Vol. 3 (68), Part 1. – P. 24–27.

Zelentsov D.G., Denysiuk O.R., Korotka L.I. The Method of Correction Functions in Problems of Optimization of Corroding Structures // Advances in Computer Science for Engineering and Education III (ICCSEEA 2020). – 2020. – P. 132–142.

Zelentsov D.G., Korotka L.I., Naumenko N.U. Accuracy Control Algorithm for Numerical Solution of Certain Classes of Systems of Differential Equations // System technologies. Regional interuniversity collection of scientific works. – Issue 5 (82). – Dnipropetrovsk, 2012. – P. 71–79.

Callan R. Basic Concepts of Neural Networks. – Moscow: Williams Publishing House, 2001. – 287 p.

Published

2023-11-13