AUTOMATED RECONSTRUCTION OF PRODUCTION GRAMMARS BASED ON STRUCTURAL ANALYSIS OF MATHEMATICAL FORMULAS IN LATEX FORMAT

Authors

DOI:

https://doi.org/10.34185/1991-7848.itmm.2026.01.043

Keywords:

information technology, software, structural and production modeling, formal grammars, structural analysis

Abstract

This work investigates the semantic and structural analysis of mathematical expressions in scientific texts written in LaTeX. Existing approaches in Mathematical Information Retrieval are reviewed, and their shortcomings are identified: dependence on static dictionaries or low interpretability. A method for the automated reconstruction of production grammars, based on the principles of constructive-production modeling, is proposed. The developed algorithm performs dynamic lexical analysis, builds an abstract syntax tree that accounts for prefix operators, and folds the tree bottom-up to generate production rules. The approach is distinguished by its dynamic selection of the terminal carrier and of constructor signatures, without predefined templates. The results form a foundation for transparent algorithms that cluster scientific documents by their mathematical apparatus.
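
The three stages named in the abstract (lexical analysis, syntax tree construction with prefix operators, bottom-up folding into rules) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `tokenize`, `Parser`, `fold` and `reconstruct`, the `N1, N2, ...` nonterminal naming, and the restriction to a small LaTeX subset (prefix commands with brace arguments, `^`, `_`, and infix `+` and `-`) are all assumptions made for illustration. The arity of each prefix command is inferred from the number of brace groups that follow it, which is one possible reading of "without predefined templates".

```python
import re

# Assumed token classes: commands (\frac), single letters, numbers, a few symbols.
# Characters outside these classes (spaces, parentheses, ...) are skipped here.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|[{}^_+\-]")
INFIX = {"+", "-"}


def tokenize(src):
    """Lexical analysis: split a LaTeX string into tokens."""
    return TOKEN_RE.findall(src)


class Parser:
    """Recursive-descent parser; AST nodes are (label, children) tuples."""

    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        # Left-associative infix + and - at a single precedence level.
        node = self.factor()
        while self.peek() in INFIX:
            op = self.take()
            node = (op, [node, self.factor()])
        return node

    def factor(self):
        # An atom followed by optional superscripts or subscripts.
        node = self.atom()
        while self.peek() in ("^", "_"):
            op = self.take()
            node = (op, [node, self.atom()])
        return node

    def group(self):
        self.take()  # consume "{"
        node = self.expr()
        self.take()  # consume "}"
        return node

    def atom(self):
        if self.peek() == "{":
            return self.group()
        tok = self.take()
        if tok.startswith("\\"):
            # Prefix operator: arity is the number of brace groups that follow.
            args = []
            while self.peek() == "{":
                args.append(self.group())
            return (tok, args)
        return (tok, [])


def fold(node, rules, terminals):
    """Bottom-up folding: return the grammar symbol that stands for `node`."""
    label, children = node
    terminals.add(label)  # every label seen joins the terminal set
    if not children:
        return label
    kids = [fold(child, rules, terminals) for child in children]
    if label.startswith("\\"):
        rhs = (label, *kids)            # prefix form: \cmd arg1 arg2 ...
    else:
        rhs = (kids[0], label, kids[1])  # infix form: left op right
    if rhs not in rules:                 # identical subtrees share one rule
        rules[rhs] = f"N{len(rules) + 1}"
    return rules[rhs]


def reconstruct(src):
    """Return (start symbol, rules keyed by right-hand side, terminal set)."""
    parser = Parser(tokenize(src))
    ast = parser.expr()
    if parser.pos != len(parser.toks):
        raise ValueError("input lies outside the subset handled by this sketch")
    rules, terminals = {}, set()
    start = fold(ast, rules, terminals)
    return start, rules, terminals
```

Under these assumptions, `reconstruct(r"\frac{a}{b} + x^2")` produces three rules, `N1 → \frac a b`, `N2 → x ^ 2` and `N3 → N1 + N2`, with `N3` as the start symbol. Keying the rule table by right-hand side means a repeated subexpression maps to a single nonterminal.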

References

Greiner-Petter R. et al. Discovering Mathematical Objects of Interest – A Study of Mathematical Notations // Proceedings of The Web Conference (WWW '20). ACM, 2020. P. 1445–1456.

Shynkarenko V. I., Ilman V. M. Constructive-Synthesizing Structures and Their Grammatical Interpretations. I. Generalized Formal Constructive-Synthesizing Structure // Cybernetics and Systems Analysis. 2014. Vol. 50, No. 5. P. 655–662. DOI: 10.1007/s10559-014-9655-z.

Zhong J. et al. MathBERT: A Pre-Trained Model for Mathematical Formula Understanding // arXiv preprint arXiv:2105.00377. 2021. 12 p.

Published

2026-04-26

Section

Theses