AUTOMATION OF SYMBOL CORRECTNESS ANALYSIS IN SCIENTIFIC WORKS
Keywords: lexical analysis, semantic analysis, formulas, TeX, LL(1) grammar
In this paper we propose a system for the automated formation of the list of symbols in scientific works, which are treated as a kind of hybrid-language text and are presented in TeX format. We describe the main components of the system. To analyze the text, we suggest using an LL(1) grammar, for which the sets of terminals and nonterminals and the set of rules are defined. The rules for recognizing variables are described. Using an LL(1) grammar makes it possible to extend the system with new symbols from mathematical packages and to combine the stages of parsing the text and forming the list. The system can be useful for improving the quality of presentation of texts in hybrid languages.
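As an illustration of how parsing and list formation can be combined in one pass, the following minimal sketch walks the tokens of a single TeX formula with one token of lookahead and records each variable as it is recognized. The grammar, the token pattern, the set SYMBOL_COMMANDS and all function names are assumptions made for this example; they are not the terminal and nonterminal sets or the rules defined in the paper. In this sketch a subscript is treated as part of the variable name, and commands outside SYMBOL_COMMANDS (for example \frac) are skipped.

```python
import re

# Assumed token pattern: TeX commands, single letters, numbers, any other character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+(?:\.\d+)?|\S")

# Assumed set of commands that denote symbols rather than operators or layout.
SYMBOL_COMMANDS = {"\\alpha", "\\beta", "\\gamma", "\\lambda", "\\mu", "\\omega"}


def is_symbol(tok):
    """A symbol is a single ASCII letter or a command from SYMBOL_COMMANDS."""
    return tok in SYMBOL_COMMANDS or (len(tok) == 1 and tok.isascii() and tok.isalpha())


class SymbolCollector:
    """Toy LL(1)-style recursive-descent parser (hypothetical grammar):

        formula   -> item*
        item      -> '{' formula '}' | SYMBOL ('_' subscript)? | OTHER
        subscript -> TOKEN | '{' TOKEN* '}'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.symbols = []  # list of symbols in order of first appearance

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_formula(self, stop=None):
        # One token of lookahead decides whether another item follows.
        while self.peek() is not None and self.peek() != stop:
            self.parse_item()

    def parse_item(self):
        tok = self.advance()
        if tok == "{":
            self.parse_formula(stop="}")
            if self.peek() != "}":
                raise SyntaxError("unclosed brace")
            self.advance()
        elif is_symbol(tok):
            name = tok
            if self.peek() == "_":
                self.advance()
                name += "_{" + self.parse_subscript() + "}"
            # List formation happens here, during parsing.
            if name not in self.symbols:
                self.symbols.append(name)
        # Any other token (operator, digit, unknown command) is skipped.

    def parse_subscript(self):
        if self.peek() is None:
            raise SyntaxError("missing subscript")
        tok = self.advance()
        if tok != "{":
            return tok
        depth, parts = 1, []
        while True:
            if self.peek() is None:
                raise SyntaxError("unclosed brace")
            tok = self.advance()
            if tok == "{":
                depth += 1
            elif tok == "}":
                depth -= 1
                if depth == 0:
                    break
            parts.append(tok)
        # Keep a space after commands so that "\alpha k" is not fused together.
        return "".join(t + " " if t.startswith("\\") else t for t in parts).strip()


def collect_symbols(formula):
    """Return the symbols of one formula, without duplicates."""
    parser = SymbolCollector(TOKEN_RE.findall(formula))
    parser.parse_formula()
    return parser.symbols
```

For example, collect_symbols(r"E_{k} = \frac{m v^2}{2} + \alpha_i x") yields the list E_{k}, m, v, \alpha_{i}, x. Supporting a new mathematical package in this sketch would amount to adding its commands to SYMBOL_COMMANDS or adding a branch to parse_item; text-mode commands such as \text{...} are not handled here.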