Lexical analysis of program code

Authors

  • I.O. Sokol
  • O.S. Volkovskyi

DOI:

https://doi.org/10.34185/1562-9945-5-142-2022-09

Keywords:

lexical analyzer, lexeme, token, regular expressions, theory of formal languages, nondeterministic finite automaton, deterministic finite automaton

Abstract

The growing volume of technologies, the end of support for actively used development tools, outdated APIs, etc., creates a need to convert program code. IT companies, and not only they, frequently face the question of supporting deprecated software that customers continue to use, or of migrating current software to modern technologies. It is more rational for programmers to use conversion and preserve most of the code base than to rewrite all the software by hand, even if manual adjustment is needed. At the moment, there are few high-quality code conversion systems, and for the most part conversion systems work well only with similar programming languages. The task of program code conversion is a deep and complex topic. To convert software code, one must first analyze it, select its components, and form a structural representation. Any analysis of program code begins with lexical analysis. Although lexical analysis is considered a relatively simple step, it plays a key role in the entire system of analysis and transformation of software code, and it also has a large number of both theoretical and practical features that require careful study. This article gives the definition of a lexical analyzer, its functional composition, and the principles of its construction, and states the key differences between a lexeme and a token. Two approaches to searching for tokens in program code are proposed and considered: search by regular expressions and search by finite automata. For these approaches, examples of constructing search patterns under certain vocabulary rules are given. As a result, the optimality of using deterministic finite automata in the practical implementation of a lexical analyzer on real computing machines is substantiated.
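The two approaches mentioned in the abstract can be illustrated with a minimal sketch. The toy vocabulary below (NUMBER, IDENT, OP) is an illustrative assumption, not the token rules from the article itself: the first part scans with a combined regular expression, and the second expresses the same identifier rule as a hand-written deterministic finite automaton.

```python
import re

# Hypothetical toy vocabulary: token classes and patterns are assumptions
# for illustration, not the rules defined in the article.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),                     # integer literals
    ("IDENT",  r"[A-Za-z_][A-Za-z_0-9]*"),  # identifiers
    ("OP",     r"[+\-*/=]"),                # single-character operators
    ("SKIP",   r"[ \t]+"),                  # whitespace, discarded
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    """Yield (token_class, lexeme) pairs by scanning with the combined regex."""
    for m in MASTER_RE.finditer(code):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

def dfa_is_identifier(s):
    """The IDENT rule as a DFA: state 0 = start, state 1 = accepting;
    any transition not listed below rejects the input."""
    state = 0
    for ch in s:
        if state == 0 and (ch.isalpha() or ch == "_"):
            state = 1
        elif state == 1 and (ch.isalnum() or ch == "_"):
            state = 1
        else:
            return False
    return state == 1

print(list(tokenize("x = 42 + y")))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
print(dfa_is_identifier("count_1"))  # True
print(dfa_is_identifier("1count"))   # False
```

In practice the regex form is convenient to write, while the DFA form shows why deterministic automata are attractive for real machines: each input character is consumed in a single table or branch lookup, with no backtracking.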

References

V.G. Pavlov, K.O. Shapran. Basic principles of building lexical analyzers. Materials of the IV International Scientific and Technical Conference of Young Scientists and Students "Actual tasks of modern technologies". Ternopil, November 25-26, 2015. P. 68-69.

Lexical analysis [Electronic resource] – Resource access mode: https://en.wikipedia.org/wiki/Lexical_analysis

Introduction to compiler design (PPT) [Electronic resource] – Resource access mode: https://www.nesoacademy.org/cs/12-compiler-design/ppts/01-introduction-to-compiler-design

Regular expressions [Electronic resource] – Resource access mode: https://bit.ly/3UrPRvp

Regular expression [Electronic resource] – Resource access mode: https://en.wikipedia.org/wiki/Regular_expression

Regular Expressions (REs) [Electronic resource] – Resource access mode: https://lambda.uta.edu/cse5317/notes/node7.html

Finite state machine [Electronic resource] – Resource access mode: https://bit.ly/3ukLJT4

Learning the GNU Flex Code Generator [Electronic resource] – Resource access mode: https://ps-group.github.io/compilers/gnu_flex_doc

Published

2022-10-28