APPLICATION OF FORMAL STOCHASTIC GRAMMARS IN DETERMINING THE TEXTS AUTHORSHIP

Authors

  • Viktor Shinkarenko
  • Inna Demidovich

DOI:

https://doi.org/10.34185/1991-7848.itmm.2022.01.053

Keywords:

natural language, formal language, formal stochastic grammar, statistical analysis, text structure, text authorship, classification, parsing, confidence interval

Abstract

The work is based on the hypothesis that each author's texts show an individual style, in particular in how sentence structure is formed. Authorship of natural-language texts was determined by formalizing the sentence structure in all texts of each author in the training sample. For each work of an author, the corresponding formal stochastic grammar was reconstructed: its inference rules were formed, and the probability of applying each rule was calculated from a statistical sample. To increase the reliability of the results, a confidence interval was calculated for each author using Student's t-test. To establish authorship, a probabilistic measure was determined of how well a text belongs to the formal stochastic grammar describing an author's individual style. The share of texts whose authorship was established in the experiment was about 80%. The experiment showed that the proposed method is competitive with existing ones.
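The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (sentences already parsed into lists of applied rules), the toy rule set, the mean log-probability used as the "probabilistic measure", the probability floor for unseen rules, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter
from statistics import mean, stdev

# Hypothetical input format: a parser (not shown) has already turned each
# sentence into the list of grammar rules used to derive it.
# A rule is a pair (left-hand side, right-hand side).
S_NP_VP = ("S", ("NP", "VP"))
S_VP = ("S", ("VP",))
NP_N = ("NP", ("N",))
NP_ADJ_N = ("NP", ("ADJ", "N"))
VP_V = ("VP", ("V",))
VP_V_NP = ("VP", ("V", "NP"))

# Two-sided 95% critical values of Student's t distribution,
# indexed by degrees of freedom (standard table values).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def estimate_rule_probs(sentences):
    """Relative frequency of each rule among rules sharing its left-hand side."""
    counts = Counter(rule for sentence in sentences for rule in sentence)
    lhs_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


def rule_confidence_interval(rule, works):
    """95% interval for a rule's probability, one observation per work.

    Assumption: a rule absent from a work counts as probability 0.0 there.
    The interval is clipped to [0, 1].
    """
    samples = [estimate_rule_probs(work).get(rule, 0.0) for work in works]
    n = len(samples)
    half_width = T_CRIT_95[n - 1] * stdev(samples) / math.sqrt(n)
    centre = mean(samples)
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def text_score(sentences, probs, floor=1e-6):
    """Mean log-probability per rule application; unseen rules get `floor`."""
    logs = [math.log(probs.get(rule, floor)) for s in sentences for rule in s]
    return sum(logs) / len(logs)


def attribute(sentences, grammars):
    """Return the author whose grammar gives the text the highest score."""
    return max(grammars, key=lambda author: text_score(sentences, grammars[author]))


# Toy training sample: three one-sentence "works" per author.
works = {
    "A": [[[S_NP_VP, NP_ADJ_N, VP_V_NP, NP_ADJ_N]],
          [[S_NP_VP, NP_ADJ_N, VP_V]],
          [[S_NP_VP, NP_N, VP_V_NP, NP_ADJ_N]]],
    "B": [[[S_VP, VP_V]],
          [[S_VP, VP_V_NP, NP_N]],
          [[S_NP_VP, NP_N, VP_V]]],
}
# One pooled grammar per author, built from all of that author's works.
grammars = {author: estimate_rule_probs([s for work in ws for s in work])
            for author, ws in works.items()}

unknown = [[S_NP_VP, NP_ADJ_N, VP_V_NP, NP_ADJ_N], [S_NP_VP, NP_ADJ_N, VP_V]]
predicted = attribute(unknown, grammars)
low, high = rule_confidence_interval(NP_ADJ_N, works["A"])
print(predicted, round(low, 3), round(high, 3))
```

In this sketch the rule probabilities are normalized per left-hand side, so the probabilities of all rules expanding the same nonterminal sum to one. With only three works per author the interval is very wide, which suggests why a larger training sample matters for this kind of estimate.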

Published

2022-05-18

Section

Articles