Constructive and object-oriented modeling of text for detection of text borrowings
Models and data structures proposed by the scientific community for this task include LERP-RSA (Longest Expected Repeated Pattern Reduced Suffix Array) arrays, a tag classifier based on the three-class Stanford NER model, structures based on DNA-sequence representations of text, graph representations, and others. The algorithms used include Greedy String Tiling, ARPAD, shingling, statistical methods, and genetic algorithms. Considerable attention is also paid to text pre-processing, in particular morphological analysis and lemmatization. Only some of these models and algorithms have been implemented in software.
The purpose of this work is to develop a text model for identifying borrowings and to implement it in software. The tasks are to develop an object-oriented model and a software implementation of the graph text model, applied to the problem of borrowing detection, and to measure the running time of the implementation in order to assess whether it can be used in an academic environment.
The main idea of the graph model is to represent the text as a weighted directed graph. The weight of a vertex is a character or a sequence of characters; the weight of an edge is the set of numbers of the paths that include that edge. The model is formalized using the apparatus of constructive-synthesizing modeling. To build graphs, a constructor and its components are defined: a carrier, a signature, and a set of statements providing information support for construction. The constructor is then subjected to three transformations: specialization, interpretation, and concretization.
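A minimal Python sketch of this idea follows, under assumptions the abstract does not state: each sentence is one numbered path, each vertex weight is a whole word rather than a single character, and identical words share one vertex. The function name `build_text_graph` is hypothetical and is not taken from the work itself.

```python
from collections import defaultdict


def build_text_graph(sentences):
    """Build a weighted directed text graph (hypothetical helper).

    Assumptions: each sentence is one path numbered by its index,
    each vertex weight is one word, and equal words share a vertex.
    The weight of an edge is the set of numbers of the paths through it.
    """
    vertices = set()
    edges = defaultdict(set)  # (source word, target word) -> path numbers
    for path_no, sentence in enumerate(sentences):
        words = sentence.lower().split()
        vertices.update(words)
        for a, b in zip(words, words[1:]):
            edges[(a, b)].add(path_no)
    return vertices, dict(edges)


vertices, edges = build_text_graph([
    "The graph model represents text",
    "The graph model detects borrowings",
])
print(sorted(edges[("the", "graph")]))      # [0, 1]
print(sorted(edges[("model", "detects")]))  # [1]
```

In this sketch both sentences pass through the edge from "the" to "graph", so its weight is the path set {0, 1}, while an edge used by only one sentence carries a single path number.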
Based on this model, an object-oriented model is constructed. It comprises three classes: Vertex, Graph, and Work.
An object of class Work represents a text as a set of Graph objects. Correspondences are established between the components of the constructive model and those of the object-oriented model.
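The three classes can be sketched in Python as below. This is an illustration only: splitting a text into paragraphs (one Graph each) and sentences (one path each), as well as the `similarity` measure based on shared edges, are assumptions made for the example and are not the comparison procedure described in the work.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A vertex whose weight is a character sequence (here, one word)."""
    weight: str
    # Outgoing edges: target word -> set of numbers of the paths using the edge.
    out: dict = field(default_factory=dict)


class Graph:
    """Weighted directed graph for one text fragment (assumed: a paragraph)."""

    def __init__(self):
        self.vertices = {}  # word -> Vertex

    def vertex(self, weight):
        # Equal words share a single vertex.
        if weight not in self.vertices:
            self.vertices[weight] = Vertex(weight)
        return self.vertices[weight]

    def add_path(self, path_no, words):
        """Add one numbered path (assumed: a sentence given as a word list)."""
        for word in words:
            self.vertex(word)
        for a, b in zip(words, words[1:]):
            self.vertices[a].out.setdefault(b, set()).add(path_no)

    def edges(self):
        """All edges as (source word, target word) pairs."""
        return {(v.weight, t) for v in self.vertices.values() for t in v.out}


class Work:
    """A text represented as a list of Graph objects, one per paragraph."""

    def __init__(self, text):
        self.graphs = []
        for paragraph in text.split("\n\n"):
            graph = Graph()
            sentences = [s.split() for s in paragraph.lower().split(".") if s.strip()]
            for path_no, words in enumerate(sentences):
                graph.add_path(path_no, words)
            self.graphs.append(graph)

    def edges(self):
        """Union of the edges of all graphs in this work."""
        return set().union(*(g.edges() for g in self.graphs))

    def similarity(self, other):
        """Hypothetical measure: share of this work's edges found in `other`."""
        mine = self.edges()
        return len(mine & other.edges()) / len(mine) if mine else 0.0


a = Work("The graph model represents text. It detects borrowings.")
b = Work("A graph model represents text well.")
print(a.similarity(b))  # 0.5
print(b.similarity(a))  # 0.6
```

Keeping the path sets on the edges means that a shared edge also records which sentences it came from, which a fuller comparison could use to locate the borrowed fragment rather than only score it.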
The object-oriented model has been implemented in software. Execution times are reported for graph construction and for text comparison.
At this stage, the software implementation of the model shows acceptable running times, and further research in this direction appears promising. Directions for improving the model and the program are proposed.
Shynkarenko V. Creation tests for checking plagiarism detection programs' ability of unmasking borrowings / V. Shynkarenko, O. Kuropiatnyk // Information Technologies & Knowledge. – 2018. – Vol. 12, No. 1. – P. 84 – 100.
Gulis I. Plagiarism Detection in Students’ Assignments Written in Natural Language / I. Gulis, D. Chudá, J. Petrík // International Conference on e-Learning. – 2016. – Vol. 16. – P. 141.
Mykhailovskyi Yu. B. Anti-Plagiarism System as a Tool to Prevent Plagiarism in Educational and Scientific Activities / Yu. B. Mykhailovskyi, N. A. Dluhunovych // Bulletin of Khmelnytsky National University. Technical Sciences. – 2013. – No. 3. – P. 162–168.
Xylogiannopoulos K. Text Mining for Plagiarism Detection: Multivariate Pattern Detection for Recognition of Text Similarities / K. Xylogiannopoulos, P. Karampelas, R. Alhajj // 2018 IEEE/ACM ASONAM, Barcelona. – 2018. – P. 938–945. – doi: 10.1109/ASONAM.2018.8508265
Lee E. Identifying text reuse using word net-based extended named entity recognition / E. Lee, P. Kim // Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems. – ACM, 2018. – P. 199 – 202.
Ho P. H. DNA Sequences Representation Derived from Discrete Wavelet Transformation for Text Similarity Recognition / P. H. Ho, N. A. T. Nguyen, T. H. Vo // Sieminski A., Kozierkiewicz A., Nunez M., Ha Q. (eds) Modern Approaches for Intelligent Information and Database Systems. Studies in Computational Intelligence. – Springer, Cham. – 2018. – Vol. 769. – P. 75 – 85.
Osman A. H. Conceptual Similarity and graph-based method for plagiarism detection / A. H. Osman, N. Salim, M. S. Binwahlan, H. Hentably, A. M. Ali // Journal of Theoretical and Applied Information Technology. – 2011. – Vol. 32, No. 2. – P. 135 – 145.
Kumar Jayapal A. Similarity Overlap Metric and Greedy String Tiling at PAN 2012: Plagiarism Detection // CLEF Conference. – URL: http://www.clef-initiative.eu/documents/71612/da184f72-1a8e-43a8-80e4-dd1b2b6fdb09
Zibert A. O. Development of a system for determining the existence of adoption in the works of the students. The search algorithms of indistinct duplicates / A. O. Zibert, V. B. Hrustalev // Universum: Technical Sciences. – 2014. – No. 3 (4). – URL: http://7universum.com/ru/tech/archive/item/1139
Meuschke N. State-of-the-art in detecting academic plagiarism / N. Meuschke, B. Gipp // International Journal for Educational Integrity. – 2013. – Vol. 9, No. 1. – P. 50–71.
Vani K. Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm / K. Vani, D. Gupta // Expert Systems with Applications. – 2017. – Vol. 73. – P. 11–26.
Shynkarenko V. Constructive-synthesizing model of text graph representation / V. Shynkarenko, O. Kuropiatnyk // CEUR Workshop Proceedings. – 2016. – Vol. 1631. – P. 63 – 72.
Shynkarenko V. I. Constructive-Synthesizing Structures and Their Grammatical Interpretations. I. Generalized Formal Constructive-Synthesizing Structure / V. I. Shynkarenko, V. M. Ilman. // Cybernetics and Systems Analysis. – 2014. – Vol. 50. – Issue 5. – P. 655-662.