Development of logging model in ETL systems

Authors

  • Viktoriia Hnatushenko
  • Nikolay Vinogradov
  • Oksana Vinogradova

DOI:

https://doi.org/10.34185/1562-9945-5-124-2019-08

Keywords:

model, logging, ETL system

Abstract

ETL (Extract, Transform, Load) consists of three steps and is one of the main data management processes: data are received from multiple sources, transformed, and loaded into a data warehouse (DWH) in order to obtain reliable information. The ETL process can be implemented in different ways: by developing an ETL program, by creating a set of embedded program procedures, or by using ETL tools.
Any such process requires logging that records all of its stages and subprocesses. The basic approach to logging is to record all the necessary information in a final cumulative system.
The purpose of this study is to develop a model for logging ETL-system processes of any complexity and nesting level, for use in an enterprise information system.
After the task was analyzed, it was decided to implement the system on a Microsoft SQL Server database, using Microsoft SQL Server Management Studio, the C# .NET language, and Microsoft Visual Studio as development tools. The main objects of the database are the two main tables with information about processes and the functional code for loading data. Two further tables and one trigger on the database side were developed for testing.
The system stores information about the processes, their steps, and the results of these processes. Its main advantage is that processes can be distributed across systems while the results remain accessible and transparent in one database. Another advantage is that even the internal logical components of ETL processes or steps can be logged, such as the logical parts of .NET or Java methods or of stored procedures on the database side. Developers can extend the system to cover a wider range of entities. The existing logging models of ETL systems have been analyzed and investigated, and a new logging system has been created using a relational database and events in an ETL system based on an SSIS package. It makes it possible to display detailed information about all steps of the ETL system, including parameters, results, and blocks. The system can easily be adapted to any end user's needs and has advantages over existing trigger-based and built-in models.
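The idea of logging processes and arbitrarily nested steps into one relational database can be illustrated with a small sketch. It is not the authors' implementation: the paper uses Microsoft SQL Server, C# .NET, and SSIS, while this example uses Python with an in-memory SQLite database as a stand-in. The table names (process_log, step_log), their columns, and the EtlLogger class are hypothetical and were chosen only for illustration.

```python
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone

# Hypothetical schema: one table for processes, one for (possibly nested) steps.
SCHEMA = """
CREATE TABLE process_log (
    process_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    source_system TEXT NOT NULL,
    started_at    TEXT NOT NULL,
    finished_at   TEXT,
    status        TEXT NOT NULL DEFAULT 'running'
);
CREATE TABLE step_log (
    step_id        INTEGER PRIMARY KEY,
    process_id     INTEGER NOT NULL REFERENCES process_log(process_id),
    parent_step_id INTEGER REFERENCES step_log(step_id),
    name           TEXT NOT NULL,
    parameters     TEXT,
    result         TEXT,
    status         TEXT NOT NULL DEFAULT 'running'
);
"""


def _now():
    return datetime.now(timezone.utc).isoformat()


class EtlLogger:
    """Records one process and its nested steps in the shared log database."""

    def __init__(self, conn, process_name, source_system):
        self.conn = conn
        self._stack = []  # ids of the steps that are currently open
        cur = conn.execute(
            "INSERT INTO process_log (name, source_system, started_at) "
            "VALUES (?, ?, ?)",
            (process_name, source_system, _now()),
        )
        self.process_id = cur.lastrowid

    @contextmanager
    def step(self, name, parameters=""):
        # The innermost open step, if any, becomes the parent of the new one.
        parent = self._stack[-1] if self._stack else None
        cur = self.conn.execute(
            "INSERT INTO step_log (process_id, parent_step_id, name, parameters) "
            "VALUES (?, ?, ?, ?)",
            (self.process_id, parent, name, parameters),
        )
        step_id = cur.lastrowid
        self._stack.append(step_id)
        record = {"result": None}  # the caller may store a result here
        try:
            yield record
        except Exception as exc:
            self._close_step(step_id, "failed", repr(exc))
            raise
        else:
            self._close_step(step_id, "ok", record["result"])
        finally:
            self._stack.pop()

    def _close_step(self, step_id, status, result):
        self.conn.execute(
            "UPDATE step_log SET status = ?, result = ? WHERE step_id = ?",
            (status, result, step_id),
        )

    def finish(self, status):
        self.conn.execute(
            "UPDATE process_log SET status = ?, finished_at = ? "
            "WHERE process_id = ?",
            (status, _now(), self.process_id),
        )
        self.conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    log = EtlLogger(conn, "load_sales", "crm")
    with log.step("extract", "table=orders") as outer:
        with log.step("read_batch", "batch=1") as inner:
            inner["result"] = "100 rows"
        outer["result"] = "done"
    log.finish("ok")
    return conn.execute(
        "SELECT name, parent_step_id, status, result "
        "FROM step_log ORDER BY step_id"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because each step row points to its parent step, any depth of nesting fits in the same two tables, and loggers running in different systems can write to the same database as long as they share a connection string.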

References

Jos van Dongen, Matt Casters, Roland Bouman. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration. - John Wiley & Sons, 2010. - 674 p. ISBN: 9780470942420

Tochilkina T.E. Data Warehousing and Business Intelligence Tools: Textbook / T.E. Tochilkina, A.A. Gromova. - Moscow: Financial University, 2017. - 161 p.

Alexey Polev ETL - a technology that accompanies any BI-initiative // Jet Info - March 29, 2012. - No.

Ralph Kimball, Joe Caserta. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. - Wiley Publishing, Inc., 2004. - 491 p.

Published

2019-11-25