O.I. Rolik, V.M. Kolesnik SIMULATION OF IT INFRASTRUCTURE WITH CONSIDERATION OF CRITICAL ASPECTS FOR QUALITY OF SERVICE MANAGEMENT

Annotation. Testing hypotheses about quality of service management in IT infrastructure requires large and complex data centers with sufficient resources to explore the various possible scenarios of infrastructure operation during the provisioning of IT services. Dozens of testing solutions already exist, but none of them considers the critical aspects of IT infrastructure. To address this issue, a general mathematical model for quality of service management in critical infrastructures is introduced. Based on the proposed model, a minimal set of tools was developed for creating rich simulations that can cover criticality during operation.

composition of tasks for service level management and compensating for the negative impact of various factors by allocating additional resources to critical applications [2].
To perform research in such a broad area, it is important to have access to a pool of resources that can support a testbed environment for experiments. However, this can be expensive and ineffective because of outdated resources at institutions. Cloud providers offer their resources, but only with limited capacity, which permits only small test suites rather than any desired one. Another way to conduct an experiment is to create a simulation environment, which is not new but can save a lot of effort during experiment implementation.
A comprehensive overview of currently available simulators and the issues they address is given in [3]. That survey takes a broad view of cloud simulators, which are in fact a subclass of IT infrastructure simulators, and concludes that most tools focus on energy modelling and performance but lack security aspects, which are crucial for critical IT infrastructure. [3] also compares 33 simulation toolkits, each of which covers some peculiarity of system functioning; for complex problems several of them must therefore be combined, which adds learning-curve time.
This research proposes a novel framework that can cover the issues of critical IT infrastructure simulation and leaves room for extending the coverage of critical aspects.
Problem Statement. Today, many empirical experiments are being conducted in the areas of data center modelling, network modelling and cloud simulation. However, the term IT infrastructure is rare in this field, since the more widespread and specific technical terms mentioned above dominate. In addition, a large share of studies test a hypothesis without providing a general implementation that could be reused and improved, which would reduce the number of projects and studies started from scratch. The main goals of this research are to introduce the term IT infrastructure as a generalization of already known models, which are essentially subclasses of IT infrastructure models, to investigate known solutions, and to define a novel way to simulate IT infrastructure. Most problems related to simulating data center operation concern optimizing resource consolidation, reducing the energy footprint of a data center, and complying with service level agreements during simulation. However, these studies address only specific aspects of IT infrastructure performance. Another objective of this research is to take real data into account during simulation and to add the possibility of simulating different scenarios of quality of service maintenance and management alongside the critical plane of IT infrastructure. A basic model of IT infrastructure is also tested in order to show the feasibility of the proposed simulation framework.
Overview of Critical Infrastructure Simulation Definitions. The complexity of most real-world systems makes analytical models prohibitively expensive to create; consequently, such systems are commonly studied by means of simulation.
However, the term simulation should be clarified before continuing.
In [4, 5] simulation is described as "an experiment to determine characteristics of a system empirically. It is a modeling method that mimics or emulates the behavior of a system over time". Thus, to create a simulation environment it is necessary to design a model of a real or hypothetical system, run this model on a computer, and analyze the output with statistical methods. The state of the modelled system is then modified by the simulation program, which reproduces the way the actual system evolves over time.
According to [6], Shannon formulated simulation as "the process of designing a model of a real system and conducting experiments with this model for the purpose either of understanding the behavior of the system or of evaluating various strategies (within the limits imposed by a criterion or a set of criteria) for the operation of the system".
Although modern IT infrastructures and the interconnections inside them derive from the telecommunications industry, it is still relevant to apply similar analysis approaches to them. According to [4], three phases of telecom network design can be distinguished: mathematical analysis, in which simple models yield numeric data; a simulation phase, which in contrast produces data closer to the real world; and a real setup, which gives empirical data.
For assessing availability, the primary approaches rely on measurement and modelling methods [7]. Model-based approaches are fast and inexpensive in contrast to measurement-based methods. Methods for system simulation include analytical models, discrete-event simulation, or a combination of the two. Rybnicek et al. [8] surveyed approaches to modelling critical infrastructure, among them agent-based modelling, the Unified Modeling Language (UML) and graphical modelling. The best performing of these is the agent-based modelling and simulation approach.
To model a critical IT infrastructure, it makes sense to follow the approach used for critical infrastructure in general, which relies on agents in the model. In short, an agent is an object or subject that can perceive its surroundings and act upon circumstances arising in the environment. Each agent must have sensors and effectors in order to react to conditions. In [8] the authors defined attributes for each agent that make it identifiable, situated, goal-directed, self-directed and autonomous.
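The agent loop described above can be sketched as follows (a minimal Python illustration; the class, attribute names and the load-shedding rule are hypothetical, not taken from [8]):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Minimal agent: perceives its surroundings via a sensor and
    acts on the environment via an effector (names are illustrative)."""
    name: str
    goal: str
    state: dict = field(default_factory=dict)

    def perceive(self, environment: dict) -> dict:
        # Sensor: read only the signals this agent can observe.
        return {k: v for k, v in environment.items() if k in ("load", "alive")}

    def act(self, percept: dict, environment: dict) -> None:
        # Effector: goal-directed, autonomous reaction to conditions.
        if percept.get("load", 0) > 0.8:
            environment["load"] = percept["load"] * 0.5  # shed half the load

env = {"load": 0.9, "alive": True}
agent = Agent(name="scheduler-agent", goal="keep load below 0.8")
agent.act(agent.perceive(env), env)
print(env["load"])  # 0.45
```

The sensor/effector split keeps the agent situated and autonomous: it reacts only to what it can perceive, without global knowledge of the environment.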
The primary goal of critical infrastructure simulation is to investigate the dynamic effects introduced by attacks or disruptions. Simulation techniques allow a deep understanding of domino and cascading effects.
Because the unique services of a critical infrastructure are interconnected, unavailability or malfunctioning of any one of them can lead to significant consequences. Moreover, simulation provides new knowledge about complex systems, which advances redundancy planning and the development of incident response strategies, as stated in [8].
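To make the domino effect concrete, a failure can be propagated over a directed graph of service dependencies; the sketch below (Python, with hypothetical service names) computes the set of services taken down, directly or transitively, by a single disruption:

```python
def cascade(dependents: dict[str, list[str]], failed: str) -> set[str]:
    """dependents[s] lists the services that depend on s; returns every
    service taken down, directly or transitively, by the failure."""
    down, frontier = {failed}, [failed]
    while frontier:
        svc = frontier.pop()
        for dep in dependents.get(svc, []):
            if dep not in down:
                down.add(dep)
                frontier.append(dep)
    return down

# Power feeds cooling and the network; the network feeds the application tier.
dependents = {"power": ["cooling", "network"], "network": ["app"]}
print(sorted(cascade(dependents, "power")))  # ['app', 'cooling', 'network', 'power']
```

A single power failure thus cascades through the whole dependency chain, which is exactly the effect a critical infrastructure simulation needs to reproduce.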
The authors of [9] introduced an approach for enhancing the protection of critical infrastructures that involves a series of steps to facilitate the simulation of such infrastructure: model development for services, context description, identification of dependencies and interdependencies, probabilistic and deterministic models, and a final Monte Carlo simulation stage for exploratory analysis. Some of these steps are also important when simulating critical IT infrastructure, because although the boundaries of IT infrastructure are somewhat different, the threats are similar.
As a continuation of [9], the researchers in [10] described the consequences of cyber-attacks on critical infrastructure. In the case of an attack on the power system of an infrastructure, the impact can range from zero to growing losses from not providing energy to some customers, which can affect the country as a whole. To address the critical aspect, the authors of [9-11] refer to interdependency issues. The reasons for dependencies between services include: functional dependencies, where the inputs of one service come from the outputs of another; analogous components that share a common source of failures; and a common environment.
Another approach for calculating interdependencies, proposed in [12], is based on quality of service (QoS) indices, performance indicators and dependence indices. A performance indicator is a measured value of a system requirement that represents some characteristic of the system. An objective function involving all performance indicators should also be defined, because sometimes an increase in one indicator can lead to a decline in another.
On the other hand, from the mathematical point of view quality of service is nothing more than an objective function for the service of interest, and it also implies a function based on the user's perception of the service.
The last interdependency metric in [12] is the dependence index, "a numerical measurement of the degree to which an activity (or service) depends on another activity (service), system or physical or human component."
IT infrastructure management tasks. According to [13] there are three generalized areas of management, two of which cover the technical side of all management functions in IT infrastructure: operative or automatic maintenance of the quality level of IT services, and reasonable utilization of resources.
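Since [12] defines these metrics only conceptually, the sketch below illustrates one common way to aggregate performance indicators into an objective function: a weighted sum in which a negative weight models an indicator whose growth degrades overall quality (indicator names and weights are illustrative, not from [12]):

```python
def objective(indicators: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate performance indicators into a single objective value.
    A negative weight encodes an indicator whose increase hurts quality."""
    return sum(weights[name] * value for name, value in indicators.items())

ind = {"throughput": 0.9, "latency": 0.3}   # indicators normalized to [0, 1]
w = {"throughput": 1.0, "latency": -0.5}    # latency degrades the objective
print(objective(ind, w))  # 0.9 - 0.15 = 0.75
```

The negative weight captures the trade-off noted above: improving one indicator (here, throughput at the cost of latency) does not necessarily improve the objective as a whole.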
These tasks should be considered in tight connection. For critical IT infrastructures, reasonable resource utilization is less relevant, because reliability indicators become prevailing. Moreover, allocating a significant amount of resources for reservation purposes is not only justified but necessary.
The general management task can be defined as follows. Let the state of the IT infrastructure be described by a variable $S$ from the set of possible states $\mathbb{S}$. The state of the IT infrastructure at a given moment in time depends on the control impact $U$ from the set of admissible control impacts $\mathbb{U}$, as in (1):

$S = \varphi(U), \quad U \in \mathbb{U}, \; S \in \mathbb{S}$. (1)

Suppose there exists a functional $F(U, S)$, defined on the product set $\mathbb{U} \times \mathbb{S}$, that characterizes the efficiency of IT infrastructure functioning. The management efficiency indicator is then determined by (2):

$E(U) = F(U, \varphi(U))$. (2)

The objective of IT infrastructure management is thus reduced to finding an admissible control impact that maximizes the management efficiency indicator:

$U^{*} = \arg\max_{U \in \mathbb{U}} E(U)$. (3)
However, (3) holds only when the reaction (1) of the IT infrastructure to control impacts is known, i.e. the mapping

$\varphi: \mathbb{U} \to \mathbb{S}$ (4)

has been identified. In turn, the overall quality $Q$ of IT service provisioning is determined by the quality of the individual services:

$Q = \psi(q_1, q_2, \ldots, q_m)$. (5)

Therefore, control impacts must maintain the given quality level for each service while ensuring the maximum level of availability:

$q_j \geq q_j^{\mathrm{req}}, \quad j = 1, \ldots, m$. (6)

From the user's perspective, the management efficiency criterion for maintaining the quality of the $j$-th service can be the selection of a control impact that minimizes the time for handling the $i$-th request to the application $A_j$:

$U^{*} = \arg\min_{U \in \mathbb{U}} \left( t^{R}_{i,j} - t^{A}_{i,j} \right)$, (7)

where $t^{A}_{i,j}$ is the time when the request was acquired by the system and $t^{R}_{i,j}$ is the time when the system has handled it and sent the response back to the user. As a result, in order to provide reasonable models of IT infrastructure that take critical indicators into account, all the values for each part of the management process (1)-(7) had to be defined, which allows proceeding to the implementation of the simulation models.
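As a minimal illustration of the management task above, when the reaction of the infrastructure to control impacts is known, the best admissible impact can be found by enumeration; the response model below is hypothetical, not the paper's:

```python
def handling_time(u: float, load: float) -> float:
    # Hypothetical reaction model: allocating more resource u to an
    # application shortens the time needed to handle its load.
    return load / u

def best_control(controls: list[float], load: float) -> float:
    """Pick the admissible control impact that maximizes the efficiency
    indicator, here defined as minus the request handling time (cf. (2), (7))."""
    return max(controls, key=lambda u: -handling_time(u, load))

print(best_control([1.0, 2.0, 4.0], load=8.0))  # 4.0 gives the shortest handling time
```

Real management methods replace the brute-force enumeration with optimization over a large (possibly continuous) set of control impacts, but the structure of the task is the same.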
IT Infrastructure Simulation Framework Implementation. The simulation framework was developed in the R language using the extension package simmer. According to [14], this package brings discrete-event simulation to R. It is designed as a general process-oriented framework whose core is written in C++, and it monitors all processing during simulation automatically. It adopts the concept of a trajectory, which is "a common path in the simulation model for entities of the same type" [14].
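The framework itself is built on R's simmer; for readers unfamiliar with discrete-event simulation, the stdlib-only Python sketch below shows the core idea of a DES engine: events are processed strictly in simulation-time order, not in submission order:

```python
import heapq

def simulate(events):
    """events: iterable of (time, label) tuples. Process them in time
    order, the way a DES engine advances its simulation clock."""
    queue = list(events)
    heapq.heapify(queue)
    log = []
    while queue:
        time, label = heapq.heappop(queue)
        log.append((time, label))  # a monitor records every processed event
    return log

trace = simulate([(5.0, "request done"), (1.0, "request arrives"), (2.5, "server seized")])
print(trace[0])  # (1.0, 'request arrives') -- earliest event first
```

simmer's trajectories generate exactly such timed events (seize, timeout, release) for every entity that follows the same path through the model.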
For further research it is suggested to use models from [15], which can cover denial-of-service attack issues, improve router performance metrics, and provide means for application-level traffic classification.
A few common entities were developed for the simulation framework: IT infrastructure, scheduler, server and request. For now the request is marked as HTTP, but it is still very abstract and can therefore be considered any request. Future work in this direction will elaborate the peculiarities of HTTP requests and extend the types of requests supported by the IT infrastructure.
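A Python sketch of how such entities might be composed (the actual implementation wraps simmer objects in R; the class names follow the text, while the dispatch policy is invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    capacity: int = 1  # how many requests the server handles concurrently

@dataclass
class Scheduler:
    name: str

    def dispatch(self, request: str, servers: list) -> str:
        # Invented policy for illustration: route to the first server.
        return f"{request} -> {servers[0].name}"

@dataclass
class ITInfrastructure:
    """Top-level entity composing schedulers and servers."""
    schedulers: list
    servers: list

infra = ITInfrastructure(schedulers=[Scheduler("sched-1")], servers=[Server("srv-1")])
print(infra.schedulers[0].dispatch("http-req-1", infra.servers))  # http-req-1 -> srv-1
```

Keeping the request abstract, as the text notes, means the same composition works for any request type once its peculiarities are elaborated.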
To produce simulated events inside the simulation, it is necessary to obtain empirical data, such as the waiting and run times of jobs or requests, from real data sets. For this purpose the AuverGrid [16] dataset from the Grid Workload Archive [17] was used. The authors of [17] also describe the generic structure of all the datasets and the common properties collected in different systems. The dataset contains the wait time and run time of each request, as well as much meta information such as user id, group id and queue id. The wait and run times are the most interesting for dynamic behavior. Another important value is the interarrival time, which can be calculated by grouping all requests into hourly intervals and counting the requests in each hour, which gives the arrival rate according to (8); the arrival rate for the whole dataset is therefore a vector of values:

$\lambda_h = \left| \{\, i : t_i \in [3600(h-1),\, 3600h) \,\} \right|, \quad h = 1, \ldots, H$. (8)

Then, to get the interarrival time, 1 is divided by the arrival rate and multiplied by 3600, the number of seconds in an hour, to obtain the time between arrivals in seconds, as shown in (9):

$\tau_h = \frac{3600}{\lambda_h}$. (9)

The comparison plot shows only a small gap between the real data and the randomly generated data.
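The computation of (8) and (9) can be sketched as follows (Python for illustration; the actual framework performs this in R):

```python
from collections import Counter

def interarrival_times(timestamps: list) -> dict:
    """Group request arrival timestamps (in seconds) into hourly bins,
    count arrivals per hour (8), then convert each rate to an
    interarrival time in seconds (9)."""
    rate = Counter(int(t // 3600) for t in timestamps)  # (8): lambda_h per hour
    return {h: 3600.0 / n for h, n in rate.items()}     # (9): tau_h = 3600 / lambda_h

ts = [10.0, 500.0, 900.0, 3700.0]  # three requests in hour 0, one in hour 1
print(interarrival_times(ts))      # {0: 1200.0, 1: 3600.0}
```

These per-hour interarrival times can then parameterize a random arrival generator, which is what makes the comparison between real and randomly generated data meaningful.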
Due to approximation it is of course possible to lose some values from the sample, but overall the cumulative results give a better picture, which reflects the adequacy of this behavior model.
The next step is to develop appropriate objects for the simulation. Simmer contains basic tools that can then be extended to solve specific problems. For simulating an IT infrastructure the provided entities were extended as follows: the initial resource in the simulation was wrapped into several entities, namely Scheduler, Server, ResultNode and the IT infrastructure itself. The IT infrastructure object is actually a simmer environment with additional components.
After setting up the environment, the following results were obtained: usage of servers in the IT infrastructure (Fig. 3) and utilization of schedulers and servers (Fig. 4). With the simulation parameters derived from the dataset, very distinct system behavior can be observed, which leads to the conclusion that the system is overwhelmed at the scheduling resource but not at the processing resource. The original report on AuverGrid also shows that the system's overall utilization over a one-year period differs between the average and maximum values, which are 58 and 100 per cent respectively. Knowing this, we can assume that the model is adequate for simulating an AuverGrid-like environment and for testing management methods.

Conclusion.
As a result of this research, several models were identified and connected with each other as an ontological description of processes in IT infrastructure. An overview of common tools for simulating IT infrastructure was also performed.
The main difference from existing simulation solutions is the new restriction placed on the IT infrastructure as a research object: the criticality aspect.
This allows constructing basic scenarios of IT infrastructure functioning, including blocking and attacks, in order to replicate circumstances similar to the real world.
Further research will focus on the details of network functioning, the peculiarities of cyber attacks, resource management issues, and the user's perception when calculating quality of service values and checking SLA compliance.