Towards creating a dataset based on events classified by a network intrusion detection system

Authors

DOI:

https://doi.org/10.34185/1562-9945-3-164-2026-02

Keywords:

Snort 3, NIDS, dataset, SIP/2.0, Fast Pattern, byte-level features, packet telemetry, entropy, risk scoring, real-time neural networks

Abstract

This paper presents an approach to building a dataset for training machine-learning models in the context of the Snort 3 network intrusion detection system. Unlike conventional NIDS datasets, the proposed dataset is constructed from normalized inspector byte buffers and lightweight packet telemetry that are available during real-time traffic processing. Ground truth is defined by the controlled origin of traffic (attack/benign PCAP), while Snort rule triggers are treated as a “teacher” signal to support subsequent risk-scoring models. The dataset is generated for the SIP/2.0 Fast Pattern group and contains tens of thousands of events with a standardized train/validation/test split. In addition, we analyze byte-position informativeness using Jensen–Shannon divergence and entropy, and perform correlation analysis of telemetry features. The results indicate that the discriminative signal is largely localized in the early parts of the message (header and initial payload) and that padding does not introduce trivial information leakage between classes. The resulting dataset can serve as a foundation for real-time neural models that complement signature-based detection with probabilistic risk assessment.
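The byte-position analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes each event is a raw byte buffer, that buffers shorter than the inspected length are padded with a zero byte, and that divergence is measured in bits. The function names and the toy SIP-like buffers are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter

def byte_distribution(buffers, position, pad=0):
    """Empirical distribution of the byte value at `position`.
    Buffers shorter than `position` contribute the `pad` value (an assumption)."""
    counts = Counter(buf[position] if position < len(buf) else pad for buf in buffers)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def entropy(dist):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with disjoint support."""
    keys = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return entropy(mixture) - 0.5 * (entropy(p) + entropy(q))

def position_scores(attack, benign, length):
    """Per-position divergence between the attack and benign byte distributions."""
    return [
        js_divergence(byte_distribution(attack, i), byte_distribution(benign, i))
        for i in range(length)
    ]

# Hypothetical toy buffers, for illustration only.
attack = [b"INVITE sip:a", b"INVITE sip:b"]
benign = [b"OPTION sip:a", b"OPTION sip:b"]
scores = position_scores(attack, benign, length=12)
```

Positions where the two classes always carry different bytes score close to 1, and positions with identical distributions score 0, so a plot of such scores against byte offset shows where the discriminative signal is concentrated.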

References

Horbatov, V. S. (2023). Signature pre-filtering method for accelerating attack search by network intrusion detection systems. Modern Information and Communication Technologies in Transport, Industry, and Education: International Scientific and Practical Conference, Dnipro, December 13–14, 2023. Dnipro, pp. 136–137.

Kenyon A., Deka L., Elizondo D. Are public intrusion datasets fit for purpose characterising the state of the art in intrusion event datasets. Computers & security. 2020. Vol. 99. P. 102022. URL: https://doi.org/10.1016/j.cose.2020.102022 (date of access: 21.02.2026).

Generating network intrusion detection dataset based on real and encrypted synthetic attack traffic / A. Ferriyan et al. Applied sciences. 2021. Vol. 11, no. 17. P. 7868. URL: https://doi.org/10.3390/app11177868 (date of access: 21.02.2026).

Survey of intrusion detection systems: techniques, datasets and challenges / A. Khraisat et al. Cybersecurity. 2019. Vol. 2, no. 1. URL: https://doi.org/10.1186/s42400-019-0038-7 (date of access: 21.02.2026).

Goldschmidt P., Chudá D. Network intrusion datasets: a survey, limitations, and recommendations. Computers & security. 2025. P. 104510. URL: https://doi.org/10.1016/j.cose.2025.104510 (date of access: 21.02.2026).

Himabindu C. The challenges of effectively anonymizing network data. International journal of pharmacology and pharmaceutical technology. 2013. P. 15–22. URL: https://doi.org/10.47893/ijppt.2013.1003 (date of access: 21.02.2026).

Kim J., Sim C., Choi J. Generating labeled flow data from MAWILab traces for network intrusion detection. Proceedings of the ACM workshop on systems and network telemetry and analytics. Phoenix, AZ, USA, 2019. P. 45–48.

Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation / J. Song et al. The first workshop, Salzburg, Austria, 10 April 2011. New York, New York, USA, 2011. URL: https://doi.org/10.1145/1978672.1978676 (date of access: 21.02.2026).

Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation / R. P. Lippmann et al. DARPA information survivability conference and exposition. DISCEX'00, Hilton Head, SC, USA. URL: https://doi.org/10.1109/discex.2000.821506 (date of access: 21.02.2026).

A detailed analysis of the KDD CUP 99 data set / M. Tavallaee et al. 2009 IEEE symposium on computational intelligence for security and defense applications (CISDA), Ottawa, ON, Canada, 8–10 July 2009. 2009. URL: https://doi.org/10.1109/cisda.2009.5356528 (date of access: 21.02.2026).

Toward developing a systematic approach to generate benchmark datasets for intrusion detection / A. Shiravi et al. Computers & security. 2012. Vol. 31, no. 3. P. 357–374. URL: https://doi.org/10.1016/j.cose.2011.12.012 (date of access: 21.02.2026).

Sharafaldin I., Habibi Lashkari A., Ghorbani A. A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. 4th international conference on information systems security and privacy, Funchal, Madeira, Portugal, 22–24 January 2018. 2018. URL: https://doi.org/10.5220/0006639801080116 (date of access: 21.02.2026).

Moustafa N., Slay J. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). 2015 military communications and information systems conference (milcis), Canberra, Australia, 10–12 November 2015. 2015. URL: https://doi.org/10.1109/milcis.2015.7348942 (date of access: 21.02.2026).

UGR'16: A new dataset for the evaluation of cyclostationarity-based network IDSs / G. Maciá-Fernández et al. Computers & security. 2018. Vol. 73. P. 411–424. URL: https://doi.org/10.1016/j.cose.2017.11.004 (date of access: 21.02.2026).

Flow-based network traffic generation using Generative Adversarial Networks / M. Ring et al. Computers & security. 2019. Vol. 82. P. 156–172. URL: https://doi.org/10.1016/j.cose.2018.12.012 (date of access: 21.02.2026).

GitHub - ahlashkari/cicflowmeter: cicflowmeter-v4.0 (formerly known as iscxflowmeter) is an ethernet traffic bi-flow generator and analyzer for anomaly detection that has been used in many cybersecurity datsets such as android adware-general malware dataset (CIC-AAGM2017), IPS/IDS dataset (CICIDS2017), android malware dataset (cicandmal2017) and distributed denial of service (cicddos2019). GitHub. URL: https://github.com/ahlashkari/CICFlowMeter (date of access: 21.02.2026).

The zeek network security monitor. Zeek. URL: https://zeek.org (date of access: 21.02.2026).

Simpleweb/University of twente traffic traces data repository / R. R. R. Barbosa et al. CTIT technical report series. 2010. URL: https://api.semanticscholar.org/CorpusID:13251179.

Welcome to SIPp. Welcome to SIPp. URL: https://sipp.sourceforge.net/ (date of access: 21.02.2026).

CUPID: A labeled dataset with Pentesting for evaluation of network intrusion detection / H. Lawrence et al. Journal of systems architecture. 2022. P. 102621. URL: https://doi.org/10.1016/j.sysarc.2022.102621 (date of access: 21.02.2026).

Home | TCPDUMP & LIBPCAP. Home | TCPDUMP & LIBPCAP. URL: https://www.tcpdump.org/ (date of access: 21.02.2026).

Engelen G., Rimmer V., Joosen W. Troubleshooting an intrusion detection dataset: the CICIDS2017 case study. 2021 IEEE security and privacy workshops (SPW), San Francisco, CA, USA, 27 May 2021. 2021. URL: https://doi.org/10.1109/spw53761.2021.00009 (date of access: 21.02.2026).

Layeghy S., Gallagher M., Portmann M. Benchmarking the benchmark – Comparing synthetic and real-world Network IDS datasets. Journal of information security and applications. 2024. Vol. 80. P. 103689. URL: https://doi.org/10.1016/j.jisa.2023.103689 (date of access: 21.02.2026).

Soft Release: lightSPD, the new rules package for Snort 3. Snort Blog. URL: https://blog.snort.org/2020/12/soft-release-lightspd-new-rules-package.html (date of access: 21.02.2026).

Snort 3 reference manual. GitHub. URL: https://github.com/snort3/snort3/releases/download/3.10.2.0/snort_reference.html (date of access: 21.02.2026).

GitHub - 0xinfection/siptorch: A "SIP torture" (RFC 4475) testing framework. GitHub. URL: https://github.com/0xInfection/SIPTorch (date of access: 21.02.2026).

GitHub - andrius/asterisk: asterisk PBX in docker – smallest asterisk ever!. GitHub. URL: https://github.com/andrius/asterisk (date of access: 21.02.2026).

Comparing master...shuffle VytalyGorbatov/SIPTorch. GitHub. URL: https://github.com/VytalyGorbatov/SIPTorch/compare/master...VytalyGorbatov:SIPTorch:shuffle (date of access: 23.02.2026).

saghul/sipp-scenarios: SIPp scenarios I use for testing SIP stuff. GitHub. URL: https://github.com/saghul/sipp-scenarios (date of access: 23.02.2026).

Comparing master...mutability VytalyGorbatov/sipp-scenarios. GitHub. URL: https://github.com/VytalyGorbatov/sipp-scenarios/compare/master...VytalyGorbatov:sipp-scenarios:mutability (date of access: 23.02.2026).

Horbatov, V. S., Zhurba, A. O. (2025). Network intrusion classification method based on the rule structure of intrusion detection systems. Information Technologies and Automation – 2025: Proceedings of the XVIII International Scientific and Practical Conference, Odesa, October 30–31, 2025. Odesa: ONT University Publishing House, pp. 261–263.

GitHub - VytalyGorbatov/sip-dataset. GitHub. URL: https://github.com/VytalyGorbatov/sip-dataset (date of access: 23.02.2026).

GitHub - VytalyGorbatov/sip-lab. GitHub. URL: https://github.com/VytalyGorbatov/sip-lab (date of access: 23.02.2026).

Published

2026-04-30