Introduction
Critical infrastructure, encompassing electrical grids, water treatment systems, industrial control networks, financial clearing systems, and national telecommunications backbones, constitutes the operational foundation of modern statehood. The digitization of these environments, accelerated through Industry 4.0 integration and the proliferation of Internet of Things devices in industrial contexts, has simultaneously increased operational efficiency and expanded the attack surface available to adversarial actors. Incidents such as the 2021 Colonial Pipeline ransomware attack and the 2015 Ukrainian power grid intrusion demonstrated that disruption of cyber-physical systems carries consequences far exceeding ordinary data breaches, including cascading failures across interdependent sectors (Lee et al., 2016).
Traditional intrusion detection approaches rely on predefined rule sets and signature libraries that require continuous manual curation and exhibit near-zero generalization to novel attack patterns. In environments characterized by high-volume, heterogeneous network telemetry such as SCADA and industrial IoT deployments, these approaches produce unacceptable rates of both false negatives and false positives, the latter of which overwhelm security operations centers with operationally irrelevant alerts. The structural limitations of perimeter-centric defense have been extensively documented, and the consensus within the academic and practitioner communities has shifted toward behavioral, data-driven detection as the viable path forward. Artificial intelligence, and deep learning in particular, offers a fundamentally different detection paradigm. Rather than matching against known signatures, learned models characterize normal system behavior and identify deviations from that baseline in near real time. This capacity for behavioral profiling makes AI-driven anomaly detection especially suited to critical infrastructure environments, where the operational profile of legitimate traffic is relatively stable but the nature of adversarial techniques evolves continuously. This article evaluates the principal architectural approaches that have emerged in the research literature over the past several years, assesses their practical applicability within constrained industrial environments, and situates recent contributions within the broader trajectory of the field.
Methods
The analysis is structured as a critical, integrative review of the academic literature published between 2017 and 2024, supplemented by examination of select technical monographs and applied frameworks developed within the cybersecurity research community. Source selection prioritized peer-reviewed publications in IEEE, ACM, Elsevier, and Springer venues, with preference given to works reporting empirical results on publicly available benchmark datasets, principally CICIDS2017 and UNSW-NB15, that permit cross-study comparison (Sharafaldin et al., 2018).
In organizing the comparative analysis, a functional taxonomy was applied that distinguishes frameworks by their primary architectural component (recurrent, graph-based, attention-based, or ensemble), their supervision paradigm, and their intended deployment context. This taxonomy enables a structured assessment of trade-offs across dimensions of detection accuracy, computational tractability, interpretability, and adaptability to adversarial concept drift. Particular attention is paid to frameworks that incorporate predictive rather than purely reactive detection logic, that is, models designed not merely to classify observed traffic as malicious or benign but to anticipate attack trajectories based on precursor behavioral patterns. This distinction is substantively important for critical infrastructure, where the interval between initial compromise and operational impact may be measured in minutes, and early warning of sufficient specificity determines whether a timely response is feasible (Stouffer et al., 2011).
Results
Long Short-Term Memory networks, formalized by Hochreiter and Schmidhuber (1997), have become one of the most widely deployed deep learning architectures in network intrusion detection, owing to their native capacity to model sequential dependencies in time-series data. Network traffic at both the packet and flow levels exhibits pronounced temporal structure: attack campaigns unfold across time, and the statistical relationship between successive observations encodes information that instantaneous feature representations cannot capture. LSTM-based intrusion detection systems have demonstrated strong performance on the CICIDS2017 benchmark, with several implementations reporting accuracy exceeding 98% on binary classification tasks (Yin et al., 2017).
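The data-preparation step implied here can be made concrete. A minimal, illustrative sketch of turning a chronological stream of per-flow feature vectors into the fixed-length overlapping windows that a sequence model such as an LSTM consumes (the function name and feature layout are hypothetical, not from any cited framework):

```python
def make_windows(flows, window=5, stride=1):
    """Slice a chronological list of per-flow feature vectors into
    overlapping windows of shape (window, n_features), the standard
    input layout for sequence models."""
    return [flows[i:i + window]
            for i in range(0, len(flows) - window + 1, stride)]

# Ten synthetic flows, each with three features
# (e.g. bytes transferred, packet count, duration).
flows = [[float(i), float(i * 2), 0.5] for i in range(10)]
windows = make_windows(flows, window=5, stride=1)
# 10 flows with window 5 and stride 1 yield 6 windows of 5 flows each.
```

A stride equal to the window length gives non-overlapping segments instead; overlapping windows trade redundancy for denser temporal coverage of short-lived attack phases.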
A representative line of work combines LSTM encoders with federated learning frameworks, enabling distributed training across geographically separated sensor nodes without centralizing raw traffic data, a design property of particular relevance to national infrastructure contexts where data sovereignty constraints preclude centralized aggregation (Zhang & Li, 2022). Normalization procedures applied at the IP-flow level, including Min-Max scaling across multivariate telemetry features, have been shown to improve cross-environment portability of trained models, a persistent challenge when systems built on synthetic benchmarks are transferred to production operational technology environments (Avkhadiev, 2025a; Zhang & Li, 2022). In examining this portability problem, Avkhadiev (2025a) identifies adaptive threshold calibration during deployment as a critical factor that benchmark-oriented evaluations systematically underreport, a gap with direct consequences for how detection performance figures should be interpreted in infrastructure security planning.
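The Min-Max normalization step mentioned above can be sketched in a few lines. The key portability detail is that the scaling bounds are fit once on the training corpus and then reused unchanged at deployment; feature names and values below are synthetic:

```python
def min_max_fit(rows):
    """Per-feature minima and maxima over a list of feature vectors."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def min_max_transform(rows, lo, hi):
    """Scale each feature into [0, 1]; constant features map to 0.0."""
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)]
            for row in rows]

# Synthetic flow telemetry: (packet count, bytes transferred).
train = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
lo, hi = min_max_fit(train)            # fit on training data only
scaled = min_max_transform(train, lo, hi)
```

At inference, production flows are transformed with the same `lo`/`hi`, which is precisely where cross-environment drift surfaces: values outside the training range fall outside [0, 1] and signal that the fitted bounds no longer describe the deployment environment.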
The known limitation of pure LSTM architectures in this domain is their tendency to model temporal patterns at the individual-flow level while remaining insensitive to structural relationships across simultaneously active connections. This gap becomes consequential when detecting coordinated multi-vector attacks or lateral movement campaigns that exploit topological properties of network graphs.
Graph Neural Networks address precisely this limitation. By representing network activity as a dynamic graph, where nodes correspond to hosts or services and edges encode communication relationships, GNN-based intrusion detection systems learn structural attack signatures that are invisible to flow-level sequential models (Bilot et al., 2023). Distributed Denial of Service attacks manifest as characteristic fan-in or fan-out topologies identifiable at the graph level even when individual flow statistics remain within normal bounds. Advanced Persistent Threat campaigns, which deliberately suppress per-connection anomaly signals, nonetheless produce distinguishable graph-structural footprints as attackers traverse lateral paths through a target network.
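The fan-in/fan-out intuition described above can be illustrated without a full GNN: even a plain degree profile over the communication graph exposes topological signals invisible at the per-flow level. The sketch below is a deliberately crude heuristic, not a learned model, and the threshold value is an arbitrary illustration:

```python
from collections import defaultdict

def degree_profile(flows):
    """In-degree and out-degree (distinct peers) per host,
    from a list of (src, dst) flow records."""
    out_deg, in_deg = defaultdict(set), defaultdict(set)
    for src, dst in flows:
        out_deg[src].add(dst)
        in_deg[dst].add(src)
    hosts = set(out_deg) | set(in_deg)
    return {h: (len(in_deg[h]), len(out_deg[h])) for h in hosts}

def fan_in_suspects(flows, threshold=3):
    """Hosts contacted by unusually many distinct peers:
    a crude structural cue for DDoS-style fan-in."""
    return [h for h, (fan_in, _) in degree_profile(flows).items()
            if fan_in >= threshold]

# Four distinct sources converge on host "v"; per-flow stats may look benign.
flows = [("a", "v"), ("b", "v"), ("c", "v"), ("d", "v"), ("a", "b")]
suspects = fan_in_suspects(flows, threshold=3)
```

A GNN generalizes this idea by learning which relational patterns matter, rather than relying on a hand-picked degree threshold.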
Xie et al. (2023) demonstrated that a hybrid GCN-LSTM architecture applied to Industrial Control System traffic achieves superior detection performance relative to either component alone, with the GCN module capturing spatial relational features and the LSTM module capturing their temporal evolution across observation windows. Protocol-specific characteristics of ICS environments, including the constrained and deterministic communication patterns of Modbus and DNP3 traffic, create favorable conditions for graph-based behavioral modeling, since deviations from expected relational structures are particularly salient against a stable operational baseline (Avkhadiev, 2025a; Xie et al., 2023).
The attention mechanism, originally developed for natural language processing, has been productively adapted to network security through Transformer-based anomaly detection models. Attention-based architectures offer two properties that are operationally attractive in this context: the ability to model long-range dependencies across extended observation windows without the gradient degradation that affects deep recurrent networks, and the production of attention weight distributions that can serve as a basis for explainability. This second property is of growing practical importance as regulatory frameworks increasingly demand accountability in automated decision systems (Arrieta et al., 2020).
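The explainability property claimed here follows directly from the mechanics of scaled dot-product attention: the weight vector is an explicit, normalized distribution over input positions. A minimal single-query sketch in plain Python (vectors as lists, no batching):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.
    The returned weights show which time steps the output attends to,
    which is the quantity used for post-hoc explanation."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# The query aligns strongly with the first key, so nearly all
# attention mass lands on the first value vector.
out, weights = attention([10.0, 0.0],
                         [[1.0, 0.0], [0.0, 1.0]],
                         [[1.0, 2.0], [3.0, 4.0]])
```

In a Transformer-based detector, inspecting `weights` per head and layer indicates which observation-window positions drove a given verdict, which is what gives attention models their interpretability advantage over opaque recurrent states.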
A CNN-LSTM-Transformer hybrid proposed by Rahimian et al. (2024) achieved state-of-the-art performance on the UNSW-NB15 dataset, combining convolutional layers for local feature extraction, LSTM layers for sequential modeling, and Transformer layers for global context integration. The architectural case for such compositional designs rather than single-paradigm models is examined in detail by Avkhadiev (2025a), who argues that the complementary nature of convolutional, recurrent, and attention-based components is particularly consequential in heterogeneous industrial environments where threat signatures operate across multiple temporal and topological scales simultaneously.
The challenge of real-time threat classification in cloud-native environments adds a further dimension that static benchmark evaluations do not adequately address. In elastic, dynamically reconfigured infrastructure, the behavioral baseline is non-stationary and must be estimated from streaming data rather than a stable historical corpus. Adaptive threshold mechanisms and online learning components have been proposed as responses to this form of concept drift, with integration into Security Orchestration, Automation and Response workflows identified as a prerequisite for operational utility (Avkhadiev, 2025b; Nguyen et al., 2021).
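An adaptive threshold over streaming data can be sketched with Welford's online mean/variance update: the baseline is estimated incrementally rather than from a fixed historical corpus. This is a simplified illustration (a production system would, among other things, gate confirmed anomalies out of the baseline update rather than folding them in as done here):

```python
import math

class StreamingThreshold:
    """Welford online mean/variance; flags observations more than
    k standard deviations above the running mean."""

    def __init__(self, k=3.0):
        self.n, self.mean, self.m2, self.k = 0, 0.0, 0.0, k

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def update(self, x):
        """Check x against the current baseline, then fold it in.
        Simplification: anomalous points still update the baseline."""
        anomalous = self.n > 1 and x > self.mean + self.k * self.std()
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

# Stable traffic rate with one spike at the end.
det = StreamingThreshold(k=3.0)
stream = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 500.0]
flags = [det.update(x) for x in stream]
```

Because mean and variance track the stream, the threshold drifts with legitimate baseline shifts instead of being pinned to training-time statistics, which is the essential property for elastic cloud-native environments.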
The translation of deep learning anomaly detection outputs into actionable security operations requires that detection decisions be accompanied by interpretable justifications, both to enable analyst triage and to satisfy audit and compliance requirements under frameworks such as NIST SP 800-53 and the NIS2 Directive (Rose et al., 2020). Explainable AI techniques, particularly SHAP and LIME, have been applied to post-hoc explanation of intrusion detection decisions with encouraging results, though the computational overhead of SHAP analysis at production traffic scales remains a constraint on real-time deployments (Marino et al., 2022).
Isolated detection models that produce alerts without downstream workflow integration reproduce the false-positive fatigue problem in a new form. API-level interoperability with SIEM platforms and automated playbook triggering through SOAR connectors address this gap at the architectural level, transforming detection outputs into structured incident response actions rather than raw alerts requiring manual interpretation (Avkhadiev, 2025b; Marino et al., 2022).
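What "structured incident response actions rather than raw alerts" means in practice is a machine-readable payload a SOAR connector can route. The schema below is purely illustrative (no SIEM vendor format is implied), and the playbook name is hypothetical:

```python
import json
from datetime import datetime, timezone

def to_soar_alert(flow_id, score, threshold, top_features):
    """Package a detection decision as a structured alert suitable for
    SIEM ingestion or SOAR playbook triggering. Illustrative schema."""
    malicious = score >= threshold
    return {
        "alert_type": "anomaly_detection",
        "flow_id": flow_id,
        "score": round(score, 4),
        "threshold": threshold,
        "verdict": "malicious" if malicious else "benign",
        "explanation": top_features,   # e.g. top-ranked features from SHAP
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "suggested_playbook": "isolate_host" if malicious else None,
    }

alert = to_soar_alert("flow-001", 0.97, 0.90, ["dst_port", "bytes_out"])
payload = json.dumps(alert)   # ready to POST to a SOAR/SIEM API endpoint
```

Carrying the explanation and a suggested playbook inside the alert is what turns a model score into a triageable, auditable response action.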
Class imbalance in operational network traffic presents a structural challenge for supervised detection models: benign samples typically constitute the overwhelming majority of observed flows, and models trained on such distributions tend to optimize for overall accuracy at the expense of sensitivity to minority attack classes. Resampling strategies and cost-sensitive learning address part of this problem during training, but threshold selection at inference time remains an independent and consequential variable. Avkhadiev (2025b) demonstrates that post-training threshold calibration, applied separately for each traffic class rather than globally, yields measurable improvements in recall on low-frequency attack categories while maintaining false positive rates within operationally acceptable bounds. The result is a detection profile that can be adjusted to reflect the specific risk priorities of a given deployment rather than the aggregate performance characteristics of the training corpus.
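The per-class calibration idea can be sketched as a simple search: for each class, pick the lowest score threshold whose false-positive rate on held-out benign scores stays within an operational bound, which maximizes recall subject to that bound. This is an illustrative reconstruction of the general technique, not the cited author's exact procedure; scores are assumed to lie in [0, 1]:

```python
def calibrate_threshold(attack_scores, benign_scores, max_fpr=0.01):
    """Lowest candidate threshold whose false-positive rate on benign
    validation scores does not exceed max_fpr. Lower thresholds raise
    recall on low-frequency attack classes."""
    for t in sorted(set(attack_scores) | set(benign_scores)):
        fpr = sum(s >= t for s in benign_scores) / len(benign_scores)
        if fpr <= max_fpr:
            return t
    return 1.0  # no admissible threshold among observed scores

def per_class_thresholds(val_scores, max_fpr=0.01):
    """val_scores maps class name -> (attack_scores, benign_scores)."""
    return {c: calibrate_threshold(a, b, max_fpr)
            for c, (a, b) in val_scores.items()}

# Synthetic validation scores for two attack classes.
val = {
    "dos":          ([0.60, 0.70, 0.95], [0.10, 0.20, 0.30, 0.90]),
    "infiltration": ([0.55, 0.80],       [0.10, 0.15, 0.20, 0.25]),
}
thresholds = per_class_thresholds(val, max_fpr=0.25)
```

Each class ends up with its own operating point, so a rare but high-impact category can be tuned for sensitivity without loosening the global false-positive budget.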
Discussion
The survey of current frameworks reveals a field in productive transition. The dominant architectural trend toward hybrid models combining temporal, structural, and attention-based feature extraction is well-grounded in the complementary nature of information sources available in industrial network environments. Single-architecture systems consistently underperform relative to compositional approaches on complex, multi-class attack scenarios, and this performance gap widens as threat sophistication increases.
Several structural challenges remain inadequately resolved in the current generation of frameworks. The first is the generalization gap between benchmark and production performance. Virtually all published results are reported on a small number of standardized datasets that do not fully represent the protocol diversity, traffic volumes, or adversarial dynamics of real industrial deployments. The degree to which accuracy figures obtained on these benchmarks translate to operational environments is an empirical question that the literature has not systematically addressed.
The second challenge is adversarial robustness. Machine learning models are susceptible to evasion attacks in which adversaries deliberately craft traffic to resemble benign behavior at the feature level while preserving attack functionality at the semantic level. As detection models are trained on increasingly large corpora, adversaries develop correspondingly tailored evasion strategies, creating a recursive dynamic that static benchmark evaluations cannot capture. The scope of this problem extends to data poisoning and model inversion attacks directed at detection systems themselves, threat vectors that, as Avkhadiev (2025b) notes in the context of cloud-deployed detection pipelines, require security measures applied to the AI layer rather than solely to the infrastructure it monitors.
Distributional drift arising from legitimate operational change, including network reconfiguration, equipment replacement, or shifts in application usage patterns, poses a robustness challenge distinct from adversarial evasion but with comparable consequences for detection reliability. A model trained on a stable historical baseline may exhibit progressive degradation in precision as the statistical properties of normal traffic evolve, producing alert volumes that exceed analyst capacity and precipitating the suppression behavior documented in overloaded security operations contexts. The continuous recalibration mechanism proposed by Avkhadiev (2025a) addresses this by monitoring incoming traffic distributions against the trained baseline and initiating selective model updates when divergence exceeds a specified threshold. Standard evaluation protocols, which assess model performance against fixed held-out test sets, do not capture this form of temporal degradation, and the discrepancy between static benchmark results and sustained operational accuracy warrants more systematic attention in the methodological literature.
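One common way to operationalize the divergence check described above is the Population Stability Index over binned feature distributions; this is offered as a generic illustration of distribution monitoring, not as the specific mechanism of the cited work, and the 0.25 trigger is a conventional rule of thumb rather than a principled constant:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned histograms.
    Values above roughly 0.25 are conventionally read as major drift."""
    total_e, total_a = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        pe = max(e / total_e, eps)   # eps guards empty bins
        pa = max(a / total_a, eps)
        score += (pa - pe) * math.log(pa / pe)
    return score

def needs_recalibration(expected, actual, threshold=0.25):
    """Trigger a selective model update when drift exceeds the threshold."""
    return psi(expected, actual) > threshold

baseline      = [50, 30, 15, 5]    # feature histogram at training time
current_ok    = [48, 32, 14, 6]    # similar traffic: PSI stays small
current_drift = [10, 15, 30, 45]   # shifted traffic: PSI is large
```

Running such a check per feature on a rolling window yields a cheap, model-agnostic drift signal that decides when retraining effort is spent, rather than retraining on a fixed schedule.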
The third challenge concerns governance constraints on federated or distributed training. Mechanisms for training shared detection models across infrastructure operators without disclosing sensitive operational data to any central party are technically feasible through federated learning and differential privacy, but their deployment at national scale involves legal and trust architecture questions that extend well beyond the technical domain (Nguyen et al., 2021). The integration of Zero Trust principles, specifically continuous authentication, least-privilege access enforcement, and microsegmentation, as a complementary architectural layer reduces the effective attack surface and improves the signal-to-noise ratio of behavioral baselines, thereby strengthening detection performance indirectly (Avkhadiev, 2025c; Rose et al., 2020).
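The technical core of the federated option discussed above is small: each operator trains locally and shares only parameter updates, which a coordinator averages weighted by local sample counts (the FedAvg scheme). A toy sketch with parameters as flat lists, stripped of the secure-aggregation and differential-privacy layers a real deployment would require:

```python
def fed_avg(client_updates):
    """Sample-count-weighted average of client model parameters (FedAvg).
    Only parameter vectors leave each site; raw traffic never does."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(weights[i] * n for weights, n in client_updates) / total
            for i in range(dim)]

# Two operators with different data volumes; the larger site
# (300 samples) pulls the global model toward its local weights.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = fed_avg(updates)
```

The governance questions raised in the text sit precisely around this exchange: who runs the coordinator, what the update vectors may leak, and how much noise differential privacy must add before the shared model stops being useful.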
Conclusion
The review of current AI-driven anomaly detection frameworks confirms that deep learning, particularly hybrid architectures combining recurrent, graph-based, and attention-based components, has reached a level of technical maturity sufficient to support deployment in high-stakes environments, provided that the operational and governance preconditions for such deployment are adequately addressed. The persistent challenges of benchmark generalization, adversarial robustness, explainability at scale, and federated governance constitute the productive frontier for the next generation of research, with implications extending to national infrastructure resilience policy as well as technical practice.
References:
1. Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012
2. Avkhadiev, D. I. (2025a). Neural network algorithms for predictive detection of cyber threats in critical infrastructure. In Technical Sciences: Problems and Solutions: Proceedings of the CI International Scientific and Practical Conference (No. 10(95)). Internauka.
3. Avkhadiev, D. I. (2025b). Intelligent methods of early response to attacks in cloud systems. Internauka, 28(392).
4. Avkhadiev, D. I. (2025c). Zero trust architecture: Principles for the future. Lambert Academic Publishing. ISBN 978-620-9-02385-9.
5. Bilot, T., El Madhoun, N., Al Agha, K., & Zouaoui, A. (2023). Graph neural networks for intrusion detection: A survey. IEEE Access, 11, 49114–49139. https://doi.org/10.1109/ACCESS.2023.3275789
6. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
7. Lee, R. M., Assante, M. J., & Conway, T. (2016). Analysis of the cyber attack on the Ukrainian power grid. SANS ICS and E-ISAC.
8. Marino, D. L., Wickramasinghe, C. S., & Manic, M. (2022). An adversarial approach for explainable AI in intrusion detection systems. IEEE Transactions on Network and Service Management, 19(2), 1293–1305. https://doi.org/10.1109/TNSM.2022.3147028
9. Nguyen, T. D., Marchal, S., Miettinen, M., Fereidooni, H., Asokan, N., & Sadeghi, A.-R. (2021). DIoT: A federated self-learning anomaly detection system for IoT. IEEE Transactions on Information Forensics and Security, 16, 1413–1428. https://doi.org/10.1109/TIFS.2020.3017054
10. Rose, S., Borchert, O., Mitchell, S., & Connelly, S. (2020). Zero trust architecture (NIST Special Publication 800-207). National Institute of Standards and Technology. https://doi.org/10.6028/NIST.SP.800-207
11. Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. Proceedings of ICISSP, 108–116. https://doi.org/10.5220/0006639801080116
12. Stouffer, K., Falco, J., & Scarfone, K. (2011). Guide to industrial control systems (ICS) security (NIST Special Publication 800-82). National Institute of Standards and Technology. https://doi.org/10.6028/NIST.SP.800-82
13. Xie, H., Xu, Y., & Zheng, Y. (2023). A hybrid GCN-LSTM model for cyber attack detection in industrial control systems. Computers & Security, 131, 102890. https://doi.org/10.1016/j.cose.2023.102890
14. Yin, C., Zhu, Y., Fei, J., & He, X. (2017). A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access, 5, 21954–21961. https://doi.org/10.1109/ACCESS.2017.2762418
15. Zhang, T., & Li, Y. (2022). Cybersecurity threat detection based on federated learning and LSTM. Future Generation Computer Systems, 132, 145–156. https://doi.org/10.1016/j.future.2022.02.014

