

Multi-scale attention fusion for enhanced transformer models in intrusion detection systems

Samia Saidane; Francesco Telch; Kussai Shahin; Fabrizio Granelli
2026-01-01

Abstract

Modern network environments generate vast streams of complex and evolving traffic data, presenting significant challenges for accurate, real-time intrusion detection. Conventional deep learning approaches often fail to capture multi-scale temporal patterns effectively and are frequently hampered by severe class imbalance and catastrophic forgetting when faced with non-stationary data streams. To overcome these limitations, this paper introduces a novel Multi-scale Attention Fusion (MAF) module, a general-purpose architectural enhancement for transformer-based models designed to achieve synergistic optimization across three critical dimensions: computational efficiency, continual learning adaptability, and multi-scale temporal perception. We present two instantiations of this approach, MEGA+MAF and FNet+MAF, which combine short-term local context, long-range dependencies, and global sequence information through an adaptive, learnable gating mechanism. A comprehensive evaluation across four diverse benchmark datasets (ToN-IoT, X-IIoTID, CICEVS2024, and CSE-CIC-IDS2018) demonstrates state-of-the-art performance with balanced optimization across all three dimensions: (1) FNet+MAF achieved superior computational efficiency, with up to 8.5× lower memory use and 5.3× faster inference while maintaining high accuracy; (2) MEGA+MAF demonstrated exceptional continual learning capability, achieving 99.10% accuracy in dynamic streaming environments while eliminating backward forgetting (0.00%) and minimizing forward forgetting (0.06%); and (3) both models exhibited robust multi-scale perception, capturing threats across short-term bursts, mid-range sessions, and global traffic patterns with up to a 99.97% F1-score. An ablation study after 20 epochs of training identifies a sequence length of 80 tokens as optimal, achieving 85.20% accuracy at 145.1 samples/second throughput.
Interpretability analyses further confirm that the models learn robust, semantically meaningful feature representations aligned with network security semantics. The proposed framework represents a significant step toward adaptive, next-generation intrusion detection systems capable of evolving with emerging threats while maintaining operational efficiency in resource-constrained environments.
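The multi-branch fusion described in the abstract can be illustrated with a minimal sketch: three branches (short-term local context, a wider mid-range window, and a global summary) are blended per position by a softmax gate. This is a hypothetical, dependency-free Python illustration of the general idea only, not the authors' implementation; the function names (`maf_fuse`, `local_context`), window sizes, and fixed gate logits are all assumptions (in the paper, the gate weights are learnable).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def local_context(seq, window):
    """Causal moving average: each position sees at most `window` past values."""
    out = []
    for i in range(len(seq)):
        w = seq[max(0, i - window + 1): i + 1]
        out.append(sum(w) / len(w))
    return out

def global_context(seq):
    """Broadcast the sequence mean to every position (global branch)."""
    g = sum(seq) / len(seq)
    return [g] * len(seq)

def maf_fuse(seq, gate_logits=(0.0, 0.0, 0.0)):
    """Blend local, mid-range, and global branches with a softmax gate.

    `gate_logits` stands in for the learnable gate parameters.
    """
    branches = [
        local_context(seq, 3),   # short-term branch (hypothetical window)
        local_context(seq, 8),   # mid-range branch (hypothetical window)
        global_context(seq),     # global branch
    ]
    gates = softmax(list(gate_logits))
    return [sum(g * b[i] for g, b in zip(gates, branches))
            for i in range(len(seq))]
```

With equal logits the three branches are averaged; pushing one logit up lets the gate favor that scale, which is the mechanism the learnable gating would tune during training.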
2026
Settore ING-INF/03 - Telecomunicazioni
Settore IINF-03/A - Telecomunicazioni
Saidane, Samia; Telch, Francesco; Shahin, Kussai; Granelli, Fabrizio
Multi-scale attention fusion for enhanced transformer models in intrusion detection systems / Saidane, Samia; Telch, Francesco; Shahin, Kussai; Granelli, Fabrizio. - In: COMPUTER NETWORKS. - ISSN 1389-1286. - 277:(2026), pp. 111985-111985. [10.1016/j.comnet.2025.111985]
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11572/473730

Warning: the displayed data have not been validated by the university.

Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: n/a
  • OpenAlex: 1