Anomaly detection consists in automatically detecting the most unusual elements in a data set. Anomaly detection applications emerge in domains such as computer security, system monitoring, fault detection, and wireless sensor networks. The strategic importance of detecting anomalies in these domains makes anomaly detection a critical data analysis task. Moreover, the contextual nature of anomalies, among other issues, makes anomaly detection a particularly challenging problem. Anomaly detection has received significant research attention in the last two decades. Much effort has been invested in the development of novel algorithms for anomaly detection. However, several open challenges still exist in the field.This thesis presents our contributions toward solving these challenges. These contributions include: a methodological survey of the recent literature, a novel benchmarking framework for anomaly detection algorithms, an approach for scaling anomaly detection techniques to massive data sets, and a novel anomaly detection algorithm inspired by the law of universal gravitation. Our methodological survey highlights open challenges in the field, and it provides some motivation for our other contributions. Our benchmarking framework, named BAD, tackles the problem of reliably assess the accuracy of unsupervised anomaly detection algorithms. BAD leverages parallel and distributed computing to enable massive comparison studies and hyperparameter tuning tasks. The challenge of scaling unsupervised anomaly detection techniques to massive data sets is well-known in the literature. In this context, our contributions are twofold: we investigate the trade-offs between a single-threaded implementation and a distributed approach considering price-performance metrics, and we propose a scalable approach for anomaly detection algorithms to arbitrary data volumes. Our results show that, when high scalability is required, our approach can handle arbitrarily large data sets without significantly compromising detection accuracy. We conclude our contributions by proposing a novel algorithm for anomaly detection, named Gravity. Gravity identifies anomalies by considering the attraction forces among massive data elements. Our evaluation shows that Gravity is competitive with other popular anomaly detection techniques on several benchmark data sets. Additionally, the properties of Gravity makes it preferable in cases where hyperparameter tuning is challenging or unfeasible.

Modern Anomaly Detection: Benchmarking, Scalability and a Novel Approach / Pasupathipillai, Sivam. - (2020 Nov 27), pp. 1-117. [10.15168/11572_281952]

Modern Anomaly Detection: Benchmarking, Scalability and a Novel Approach

Pasupathipillai, Sivam
2020-11-27

Abstract

Anomaly detection consists in automatically detecting the most unusual elements in a data set. Anomaly detection applications emerge in domains such as computer security, system monitoring, fault detection, and wireless sensor networks. The strategic importance of detecting anomalies in these domains makes anomaly detection a critical data analysis task. Moreover, the contextual nature of anomalies, among other issues, makes anomaly detection a particularly challenging problem. Anomaly detection has received significant research attention in the last two decades. Much effort has been invested in the development of novel algorithms for anomaly detection. However, several open challenges still exist in the field.This thesis presents our contributions toward solving these challenges. These contributions include: a methodological survey of the recent literature, a novel benchmarking framework for anomaly detection algorithms, an approach for scaling anomaly detection techniques to massive data sets, and a novel anomaly detection algorithm inspired by the law of universal gravitation. Our methodological survey highlights open challenges in the field, and it provides some motivation for our other contributions. Our benchmarking framework, named BAD, tackles the problem of reliably assess the accuracy of unsupervised anomaly detection algorithms. BAD leverages parallel and distributed computing to enable massive comparison studies and hyperparameter tuning tasks. The challenge of scaling unsupervised anomaly detection techniques to massive data sets is well-known in the literature. In this context, our contributions are twofold: we investigate the trade-offs between a single-threaded implementation and a distributed approach considering price-performance metrics, and we propose a scalable approach for anomaly detection algorithms to arbitrary data volumes. Our results show that, when high scalability is required, our approach can handle arbitrarily large data sets without significantly compromising detection accuracy. We conclude our contributions by proposing a novel algorithm for anomaly detection, named Gravity. Gravity identifies anomalies by considering the attraction forces among massive data elements. Our evaluation shows that Gravity is competitive with other popular anomaly detection techniques on several benchmark data sets. Additionally, the properties of Gravity makes it preferable in cases where hyperparameter tuning is challenging or unfeasible.
27-nov-2020
XXXII
2018-2019
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Della Valle, Emanuele
Velegrakis, Ioannis
no
Inglese
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
File in questo prodotto:
File Dimensione Formato  
thesis_final.pdf

Open Access dal 02/11/2021

Descrizione: Documento principale
Tipologia: Tesi di dottorato (Doctoral Thesis)
Licenza: Creative commons
Dimensione 3.4 MB
Formato Adobe PDF
3.4 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/281952
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact