
Density-based Projected Clustering over High Dimensional Data Streams

Palpanas, Themistoklis;
2012-01-01

Abstract

Clustering of high dimensional data streams is an important problem in many application domains, a prominent example being network monitoring. Several approaches have been lately proposed for solving independently the different aspects of the problem. There exist methods for clustering over full dimensional streams and methods for finding clusters in subspaces of high dimensional static data. Yet only a few approaches have been proposed so far which tackle both the stream and the high dimensionality aspects of the problem simultaneously. In this work, we propose a new density-based projected clustering algorithm, HDDStream, for high dimensional data streams. Our algorithm summarizes both the data points and the dimensions where these points are grouped together and maintains these summaries online, as new points arrive over time and old points expire due to ageing. Our experimental results illustrate the effectiveness and the efficiency of HDDStream and also demonstrate that it could serve as a trigger for detecting drastic changes in the underlying stream population, like bursts of network attacks.
2012
Proceedings of the 11th Hellenic Data Management Symposium
AA. VV.
Philadelphia
ACM
Ntoutsi, I.; Zimek, A.; Palpanas, Themistoklis; Kroger, P.; Kriegel, H. P.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/91636