There is an increasingly pressing need, by several applications in diverse domains, for developing techniques able to index and mine very large collections of data series. Examples of such applications come from astronomy, biology, the web, and other domains. It is not unusual for these applications to involve numbers of data series in the order of hundreds of millions to billions. However, all relevant techniques that have been proposed in the literature so far have not considered any data collections much larger than one-million data series. In this paper, we describe iSAX 2.0 and its improvements, iSAX 2.0 Clustered and iSAX2+, three methods designed for indexing and mining truly massive collections of data series. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce a novel bulk loading mechanism, the first of this kind specifically tailored to a data series index. We show how our methods allows mining on datasets that would otherwise be completely untenable, including the first published experiments to index one billion data series, and experiments in mining massive data from domains as diverse as entomology, DNA and web-scale image collections.

Beyond One Billion Time Series: Indexing and Mining Very Large Time Series Collections with iSAX2+.

Palpanas, Themistoklis;
2014

Abstract

There is an increasingly pressing need, by several applications in diverse domains, for developing techniques able to index and mine very large collections of data series. Examples of such applications come from astronomy, biology, the web, and other domains. It is not unusual for these applications to involve numbers of data series in the order of hundreds of millions to billions. However, all relevant techniques that have been proposed in the literature so far have not considered any data collections much larger than one-million data series. In this paper, we describe iSAX 2.0 and its improvements, iSAX 2.0 Clustered and iSAX2+, three methods designed for indexing and mining truly massive collections of data series. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce a novel bulk loading mechanism, the first of this kind specifically tailored to a data series index. We show how our methods allows mining on datasets that would otherwise be completely untenable, including the first published experiments to index one billion data series, and experiments in mining massive data from domains as diverse as entomology, DNA and web-scale image collections.
1
A., Camerra; J., Shieh; Palpanas, Themistoklis; T., Rakthanmanon; E., Keogh
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/95220
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 79
  • ???jsp.display-item.citation.isi??? 53
social impact