Defining and Measuring Data-Driven Quality Dimension of Staleness / Chayka, Oleksiy; Palpanas, Themis; Bouquet, Paolo. - ELETTRONICO. - (2012).

Defining and Measuring Data-Driven Quality Dimension of Staleness

Chayka, Oleksiy (first author); Palpanas, Themis (second-to-last author); Bouquet, Paolo (last author)
2012-01-01

Abstract

With the growing complexity of data acquisition and processing methods, there is a growing demand to understand which data is outdated and how to keep it as fresh as possible. Staleness is a key time-related data quality characteristic that represents the degree of synchronization between data originators and the information systems holding the data. However, there is currently no common, widely accepted notion of data staleness, nor are there methods for measuring it across a wide range of applications. Our work provides a definition of a data-driven notion of staleness for information systems with frequently updated data. For such data, we demonstrate an efficient exponential smoothing method for measuring staleness and compare it with naïve approaches that use the same limited amount of memory and are based on averaging the frequency of updates. We present experimental results for staleness measurement algorithms run on the update history of Wikipedia articles.
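
To make the idea of constant-memory exponential smoothing concrete, the following is a minimal sketch and not the algorithm from the report. It assumes that staleness can be approximated by comparing the time since the last update with an exponentially smoothed inter-update interval. The class name `SmoothedUpdateTracker`, the smoothing factor `alpha`, and the staleness rule itself are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmoothedUpdateTracker:
    """Hypothetical constant-memory tracker of the typical gap between updates."""

    alpha: float = 0.3                    # smoothing factor in (0, 1]; assumed value
    last_update: Optional[float] = None   # timestamp of the most recent update
    smoothed_gap: Optional[float] = None  # exponentially smoothed inter-update interval

    def record_update(self, timestamp: float) -> None:
        """Fold a new update timestamp into the smoothed inter-update interval."""
        if self.last_update is not None:
            gap = timestamp - self.last_update
            if self.smoothed_gap is None:
                # First observed gap seeds the estimate.
                self.smoothed_gap = gap
            else:
                # Standard exponential smoothing: recent gaps weigh more.
                self.smoothed_gap = self.alpha * gap + (1 - self.alpha) * self.smoothed_gap
        self.last_update = timestamp

    def is_probably_stale(self, now: float) -> bool:
        """Assumed rule: stale once the time since the last update exceeds the smoothed gap."""
        if self.last_update is None or self.smoothed_gap is None:
            return False  # not enough history to judge
        return (now - self.last_update) > self.smoothed_gap


tracker = SmoothedUpdateTracker(alpha=0.5)
for t in (0.0, 10.0, 20.0, 40.0):
    tracker.record_update(t)
```

Only two numbers are stored per data element regardless of how long its update history is, which is the kind of fixed memory budget the abstract refers to. A different staleness definition would change `is_probably_stale` but not the smoothing step.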
2012
Trento
Università degli Studi di Trento, Dipartimento di Ingegneria e Scienza dell'Informazione
Files in this item:
File: Tech_Report__Chaika_v.2.0.pdf
Access: open access
Type: Publisher’s version (Publisher’s layout)
License: All rights reserved
Size: 882.48 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/359404

Warning: the data displayed have not been validated by the university.
