Defining and Measuring Data-Driven Quality Dimension of Staleness / Chayka, Oleksiy; Palpanas, Themis; Bouquet, Paolo. - Electronic. - (2012).
Defining and Measuring Data-Driven Quality Dimension of Staleness
Chayka, Oleksiy; Palpanas, Themis; Bouquet, Paolo
2012-01-01
Abstract
With the growing complexity of data acquisition and processing methods, there is increasing demand for understanding which data is outdated and how to keep it as fresh as possible. Staleness is one of the key time-related data quality characteristics; it represents the degree of synchronization between data originators and the information systems that hold the data. However, there is currently no common, widely adopted notion of data staleness, nor are there methods for measuring it across a wide range of applications. Our work defines a data-driven notion of staleness for information systems with frequently updated data. For such data, we demonstrate an efficient exponential smoothing method of staleness measurement and compare it with naïve approaches that use the same limited amount of memory and are based on averaging the frequency of updates. We present experimental results for staleness measurement algorithms run on the update history of Wikipedia articles.

File | Size | Format |
---|---|---|
Tech_Report__Chaika_v.2.0.pdf (open access; Type: Publisher's layout; License: All rights reserved) | 882.48 kB | Adobe PDF |
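The abstract describes measuring staleness from the frequency of updates with exponential smoothing in a fixed amount of memory. The Python sketch below illustrates that general idea only; it is not taken from the report. The class name `SmoothedUpdateRate`, the default smoothing factor `alpha = 0.3`, and the choice to express staleness as the expected number of missed updates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmoothedUpdateRate:
    """Constant-memory estimate of the mean interval between updates.

    Hypothetical illustration; not the report's algorithm.
    """
    alpha: float = 0.3                      # smoothing factor (assumed value)
    last_update: Optional[float] = None     # timestamp of the latest known update
    mean_interval: Optional[float] = None   # smoothed inter-update interval

    def observe(self, timestamp: float) -> None:
        """Fold one update timestamp into the smoothed interval."""
        if self.last_update is not None:
            interval = timestamp - self.last_update
            if self.mean_interval is None:
                # First interval seen: use it as the initial estimate.
                self.mean_interval = interval
            else:
                # Standard exponential smoothing step.
                self.mean_interval = (
                    self.alpha * interval
                    + (1 - self.alpha) * self.mean_interval
                )
        self.last_update = timestamp

    def expected_missed_updates(self, now: float) -> float:
        """Elapsed time since the last update, in units of the smoothed interval."""
        if self.last_update is None or not self.mean_interval:
            return 0.0
        return max(0.0, now - self.last_update) / self.mean_interval


if __name__ == "__main__":
    rate = SmoothedUpdateRate(alpha=0.5)
    for t in (0.0, 10.0, 30.0):  # update timestamps in arbitrary time units
        rate.observe(t)
    print(rate.mean_interval)
    print(rate.expected_missed_updates(60.0))
```

Only two numbers are stored per data item (the last update time and the smoothed interval), so memory use does not grow with the length of the update history. A plain running average could be kept in the same space, but it weights all past intervals equally, whereas smoothing lets recent intervals dominate.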
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.