A general method for estimating the prevalence of influenza-like-symptoms with Wikipedia data / De Toni, Giovanni; Consonni, Cristian; Montresor, Alberto. - In: PLOS ONE. - ISSN 1932-6203. - ELECTRONIC. - 16:28(2021). [10.1371/journal.pone.0256858]

A general method for estimating the prevalence of influenza-like-symptoms with Wikipedia data

Giovanni De Toni; Cristian Consonni; Alberto Montresor
2021-01-01

Abstract

Influenza is an acute, seasonal respiratory disease that affects millions of people worldwide and causes thousands of deaths in Europe alone. Estimating the impact of an illness on a given country quickly and reliably is essential for planning and organizing effective countermeasures, and this is now possible by leveraging unconventional data sources such as web searches and page visits. In this study, we show the feasibility of using machine learning models, together with Wikipedia page-view data for a selected group of articles, to obtain accurate estimates of the incidence of influenza-like illness in four European countries: Italy, Germany, Belgium, and the Netherlands. We propose a novel language-agnostic method, based on two algorithms, Personalized PageRank and CycleRank, to automatically select the most relevant Wikipedia pages to monitor without the need for expert supervision. We then show that our model reaches state-of-the-art results by comparing it with previous solutions.
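As a rough illustration of the page-selection idea mentioned in the abstract, the sketch below runs Personalized PageRank by power iteration on a tiny, made-up link graph. It is not the authors' implementation: the graph, the page names, the damping factor `alpha`, and the function name `personalized_pagerank` are all assumptions chosen for the example, and CycleRank is not shown.

```python
def personalized_pagerank(links, seed, alpha=0.85, iterations=100):
    """Score pages by a random walk that always teleports back to `seed`.

    links: dict mapping a page to the list of pages it links to
           (a hypothetical toy graph, not real Wikipedia data).
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    scores = {p: 0.0 for p in pages}
    scores[seed] = 1.0  # the walk starts on the seed page

    for _ in range(iterations):
        nxt = {p: 0.0 for p in pages}
        nxt[seed] = 1.0 - alpha  # teleport mass goes only to the seed
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = alpha * scores[page] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Pages without outgoing links send their mass back to the seed.
                nxt[seed] += alpha * scores[page]
        scores = nxt
    return scores


# Hypothetical link graph around a seed article.
links = {
    "Influenza": ["Fever", "Influenza vaccine"],
    "Fever": ["Influenza", "Paracetamol"],
    "Influenza vaccine": ["Influenza"],
    "Paracetamol": ["Fever"],
    "Football": ["Paracetamol"],
}

scores = personalized_pagerank(links, seed="Influenza")
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Pages that cannot be reached from the seed, such as the unrelated "Football" entry in this toy graph, receive a score of zero, which is the property that makes a seed-based ranking usable for picking pages to monitor without manual curation.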

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/325287

Citations
  • PMC 2
  • Scopus 6
  • ISI 6
  • OpenAlex N/A