Wikipedia is the largest collection of encyclopedic data ever written in the history of humanity. Thanks to its coverage and its availability in machine-readable format, it has become a primary resource for large-scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections in them: given a person’s page, we identify the list of sections that contain information about her/his life. We model this as a sequence classification problem and propose a supervised setting in which the training data are acquired automatically. We also show that six simple features extracted only from the section titles are very informative and yield good results, well above a strong baseline.

Recognizing Biographical Sections in Wikipedia / Palmero Aprosio, Alessio; Tonelli, Sara. - ELECTRONIC. - (2015), pp. 811-816. (Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015).

Recognizing Biographical Sections in Wikipedia

Palmero Aprosio, Alessio; Tonelli, Sara
2015-01-01

2015
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Lisbon, Portugal
The Association for Computational Linguistics
978-1-941643-32-7
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/454154