There's Something New about the Italian Parliament: the IPSA Corpus

Frasnelli, Valentino; Palmero Aprosio, Alessio

Parliamentary debates constitute a substantial and somewhat underutilized reservoir of publicly available written content. Despite their potential, Italian parliamentary documents remain largely unexplored and, most importantly, inaccessible in their original paper-based form. In this paper we attempt to transform these valuable historical documents into IPSA, a digitally readable structured corpus containing speeches, reports of the Standing Committees, and law proposals spanning 175 years of Italian history, from the issuing of the Statuto Albertino in 1848 up to the present day. First, the PDF documents, available on the official websites of the Senato della Repubblica and the Camera dei Deputati (the two chambers that form the Italian Parliament), are converted into machine-readable text using Optical Character Recognition (OCR). Then, each speech is tagged with its speaker. The final dataset is released in both plain-text and structured formats.
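
The abstract does not say how speeches are matched to their speakers. As a rough illustration only, the sketch below assumes that the OCR output places each speaker heading at the start of a line, in upper case and followed by a period (for example `PRESIDENTE.`), and groups the following lines into that speaker's turn before serializing the result as JSON. The regular expression, the function names `tag_speakers` and `to_json`, and the record layout are all hypothetical; this is not the IPSA pipeline itself.

```python
import json
import re
from typing import Dict, List

# Hypothetical convention: a speaker heading starts a line with the speaker's
# name in upper case (Italian accented capitals allowed), followed by a period,
# e.g. "PRESIDENTE. La seduta è aperta." This is an assumption, not IPSA's rule.
SPEAKER_HEADING = re.compile(r"^([A-ZÀÈÉÌÒÙ][A-ZÀÈÉÌÒÙ' ]+)\.\s*(.*)$")


def tag_speakers(ocr_text: str) -> List[Dict[str, str]]:
    """Split OCR'd transcript text into turns tagged with their speaker."""
    turns: List[Dict[str, str]] = []
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines left by OCR
        match = SPEAKER_HEADING.match(line)
        if match:
            # A heading opens a new turn for the named speaker.
            turns.append({"speaker": match.group(1).strip(), "text": match.group(2)})
        elif turns:
            # Any other line continues the current speaker's turn.
            turns[-1]["text"] = (turns[-1]["text"] + " " + line).strip()
        # Lines before the first heading (e.g. page headers) are dropped.
    return turns


def to_json(turns: List[Dict[str, str]]) -> str:
    """Serialize tagged turns as JSON, keeping accented characters readable."""
    return json.dumps(turns, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = (
        "Seduta di lunedì\n"
        "PRESIDENTE. La seduta è aperta.\n"
        "Ha facoltà di parlare l'onorevole Rossi.\n"
        "ROSSI. Grazie, signor Presidente.\n"
    )
    print(to_json(tag_speakers(sample)))
```

For example, `tag_speakers("PRESIDENTE. La seduta è aperta.")` yields a single turn whose speaker is `PRESIDENTE`. A line that happens to be entirely upper case and end in a period (a heading inside a speech, for instance) would be misread as a change of speaker, so a real pipeline would likely need to check matches against a list of known members of Parliament.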

There's Something New about the Italian Parliament: the IPSA Corpus / Frasnelli, V., Palmero Aprosio, A. - Electronic. - (2024), pp. 16037-16046. (2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 2024).


2024-01-01


Date of publication: 2024
Proceedings title: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Place of publication: Paris, France
Publisher: European Language Resources Association (ELRA)
ISBN: 978-2-493814-10-4
Reference scientific-disciplinary sectors (SSD, valid from 9 May 2024): IINF-05/A - Information processing systems; INFO-01/A - Computer science
Scopus identifier: 2-s2.0-85195967453
Item type: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
2024.lrec-main.1394.pdf (open access; description: Paper PDF; type: publisher's version (publisher's layout); license: Creative Commons)	3.29 MB	Adobe PDF