A preliminary release of the Italian Parliamentary Corpus / Frasnelli, Valentino; Palmero Aprosio, Alessio. - 3596:(2023). (Paper presented at the 9th Italian Conference on Computational Linguistics, CLiC-it 2023, held in Venice, Italy, 30 November - 2 December 2023).
A preliminary release of the Italian Parliamentary Corpus
Palmero Aprosio, Alessio (co-first author)
2023-01-01
Abstract
Political debates have long been used in political and social studies of languages and their cultures. In this paper, we release a preliminary version of the Italian Parliamentary Corpus, a dataset of 1.2 billion words covering the political debates held in the Italian Parliament from 1848 to 2018. The data were collected by applying Optical Character Recognition (OCR) software to the original documents, which are available in PDF format on the websites of the Camera dei Deputati and the Senato della Repubblica.

| File | Size | Format |
|---|---|---|
| short11.pdf (open access; type: publisher's version, Publisher's layout; license: Creative Commons) | 1.06 MB | Adobe PDF |