This work introduces a novel, large annotated corpus for multi-label legislative text classification in Italian, based on legal acts from the Gazzetta Ufficiale, the official source of legislative information of the Italian state. The annotated dataset, which we have released to the community, comprises over 363,000 titles of legislative acts spanning more than 30 years, from 1988 to 2022. We also evaluate four text classification models on the dataset and show that the acts’ titles alone are sufficient to reach high classification performance, with a micro F1-score of 0.87. Our analysis further shows that Italian domain-adapted legal models do not outperform general-purpose models on this task. Users can inspect the models’ performance through a demonstrator system provided in support of this work.

Italian Legislative Text Classification for Gazzetta Ufficiale / Rovera, Marco; Palmero Aprosio, Alessio; Greco, Francesco; Lucchese, Mariano; Tonelli, Sara; Antetomaso, Antonio. - (2023), pp. 44-50. (Paper presented at NLLP 2023 - Natural Legal Language Processing Workshop 2023, held in Singapore on December 7, 2023) [10.18653/v1/2023.nllp-1.6].

Italian Legislative Text Classification for Gazzetta Ufficiale

Palmero Aprosio, Alessio; Tonelli, Sara
2023-01-01

2023
Proceedings of the Natural Legal Language Processing Workshop 2023
Singapore
Association for Computational Linguistics
Rovera, Marco; Palmero Aprosio, Alessio; Greco, Francesco; Lucchese, Mariano; Tonelli, Sara; Antetomaso, Antonio
Files in this item:
File: 2023.nllp-1.6.pdf
Access: open access
Type: Publisher's version (publisher’s layout)
License: Creative Commons
Size: 113.82 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/412714