Title is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws / Bocchi, Lorenzo; Palmero Aprosio, Alessio. - Electronic. - 3878:10(2024), pp. 1-7. (Paper presented at the 10th Italian Conference on Computational Linguistics, CLiC-it 2024, held in Pisa, Italy, December 4-6, 2024).

Title is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws

Bocchi, Lorenzo; Palmero Aprosio, Alessio
2024-01-01

Abstract

The use of Machine Learning and Artificial Intelligence approaches within Public Administration (PA) has grown significantly in recent years. Specifically, new guidelines from various governments recommend employing the EuroVoc thesaurus for the classification of documents issued by the PA. In this paper, we explore several methods for document classification in the legal domain that aim to mitigate the input length limitation of BERT models. We first collect data from the European Union that is already tagged with the EuroVoc taxonomy. We then reorder the sentences of each document, with the aim of moving its most informative part to the beginning of the text. Results show that the title and the context are both important, although the order of the text may not be. Finally, we release on GitHub both the dataset and the source code used for the experiments.
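The reordering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how sentences are ranked, so the title-overlap score below is a hypothetical stand-in, not the authors' method. The function names, the naive sentence splitter, and the use of whitespace-separated words in place of BERT subword tokens are likewise assumptions made for illustration; 512 is the usual input limit of BERT models.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercased word tokens; a stand-in for a real subword tokenizer."""
    return re.findall(r"\w+", text.lower())


def title_overlap(sentence, title_tokens):
    """Hypothetical score: share of sentence tokens that also occur in the title."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in title_tokens) / len(tokens)


def build_input(title, body, max_tokens=512):
    """Place the title first, then sentences by descending score, then truncate."""
    title_tokens = set(tokenize(title))
    sentences = split_sentences(body)
    # sorted() is stable, so equally scored sentences keep their original order.
    ranked = sorted(
        sentences,
        key=lambda s: title_overlap(s, title_tokens),
        reverse=True,
    )
    words = " ".join([title] + ranked).split()
    # Whitespace words approximate the token budget (assumption).
    return " ".join(words[:max_tokens])
```

Calling `build_input(title, body)` returns a single string that starts with the title and is cut at the word budget, so whatever the score ranks highest is what survives truncation. Swapping `title_overlap` for another scoring function changes the ordering without touching the rest of the sketch.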
2024
CEUR Workshop Proceedings
Aachen, Germany
CEUR-WS
Sector IINF-05/A - Information Processing Systems
Sector INFO-01/A - Computer Science
Files in this record:
10_main_long.pdf (open access)
Description: Paper PDF
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 1.02 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/445253