Title is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws

Bocchi, Lorenzo; Palmero Aprosio, Alessio

The use of Machine Learning and Artificial Intelligence approaches within Public Administration (PA) has grown significantly in recent years. In particular, new guidelines from various governments recommend employing the EuroVoc thesaurus for the classification of documents issued by the PA. In this paper, we explore methods for document classification in the legal domain that mitigate the input length limitation of BERT models. We first collect data from the European Union, already tagged with the EuroVoc taxonomy. Then we reorder the sentences of each text, with the aim of moving the most informative part of the document to the beginning of the text. Results show that the title and the context are both important, although the order of the text may not be. Finally, we release on GitHub both the dataset and the source code used for the experiments.
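The reordering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring function (mean inverse sentence frequency of a sentence's words), the helper names `split_sentences`, `score_sentences` and `build_input`, and the use of whitespace tokens in place of BERT subword tokens are all assumptions made for illustration. The only external fact relied on is that standard BERT models accept at most 512 input tokens.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphanumeric words; a stand-in for a real subword tokenizer."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_sentences(sentences):
    """Hypothetical informativeness score: mean inverse sentence frequency.

    Words that occur in few sentences of the document weigh more, so
    boilerplate sentences made of repeated words score low.
    """
    n = len(sentences)
    tokenized = [tokenize(s) for s in sentences]
    sent_freq = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        scores.append(sum(math.log(n / sent_freq[w]) for w in toks) / len(toks))
    return scores


def build_input(title, body, max_tokens=512):
    """Put the title first, then sentences by descending score, then truncate."""
    sentences = split_sentences(body)
    scores = score_sentences(sentences)
    # sorted() is stable, so ties keep their original document order.
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    reordered = [title] + [sentences[i] for i in order]
    tokens = " ".join(reordered).split()
    return " ".join(tokens[:max_tokens])
```

In a real pipeline the truncation step would be done by the model's own tokenizer, and the scoring function would be whichever informativeness measure is under study. The sketch only shows where reordering sits relative to truncation.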

Title is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws / Bocchi, L., Palmero Aprosio, A. - Electronic. - 3878:10 (2024), pp. 1-7. (10th Italian Conference on Computational Linguistics, CLiC-it 2024, Pisa, Italy, December 4-6, 2024).

Date of publication: 2024
Proceedings title: CEUR Workshop Proceedings
Place of publication: Aachen, Germany
Publisher: CEUR-WS
Reference SSD (scientific-disciplinary sectors, valid from 09/05/2024): Sector IINF-05/A - Information Processing Systems; Sector INFO-01/A - Computer Science
Scopus Identifier: 2-s2.0-85214429011
All authors: Bocchi, Lorenzo; Palmero Aprosio, Alessio
Appears in types: 04.1 Paper in Proceedings

Files in this item:

File	Description	Type	License	Access	Size	Format
10_main_long.pdf	Paper PDF	Publisher's layout	Creative Commons	Open access	1.02 MB	Adobe PDF