The use of Machine Learning and Artificial Intelligence in the Public Administration (PA) has increased in recent years. In particular, recent guidelines proposed by various governments for classifying documents released by the PA suggest using the EuroVoc thesaurus. In this paper, we present KEVLAR, an all-in-one solution for performing this classification task on Public Administration acts. First, we create a collection of 8 million documents in 24 languages, tagged with EuroVoc labels and taken from EUR-Lex, the web portal of European Union legislation. Then, we train several pre-trained BERT-based models, comparing the performance of base models with that of domain-specific and multilingual ones. We release the corpus, the best-performing models, and a Docker image containing the source code of the trainer, the REST API, and the web interface. The image can be used out of the box for document classification.

KEVLAR: the Complete Resource for EuroVoc Classification of Legal Documents / Bocchi, Lorenzo; Casula, Camilla; Palmero Aprosio, Alessio. - ELECTRONIC. - 3878:09(2024), pp. 1-8. (Paper presented at the 10th Italian Conference on Computational Linguistics, CLiC-it 2024, held in Pisa, Italy, on December 4-6, 2024).

KEVLAR: the Complete Resource for EuroVoc Classification of Legal Documents

Bocchi Lorenzo;Casula Camilla;Palmero Aprosio Alessio
2024-01-01

2024
CEUR Workshop Proceedings
Aachen, Germany
CEUR-WS
Sector IINF-05/A - Information Processing Systems
Sector INFO-01/A - Computer Science
Files in this item:
9_main_long.pdf (open access)
Description: Paper PDF
Type: Publisher's version (Publisher's layout)
License: Creative Commons
Size: 1.48 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/445252