HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish

Woliński, Marcin; Nitoń, Bartosz; Kieraś, Witold; Szymanik, Jakub

The paper presents a tool for automatically marking up quantifying expressions, their semantic features, and their scopes. We explore the use of a BERT-based neural model for the task, in this case HerBERT, a model trained specifically for Polish. The tool is trained on the recent, manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present the results of automatically annotating a 300 million sub-corpus of the National Corpus of Polish. Our results show that language models can effectively recognise the semantic category of quantification and identify key semantic properties of quantifiers, such as monotonicity. Furthermore, the algorithm we have developed can be used to build semantically annotated quantifier corpora for other languages.

HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish / Woliński, M., Nitoń, B., Kieraś, W., Szymanik, J. - (2022), pp. 7140-7146. (13th International Conference on Language Resources and Evaluation, LREC 2022, Marseille, France, 20-25 June 2022).

2022-01-01

Date of publication: 2022
Proceedings title: Proceedings of the 13th International Conference on Language Resources and Evaluation
Place of publication: Marseille
Publisher: European Language Resources Association (ELRA)
ISBN: 979-10-95546-72-6
Scopus identifier: 2-s2.0-85144381398
WOS identifier: WOS:000889371707031
Appears in types: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
2022.lrec-1.773.pdf (open access; type: publisher's version (Publisher’s layout); licence: Creative Commons)	234 kB	Adobe PDF