The paper presents a tool for automatically marking up quantifying expressions, their semantic features, and their scopes. We explore the use of a BERT-based neural model for the task, in this case HerBERT, a model trained specifically for Polish. The tool is trained on the recent, manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present the results of automatically annotating the 300-million sub-corpus of the National Corpus of Polish. Our results show that language models can effectively recognise the semantic category of quantification and identify key semantic properties of quantifiers, such as monotonicity. Furthermore, the algorithm we have developed can be used to build semantically annotated quantifier corpora for other languages.
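
For readers unfamiliar with monotonicity, the sketch below illustrates the property on toy set-based quantifiers. It is not the paper's tool or annotation scheme: the function names, the three-element universe, and the brute-force check are all assumptions made for illustration. A quantifier Q(A, B) is treated as upward monotone in its second argument if Q(A, B) and B ⊆ B' imply Q(A, B'), and as downward monotone if the same holds for B' ⊆ B.

```python
from itertools import combinations


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_upward_monotone(quantifier, universe):
    """Brute-force check: Q(A, B) and B <= B2 imply Q(A, B2)."""
    all_sets = subsets(universe)
    return all(
        quantifier(a, b2)
        for a in all_sets
        for b in all_sets
        if quantifier(a, b)
        for b2 in all_sets
        if b <= b2
    )


def is_downward_monotone(quantifier, universe):
    """Brute-force check: Q(A, B) and B2 <= B imply Q(A, B2)."""
    all_sets = subsets(universe)
    return all(
        quantifier(a, b2)
        for a in all_sets
        for b in all_sets
        if quantifier(a, b)
        for b2 in all_sets
        if b2 <= b
    )


# Hypothetical toy quantifiers, each a relation between two sets.
def every(a, b):
    return a <= b


def some(a, b):
    return bool(a & b)


def no(a, b):
    return not (a & b)


def exactly_one(a, b):
    return len(a & b) == 1


UNIVERSE = frozenset({1, 2, 3})  # small illustrative universe

if __name__ == "__main__":
    for q in (every, some, no, exactly_one):
        print(q.__name__,
              "up:", is_upward_monotone(q, UNIVERSE),
              "down:", is_downward_monotone(q, UNIVERSE))
```

On this toy universe, "every" and "some" come out upward monotone, "no" comes out downward monotone, and "exactly one" is neither. A check on a single small universe can refute monotonicity but only suggests it in general.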

HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish / Woliński, Marcin; Nitoń, Bartosz; Kieraś, Witold; Szymanik, Jakub. - (2022), pp. 7140-7146. (Paper presented at the 13th International Conference on Language Resources and Evaluation, LREC 2022, held in Marseille, France, 20-25 June 2022).

HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish

Szymanik, Jakub
2022-01-01

2022
Proceedings of the 13th International Conference on Language Resources and Evaluation
Marseille
European Language Resources Association (ELRA)
979-10-95546-72-6
Woliński, Marcin; Nitoń, Bartosz; Kieraś, Witold; Szymanik, Jakub
Files in this item:
2022.lrec-1.773.pdf
Access: open access
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 234 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369375