The paper presents a tool for automatic marking up of quantifying expressions, their semantic features, and scopes. We explore the idea of using a BERT based neural model for the task (in this case HerBERT, a model trained specifically for Polish, is used). The tool is trained on a recent manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present results of automatic annotation of 300 million sub-corpus of National Corpus of Polish. Our results show that language models can effectively recognise semantic category of quantification as well as identify key semantic properties of quantifiers, like monotonicity. Furthermore, the algorithm we have developed can be used for building semantically annotated quantifier corpora for other languages.
HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish / Woliński, Marcin; Nitoń, Bartosz; Kieraś, Witold; Szymanik, Jakub. - (2022), pp. 7140-7146. (Intervento presentato al convegno 13th International Conference on Language Resources and Evaluation Conference, LREC 2022 tenutosi a Marseille, France nel 20-25 June, 2022).
HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish
Szymanik, Jakub
2022-01-01
Abstract
The paper presents a tool for automatic marking up of quantifying expressions, their semantic features, and scopes. We explore the idea of using a BERT based neural model for the task (in this case HerBERT, a model trained specifically for Polish, is used). The tool is trained on a recent manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present results of automatic annotation of 300 million sub-corpus of National Corpus of Polish. Our results show that language models can effectively recognise semantic category of quantification as well as identify key semantic properties of quantifiers, like monotonicity. Furthermore, the algorithm we have developed can be used for building semantically annotated quantifier corpora for other languages.File | Dimensione | Formato | |
---|---|---|---|
2022.lrec-1.773.pdf
accesso aperto
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Creative commons
Dimensione
234 kB
Formato
Adobe PDF
|
234 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione