The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams. The best system averaged 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian) and received 10 system submissions from 3 teams; the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
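To make the reported F1 figures concrete, the following is a minimal sketch of a morpheme-level F1 computation. It assumes that predicted and gold morphemes are compared per word as multisets and that counts are micro-averaged over all words; the abstract does not spell out the metric, so this is an assumption. The function name `morpheme_f1` and the toy data are hypothetical, and this is not the released evaluation script.

```python
from collections import Counter


def morpheme_f1(gold, pred):
    """Micro-averaged precision, recall, and F1 over morphemes.

    `gold` and `pred` are parallel lists; each item is the list of
    morphemes for one word. Matching is by multiset overlap per word
    (an assumption made for this sketch).
    """
    true_pos = n_pred = n_gold = 0
    for gold_morphs, pred_morphs in zip(gold, pred):
        # Multiset intersection counts each morpheme at most as often
        # as it appears in both segmentations.
        overlap = Counter(gold_morphs) & Counter(pred_morphs)
        true_pos += sum(overlap.values())
        n_pred += len(pred_morphs)
        n_gold += len(gold_morphs)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (hypothetical data, not taken from the shared task).
gold = [["un", "break", "able"], ["cat", "s"]]
pred = [["un", "breakable"], ["cat", "s"]]
precision, recall, f1 = morpheme_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In the toy example, the under-segmented prediction for the first word lowers recall more than precision, since two gold morphemes are missed while only one predicted morpheme is wrong.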

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation / Batsuren, Khuyagbaatar; Bella, Gabor; Arora, Aryaman; Martinovic, Viktor; Gorman, Kyle; Žabokrtský, Zdeněk; Ganbold, Amarsanaa; Dohnalová, Šárka; Ševčíková, Magda; Pelegrinová, Kateřina; Giunchiglia, Fausto; Cotterell, Ryan; Vylomova, Ekaterina. - (2022), pp. 103-116. (19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, SIGMORPHON 2022, Seattle, Washington, July 2022) [10.18653/v1/2022.sigmorphon-1.11].

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

Batsuren, Khuyagbaatar; Bella, Gabor; Giunchiglia, Fausto
2022-01-01

2022
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Seattle, Washington
Association for Computational Linguistics (ACL)
9781955917827

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369578