The SIGMORPHON 2022 Shared Task on Morpheme Segmentation



The SIGMORPHON 2022 Shared Task on Morpheme Segmentation / Batsuren, K., Bella, G., Arora, A., Martinovic, V., Gorman, K., Žabokrtský, Z., Ganbold, A., Dohnalová, Š., Ševčíková, M., Pelegrinová, K., Giunchiglia, F., Cotterell, R., Vylomova, E. - (2022), pp. 103-116. (19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, SIGMORPHON 2022, Seattle, Washington, July 2022) [10.18653/v1/2022.sigmorphon-1.11].


Batsuren, Khuyagbaatar; Bella, Gabor; Arora, Aryaman; Martinovic, Viktor; Gorman, Kyle; Žabokrtský, Zdeněk; Ganbold, Amarsanaa; Dohnalová, Šárka; Ševčíková, Magda; Pelegrinová, Kateřina; Giunchiglia, Fausto; Cotterell, Ryan; Vylomova, Ekaterina

2022-01-01

Abstract

The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams; the best system averaged a 97.29% F1 score across all languages, ranging from 93.84% for English to 99.38% for Latin. Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian) and received 10 system submissions from 3 teams; the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support future studies, we released all system predictions, the evaluation script, and all gold-standard datasets.
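The results above are reported as F1 scores over morpheme segmentations. As a rough illustration of what such a score measures, the Python sketch below computes precision, recall, and F1 for a single word by counting how many predicted morphemes also appear in the gold segmentation (a multiset overlap), then macro-averages F1 over words. This is an assumption made for illustration only: the functions `morpheme_prf` and `macro_f1` are hypothetical, and the released evaluation script remains the authoritative definition of the task's metrics, which may treat ordering, duplicates, or averaging differently.

```python
from collections import Counter


def morpheme_prf(gold: list[str], pred: list[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one word's segmentation.

    Hypothetical metric for illustration: morphemes are matched as a multiset
    of strings, ignoring order. Not necessarily the shared task's official scorer.
    """
    # Count morphemes present in both segmentations (multiset intersection).
    overlap = sum((Counter(gold) & Counter(pred)).values())
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Average per-word F1 over (gold, predicted) pairs (assumed averaging scheme)."""
    scores = [morpheme_prf(gold, pred)[2] for gold, pred in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    data = [
        (["un", "break", "able"], ["un", "breakable"]),  # undersegmented prediction
        (["cat", "s"], ["cat", "s"]),                    # exact match
    ]
    for gold, pred in data:
        print(gold, pred, morpheme_prf(gold, pred))
    print("macro F1:", macro_f1(data))
```

Under this sketch, predicting un + breakable for the gold segmentation un + break + able yields precision 1/2, recall 1/3, and F1 0.4: the undersegmented morpheme counts as a miss rather than earning partial credit.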


Date of publication: 2022
Proceedings title: Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Place of publication: Seattle, Washington
Publisher: Association for Computational Linguistics (ACL)
ISBN: 9781955917827
Scopus identifier: 2-s2.0-85139172965

Files in this record:

No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369578

Warning: the data displayed have not been validated by the university.
