Adding a Novel Italian Treebank of Marked Constructions to Universal Dependencies / Paccosi, Teresa; Palmero Aprosio, Alessio; Tonelli, Sara. - In: IJCOL. - ISSN 2499-4553. - 9:1(2023). [10.4000/ijcol.1110]
Adding a Novel Italian Treebank of Marked Constructions to Universal Dependencies
Paccosi, Teresa; Palmero Aprosio, Alessio; Tonelli, Sara
2023-01-01
Abstract
In this paper we present MarkIT, a novel treebank developed to analyse marked constructions in Italian. The resource contains almost 1,300 sentences manually annotated with dependency relations following the Universal Dependencies paradigm. The sentences were extracted from essays written by high-school students over several years, which accounts for the variability in their structure and topics. In this work, we detail the process used to select the sentences, parse them automatically and then manually correct the annotations. The resource covers seven types of marked constructions (839 sentences overall), plus 453 sentences whose syntax could be wrongly classified as marked and which can therefore serve as negative examples of markedness. We also present an evaluation of parsing performance, comparing a model trained on existing Italian treebanks with a model obtained by adding MarkIT to the training set.
File: ijcol-1110.pdf (open access; publisher's version; 1.08 MB; Adobe PDF)
Description: The text only may be used under licence CC BY-NC-ND 4.0. All other elements (illustrations, imported files) are “All rights reserved”, unless otherwise stated.
Licence: Creative Commons
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.