This paper presents the second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release of almost a decade ago, we have invested considerable effort in improving the data both quantitatively and qualitatively. We have doubled the corpus size, expanded the selection of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research on complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions / Uryupina, O; Artstein, R; Bristot, A; Cavicchio, F; Rodriguez, KJ; Poesio, M. - Print. - (2016), pp. 2058-2062. (Paper presented at the LREC conference held in Portorozh, 23-28 May 2016).
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
Uryupina, O; Bristot, A; Cavicchio, F; Rodriguez, KJ; Poesio, M
2016-01-01