End-to-End Discourse Parsing with Cascaded Structured Prediction

Ghosh, Sucheta

Parsing discourse is a challenging natural language processing task. In this research work first we take a data driven approach to identify arguments of explicit discourse connectives. In contrast to previous work we do not make any assumptions on the span of arguments and consider parsing as a token-level sequence labeling task. We design the argument segmentation task as a cascade of decisions based on conditional random fields (CRFs). We train the CRFs on lexical, syntactic and semantic features extracted from the Penn Discourse Treebank and evaluate feature combinations on the commonly used test split. We show that the best combination of features includes syntactic and semantic features. The comparative error analysis investigates the performance variability over connective types and argument positions. We also compare the results of cascaded pipeline with a non-cascaded structured prediction setting that shows us definitely the cascaded structured prediction is a better performing method for discourse parsing. We present a novel end-to-end discourse parser that, given a plain text document in input, identifies the discourse relations in the text, assigns them a semantic label and detects discourse arguments spans. The parsing architecture is based on a cascade of decisions supported by Conditional Random Fields (CRF). We train and evaluate three different parsers using the PDTB corpus. The three system versions are compared to evaluate their robustness with respect to deep/shallow and automatically extracted syntactic features. Next, we describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These method uses a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank. We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall with improved F-measures across the board. This was especially notable when we used evaluation metrics taking partial matches into account; for these measures, we achieved F-measure improvements of several points. Finally, we address the problem of optimization in discourse parsing. A good model for discourse structure analysis needs to account both for local dependencies at the token-level and for global dependencies and statistics. We present techniques on using inter-sentential or sentence-level(global), data-driven, non-grammatical features in the task of parsing discourse. The parser model follows up previous approach based on using token-level (local) features with conditional random fields for shallow discourse parsing, which is lacking in structural knowledge of discourse. The parser adopts a two-stage approach where first the local constraints are applied and then global constraints are used on a reduced weighted search space ($n$-best). In the latter stage we experiment with different rerankers trained on the first stage $n$-best parses, which are generated using lexico-syntactic local features. The two-stage parser yields significant improvements over the best performing model of discourse parser on the PDTB corpus.

End-to-End Discourse Parsing with Cascaded Structured Prediction / Ghosh, Sucheta. - (2012), pp. 1-129.

End-to-End Discourse Parsing with Cascaded Structured Prediction

Ghosh, Sucheta

2012-01-01

Abstract

Scheda breve

Scheda completa

Scheda completa (DC)

	Data di esame finale/Defended on
	
				2012
			
	Ciclo
	
				XXIV
			
	Anno Accademico
	
				2011-2012
			
	Dipartimento
	
				Ingegneria e Scienza dell'Informaz (cess.4/11/12)
			
	Corso di dottorato
	
				Information and Communication Technology
			
	Supervisore/Relatore di tesi Unitn (Unitn internal supervisor)
	
				Riccardi, Giuseppe
			
	Tesi in cotutela (Bi-nationally supervised Doctoral Thesis)
	
				no
			
	Lingua (Language)
	
				Inglese
			
	Settori scientifico-disciplinari (validi fino a 24/06/2024) - Reference SSD (valid until 24/06/2024)
	
				Settore INF/01 - Informatica
			
	Appare nelle tipologie:
	
				08.1 Tesi di dottorato (Doctoral Thesis)

File in questo prodotto:

File	Dimensione	Formato
sghosh_thesis_v1.04.pdf accesso aperto Tipologia: Tesi di dottorato (Doctoral Thesis) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 1.12 MB Formato Adobe PDF Visualizza/Apri	1.12 MB	Adobe PDF	Visualizza/Apri