Adaptive Quality Estimation for Machine Translation and Automatic Speech Recognition / Camargo de Souza, José Guilherme. - (2016), pp. 1-168.

Adaptive Quality Estimation for Machine Translation and Automatic Speech Recognition

Camargo de Souza, José Guilherme
2016-01-01

Abstract

Quality estimation (QE) approaches aim to predict the quality of an automatically generated output without relying on manually crafted references. With access only to the system's input and output, a QE module assigns a score or a label to each (input, output) pair. In this thesis we develop approaches to predict the quality of the outputs of two types of natural language processing (NLP) systems: machine translation (MT) and automatic speech recognition (ASR). The work presented here is divided into three parts.

The first part presents advances over the standard approaches to MT QE. We describe a general feature extraction framework, several quality indicators that are either dependent on or independent of the MT system that generates the translations, and new quality indicators that approximate the cross-lingual mapping between the meaning of the source and translated sentences. These advances result in state-of-the-art performance in two official evaluation campaigns on the MT QE problem.

In the second part we show that standard MT QE approaches suffer from domain drift due to the high specificity of the labeled data currently available. In the standard MT QE framework, models are trained on data from a specific text type, with translations produced by a single MT system and labels obtained from the work of specific individual translators. Such models perform poorly when one or more of these conditions change. The ability of a system to adapt and cope with such changes is a facet of the QE problem that has so far been disregarded. To address these issues and deal with the noisy conditions of real-world translation workflows, we propose adaptive approaches to QE that are robust both to the diverse nature of translation jobs and to differences between training and test data.

In the third part we propose and define an ASR QE framework. We identify useful quality indicators and show that ASR QE can be performed without access to the ASR system, by exploiting only information from its inputs and outputs. We apply a subset of the adaptive techniques developed for MT QE and show that the ASR QE setting also benefits from robust adaptive learning algorithms.
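
To make the sentence-level QE setting concrete, the following is a minimal sketch in Python (using scikit-learn): compute system-independent indicators from the (source, translation) pair alone, train a regressor to map them to a quality score, and update the model online as newly labeled segments arrive. The features, toy data, and learner shown here are illustrative assumptions, not the quality indicators or adaptive algorithms developed in the thesis.

```python
# Minimal sentence-level QE sketch: "black-box" features computed from the
# (source, translation) pair only, fed to an online regressor so the model
# can keep adapting as labeled examples arrive. All features, data, and the
# choice of learner are illustrative, not those used in the thesis.
import numpy as np
from sklearn.linear_model import SGDRegressor


def qe_features(source: str, translation: str) -> np.ndarray:
    """System-independent indicators derived from input and output alone."""
    src_tok, tgt_tok = source.split(), translation.split()
    return np.array([
        len(src_tok),                                              # source length
        len(tgt_tok),                                              # translation length
        len(tgt_tok) / max(len(src_tok), 1),                       # length ratio
        float(np.mean([len(t) for t in tgt_tok])) if tgt_tok else 0.0,  # avg token length
        sum(t.isalpha() for t in tgt_tok) / max(len(tgt_tok), 1),  # alphabetic-token ratio
    ])


# Toy training data: (source, MT output, quality label such as HTER).
train = [
    ("the cat sat on the mat", "le chat s'est assis sur le tapis", 0.10),
    ("quality estimation needs no references", "l'estimation de qualite pas", 0.65),
]

X = np.vstack([qe_features(s, t) for s, t, _ in train])
y = np.array([score for _, _, score in train])

model = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
model.partial_fit(X, y)  # initial fit on whatever labeled data exists

# Adaptive step: when a post-editor supplies a label for a new segment,
# update the model incrementally instead of retraining from scratch.
new_x = qe_features("press the red button", "appuyez sur le bouton rouge")
model.partial_fit(new_x.reshape(1, -1), np.array([0.05]))

print(model.predict(new_x.reshape(1, -1)))
```

The online `partial_fit` update is only a stand-in for the adaptive learning strategies investigated in the thesis; its purpose is to show how a QE model can absorb feedback from a running translation workflow without access to the MT system itself.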
2016
XXVII
2015-2016
Ingegneria e Scienza dell'Informazione
Information and Communication Technology
Turchi, Marco
Federico, Marcello
Negri, Matteo
no
English
Settore INF/01 - Informatica
Files in this record:
phd_thesis_jsouza_v2.pdf — Doctoral thesis (Tesi di dottorato); Adobe PDF, 6.95 MB; license: all rights reserved (Tutti i diritti riservati); access restricted to archive managers (Solo gestori archivio).

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/368418