Speech Adaptation Modeling for Statistical Machine Translation / Ruiz, Nicholas. - (2017), pp. 1-175.

Speech Adaptation Modeling for Statistical Machine Translation

Ruiz, Nicholas
2017-01-01

Abstract

Spoken language translation (SLT) sits at one of the most challenging intersections of speech and natural language processing. While machine translation (MT) has proven effective on textual data, the translation of spoken language remains a challenge, largely due to the mismatch between the training conditions of MT and the noisy output of an automatic speech recognition (ASR) system. In the interchange between ASR and MT, errors propagated from noisy speech recognition outputs may compound, rendering the speech translation unintelligible. Additionally, stylistic differences between written and spoken registers can lead to inadequate translations. This scenario is predominantly caused by a mismatch between the training conditions of ASR and MT. Because little training data couples speech audio with translated transcripts, MT systems in the SLT pipeline must rely predominantly on textual data that does not represent the characteristics of spoken language well. Likewise, independence assumptions between sentences result in ASR and MT systems that do not yield consistent outputs.

In this thesis, we develop techniques to overcome the mismatch between speech and textual data by improving the robustness of the MT system. Our work can be divided into three parts. First, we analyze the effects that the difference between spoken and written registers has on SLT quality, and we introduce a data analysis methodology to measure the impact of ASR errors on translation quality. Second, we propose several approaches to improve the MT component's tolerance of noisy ASR outputs: adapting its models based on the bilingual statistics of each sentence's neighboring context, and introducing a process by which textual resources can be transformed into synthetic ASR data for training a speech-centric MT system. In particular, we focus on translation from spoken English to French and German, the two parent languages of English, and demonstrate that information about the types and frequency of ASR errors can improve the robustness of machine translation for SLT. Finally, we introduce and motivate several challenges in spoken language translation with neural machine translation models that are specific to their modeling architecture.
2017
XXVII
2017-2018
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Federico, Marcello
no
English
Sector INF/01 - Computer Science
Files in this item:

DECLARATORIA_ENG.pdf
Archive managers only
Type: Doctoral Thesis
License: All rights reserved
Size: 757.98 kB
Format: Adobe PDF

speech-adaptation-modeling_(6).pdf
Archive managers only
Type: Doctoral Thesis
License: All rights reserved
Size: 1.29 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369115