Online Adaptive Neural Machine Translation: from single- to multi-domain scenarios / Farajian, Mohammad Amin. - (2018), pp. 1-122.
Online Adaptive Neural Machine Translation: from single- to multi-domain scenarios
Farajian, Mohammad Amin
2018-01-01
Abstract
In this thesis we investigate methods for deploying machine translation (MT) in real-world application scenarios related to computer-assisted translation (CAT), where human translators post-edit MT outputs. We investigate (in chronological order) MT adaptation under two working conditions: single-domain and multi-domain. In the former, we assume that the MT system receives requests from a single user working on a single domain; in the latter, we assume that it receives requests i) from multiple users working on different domains, ii) in no predefined order, and iii) without domain information. In the single-domain case, we first focus on word alignment, a core component of online adaptive phrase-based MT (PBMT) that is crucial for extracting features from a post-edited segment. Specifically, we concentrate on improving word alignment in the presence of out-of-vocabulary words that are observed in the source sentences or introduced by the post-editor. In the multi-domain scenario, we turn our focus to the neural MT (NMT) paradigm. We introduce a scalable solution that adapts a generic NMT model on the fly to each incoming translation request. It relies on a procedure that locally fine-tunes the model to each input sentence using samples retrieved from a pool of parallel data. Our instance-based adaptation uses a more general formulation of the log-likelihood approach to control the contribution of relevant and irrelevant words during the model update. Finally, we test our approach in a simulated continuous learning setting, where the system receives user feedback in the form of post-edits.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
PhD_Thesis_Amin.pdf | Archive managers only | Doctoral thesis | All rights reserved | 1.12 MB | Adobe PDF
DOC260418-26042018182822.pdf | Under embargo | Doctoral thesis | All rights reserved | 83.9 kB | Adobe PDF
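The instance-based adaptation summarized in the abstract (retrieve samples similar to the input from a pool of parallel data, locally fine-tune the generic model on them, then translate) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: the token-overlap similarity `jaccard`, the parameters `k` and `min_sim`, and the `fine_tune` and `translate` callbacks, which stand in for a real NMT toolkit.

```python
from typing import Any, Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]. Hypothetical stand-in for a real retrieval score."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve(source: str, pool: List[Pair], k: int = 2, min_sim: float = 0.2) -> List[Pair]:
    """Return up to k pool pairs whose source side is most similar to `source`."""
    scored = [(jaccard(source, src), (src, tgt)) for src, tgt in pool]
    scored = [item for item in scored if item[0] >= min_sim]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:k]]


def translate_adaptively(
    source: str,
    pool: List[Pair],
    generic_params: Any,
    fine_tune: Callable[[Any, List[Pair]], Any],
    translate: Callable[[Any, str], str],
) -> str:
    """Adapt a copy of the generic model to one request, translate, then discard the copy."""
    samples = retrieve(source, pool)
    # fine_tune is assumed to return NEW parameters and leave generic_params untouched,
    # so every request starts again from the same generic model.
    params = fine_tune(generic_params, samples) if samples else generic_params
    return translate(params, source)
```

In this sketch the generic parameters are never modified in place, so requests from different domains cannot interfere with each other. When nothing in the pool is similar enough, the generic model is used unchanged.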
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.