Previous studies demonstrated that a dynamic, phone-informed compression of the input audio is beneficial for speech translation (ST). However, they required a dedicated model for phone recognition and did not test this solution for direct ST, in which a single model translates the input audio into the target language without intermediate representations. In this work, we propose the first method able to perform a dynamic compression of the input in direct ST models. In particular, we exploit Connectionist Temporal Classification (CTC) to compress the input sequence according to its phonetic characteristics. Our experiments demonstrate that our solution brings a 1.3-1.5 BLEU improvement over a strong baseline on two language pairs (English-Italian and English-German), while also reducing the memory footprint by more than 10%.
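
The abstract does not spell out how the CTC predictions drive the compression. A minimal sketch of one plausible reading, assuming that consecutive frames sharing the same greedy CTC label are merged by averaging their vectors, is given below. The function name `ctc_compress`, the toy inputs, the averaging rule, and the handling of the blank label as an ordinary label are all illustrative assumptions, not details taken from this record.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_compress(states: Sequence[Sequence[float]],
                 ctc_scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Average runs of consecutive frames that share the same CTC argmax label.

    Hypothetical helper: averaging is an assumed merging rule, and the blank
    label is treated like any other label.
    """
    if len(states) != len(ctc_scores):
        raise ValueError("states and ctc_scores must have the same number of frames")
    # Greedy CTC prediction: index of the highest-scoring label for each frame.
    labels = [max(range(len(row)), key=row.__getitem__) for row in ctc_scores]
    compressed = []
    # groupby yields maximal runs of consecutive frames with an identical label.
    for _, run in groupby(zip(labels, states), key=lambda pair: pair[0]):
        vectors = [vec for _, vec in run]
        dim = len(vectors[0])
        compressed.append([sum(v[d] for v in vectors) / len(vectors) for d in range(dim)])
    return compressed


# Toy input: 5 frames, 2-dimensional states, 3 labels (index 0 stands in for blank).
states = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0], [0.0, 4.0], [2.0, 6.0]]
ctc_scores = [
    [0.1, 0.8, 0.1],    # label 1
    [0.2, 0.7, 0.1],    # label 1
    [0.9, 0.05, 0.05],  # label 0
    [0.1, 0.1, 0.8],    # label 2
    [0.2, 0.2, 0.6],    # label 2
]
compressed = ctc_compress(states, ctc_scores)
print(len(states), "->", len(compressed), "frames")
```

Because the number of output frames depends on how many label changes the CTC predictions contain, the compression rate varies per utterance, which is what makes it dynamic rather than a fixed subsampling factor.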

CTC-based compression for direct speech translation / Gaido, M.; Cettolo, M.; Negri, M.; Turchi, M. - (2021), pp. 690-696. (Paper presented at the 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021, held online in 2021).

CTC-based compression for direct speech translation

Gaido M.
2021-01-01

2021
EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Online
Association for Computational Linguistics (ACL)
Gaido, M.; Cettolo, M.; Negri, M.; Turchi, M.
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/330133