End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

This paper describes FBK’s participation in the IWSLT 2020 offline speech translation (ST) task. The task evaluates systems’ ability to translate the audio of English TED talks into German text. The test talks are provided in two versions: one already segmented with automatic tools, the other raw audio without any segmentation. Participants can decide whether or not to work on custom segmentation; we used the provided segmentation. Our system is an end-to-end model based on an adaptation of the Transformer for speech data. Its training process, the main focus of this paper, is based on: i) transfer learning (ASR pretraining and knowledge distillation), ii) data augmentation (SpecAugment, time stretch and synthetic data), iii) combining synthetic and real data marked as different domains, and iv) multitask learning using the CTC loss. Finally, once training with word-level knowledge distillation is complete, our ST models are fine-tuned using label-smoothed cross entropy. Our best model scored 29 BLEU on the MuST-C En-De test set, an excellent result compared to recent papers, and 23.7 BLEU on the same data segmented with VAD, which shows the need for research on solutions that address this specific data condition.
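The two training objectives named in the abstract, word-level knowledge distillation followed by fine-tuning with label-smoothed cross entropy, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy shapes (4 target positions, vocabulary of 6), the random logits, and the uniform smoothing variant with `epsilon=0.1` are all assumptions chosen for illustration, and the teacher distribution is simply taken as given.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def word_level_kd_loss(student_logits, teacher_probs):
    # Cross entropy between the teacher's per-token distribution and the
    # student's, averaged over target positions (hypothetical helper).
    log_q = log_softmax(student_logits)
    return float(-(teacher_probs * log_q).sum(axis=-1).mean())

def label_smoothed_ce(student_logits, targets, epsilon=0.1):
    # Cross entropy against (1 - epsilon) * one_hot + epsilon / V * uniform.
    # The uniform-smoothing variant is an assumption made for this sketch.
    log_q = log_softmax(student_logits)
    nll = -np.take_along_axis(log_q, targets[:, None], axis=-1).squeeze(-1)
    uniform = -log_q.mean(axis=-1)
    return float(((1.0 - epsilon) * nll + epsilon * uniform).mean())

# Toy data: 4 target positions, vocabulary of 6 tokens (illustrative only).
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(4, 6))
teacher_probs = np.exp(log_softmax(rng.normal(size=(4, 6))))
targets = np.array([1, 0, 5, 2])

kd = word_level_kd_loss(student_logits, teacher_probs)
ce = label_smoothed_ce(student_logits, targets, epsilon=0.1)
print(f"word-level KD loss: {kd:.4f}")
print(f"label-smoothed CE:  {ce:.4f}")
```

With a one-hot teacher and `epsilon=0`, both functions reduce to the ordinary negative log-likelihood, which is one way to see that the two stages differ only in the target distribution the student is asked to match.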

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020 / Gaido, Marco; Di Gangi, Mattia A.; Negri, Matteo; Turchi, Marco. - (2020), pp. 80-88. (Paper presented at the IWSLT conference, held online, July 9-10, 2020) [10.18653/v1/2020.iwslt-1.8].

Gaido, Marco; Di Gangi, Mattia A.; Negri, Matteo; Turchi, Marco

2020-01-01

Date of publication: 2020
Proceedings title: Proceedings of the 17th International Conference on Spoken Language Translation
Place of publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
ISBN: 978-1-952148-07-1
Scopus identifier: 2-s2.0-85097899664
WOS identifier: WOS:000563427100008
All authors: Gaido, Marco; Di Gangi, Mattia A.; Negri, Matteo; Turchi, Marco
Appears in the following types: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
2020.iwslt-1.8.pdf (open access; type: publisher’s layout; license: Creative Commons)	225.73 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369989
