Deep Learning for Distant Speech Recognition / Ravanelli, Mirco. - (2017), pp. 1-230.
Deep Learning for Distant Speech Recognition
Ravanelli, Mirco
2017-01-01
Abstract
Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of the speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called “network of deep neural networks”. The analysis of the original concepts was based on extensive experimental validation conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks.
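The data-contamination methodology mentioned in the abstract is not detailed in this record; purely as an illustration, the sketch below shows the common recipe of simulating distant-talking speech by convolving a clean utterance with a room impulse response and adding background noise at a chosen SNR. The function name `contaminate`, the `snr_db` parameter, and the use of NumPy/SciPy are assumptions made for this sketch, not the thesis's actual code.

```python
import numpy as np
from scipy.signal import fftconvolve

def contaminate(clean, rir, noise, snr_db):
    """Illustrative sketch: simulate distant-talking speech by reverberating
    clean speech with a room impulse response (RIR) and adding noise at a
    target SNR. All inputs are 1-D float arrays at the same sampling rate."""
    # Reverberate the clean utterance with the measured/simulated RIR.
    reverberant = fftconvolve(clean, rir)[: len(clean)]

    # Tile or trim the noise recording to match the utterance length.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise so that the mixture reaches the requested SNR (in dB).
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise
```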
| File | Type | License | Access | Size | Format |
|---|---|---|---|---|---|
| Disclaimer_Ravanelli.pdf | Doctoral Thesis | All rights reserved | Archive managers only | 849.54 kB | Adobe PDF |
| PhD_thesis_Ravanelli_Mirco_archived.pdf | Doctoral Thesis | All rights reserved | Archive managers only | 5.41 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.