Information Fusion Approaches for Distant Speech Recognition in a Multi-microphone Setting / Guerrero Flores, Cristina Maritza. - (2016), pp. 1-148.

Information Fusion Approaches for Distant Speech Recognition in a Multi-microphone Setting

Guerrero Flores, Cristina Maritza
2016-01-01

Abstract

High-quality Automatic Speech Recognition remains difficult to guarantee when the speaker is distant from the microphone, because of the distortions caused by acoustic phenomena such as noise and reverberation. Among the research directions pursued around this problem, multi-channel approaches are of great interest to the community, given their potential to exploit information diversity. This thesis elaborates on approaches that exploit different instances of a sound source, captured by several widely spaced microphones, in order to extract a Distant Speech Recognition hypothesis. Two original solutions are presented, based on information fusion at different levels of the recognition system: one at the front-end stage, addressing channel selection (CS), and one at the post-decoding stage, addressing hypothesis combination. First, a new CS framework is proposed. Cepstral distance (CD), which has been applied effectively in other acoustic processing fields, is the basis of the CS method developed. Experimental results confirmed the advantages of a CD-based selection scheme under different scenarios. The second contribution concerns the combination of information extracted from the individual decoding processes performed over the multiple captured signals. It is shown how temporal cues can be identified in the hypothesis space and used to build a multi-microphone confusion network, from which the final speech transcription is derived. The proposed methods are applicable in a setting equipped with synchronized distributed microphones, independently of the proximity between the sensors. The novel concepts were analyzed on synthetic and real captured data. Both approaches achieved positive results on the different assessment tasks to which they were applied.
2016
XXVI
2015-2016
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Omologo, Maurizio
no
English
Sector INF/01 - Computer Science
Files in this item:
File: 160830_cguerrero_phd-thesis.pdf
Access: Archive managers only
Type: Doctoral Thesis
License: All rights reserved
Size: 4.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/368955