Emotional Prosody across languages and genders / Sepanta, Sia Vosh. - (2024 Nov 25).
Emotional Prosody across languages and genders
Sepanta, Sia Vosh
2024-11-25
Abstract
This dissertation investigates emotional prosody across languages and genders, with a focus on its role in the development of Automatic Speech Emotion Recognition (ASER) systems. Existing emotional speech datasets are limited in scope and cover few speakers and few languages. To address these limitations, this work introduces the Actors Challenge (AC) dataset. The AC is a dynamic, evolving, web-based interactive game designed to generate a rich speech dataset, serving as a valuable resource for studying affective prosody and other speech-related research topics. Participants not only produce emotional expressions but also evaluate the emotional performances of others, yielding a dataset enriched with human annotations. Using one of the most advanced speech models available, the dissertation explores cross-linguistic aspects of emotional prosody, examining whether ASER systems can generalize across languages and detect cross-linguistic acoustic markers of emotion. It also investigates the role that gender differences play in the expression and recognition of emotion in speech. The computational approach is complemented by a series of acoustic feature analyses. Together, they offer a dual perspective that highlights the challenges of training ASER systems to recognize emotions accurately across diverse linguistic and gender contexts. The findings underscore the complexity of emotional speech and the crucial role that diverse, high-quality datasets play in achieving effective cross-linguistic emotion recognition in speech.