A Multi-task Learning Framework for Time-continuous Emotion Estimation from Crowd Annotations

Khomami Abadi, Mojtaba; Abad, Azad; Subramanian, Ramanathan; Rostamzadeh, Negar; Ricci, E.; Varadarajan, J.; Sebe, Niculae
2014-01-01

Abstract

We propose multi-task learning (MTL) for time-continuous or dynamic emotion (valence and arousal) estimation in movie scenes. Since compiling annotated training data for dynamic emotion prediction is tedious, we employ crowdsourcing to acquire the annotations. Even though the crowdworkers come from diverse demographics, we demonstrate that MTL can effectively discover (1) consistent patterns in their dynamic emotion perception, and (2) the low-level audio and video features that contribute to their valence-arousal (VA) elicitation. Finally, we show that MTL-based regression models, which simultaneously learn the relationship between low-level audio-visual features and high-level VA ratings from a collection of movie scenes, can predict VA ratings for time-contiguous snippets from each scene more effectively than scene-specific models.
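
This record does not include the paper's actual MTL formulation, but as a rough illustration of the idea the abstract describes (each movie scene treated as a task whose feature-to-rating mapping shares structure with the other scenes), the sketch below implements a classic regularized multi-task ridge regression in the style of Evgeniou and Pontil, in which every scene's weight vector decomposes into a shared component plus a scene-specific deviation. All identifiers (`mtl_regression`, `lam_shared`, `lam_task`) and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mtl_regression(X_tasks, y_tasks, lam_shared=1.0, lam_task=10.0, n_iter=50):
    """Regularized multi-task ridge regression (hypothetical sketch).

    Each task t (one movie scene) gets weights w_t = w0 + v_t, where w0
    is shared across all scenes and v_t is a scene-specific deviation.
    Alternates closed-form ridge updates for the v_t's and for w0.
    """
    d = X_tasks[0].shape[1]
    T = len(X_tasks)
    w0 = np.zeros(d)
    V = np.zeros((T, d))
    for _ in range(n_iter):
        # Task-specific update: ridge fit of v_t on the residual y - X w0.
        for t, (X, y) in enumerate(zip(X_tasks, y_tasks)):
            V[t] = np.linalg.solve(X.T @ X + lam_task * np.eye(d),
                                   X.T @ (y - X @ w0))
        # Shared update: ridge fit of w0 on the pooled residuals y - X v_t.
        A = sum(X.T @ X for X in X_tasks) + lam_shared * np.eye(d)
        b = sum(X.T @ (y - X @ V[t])
                for t, (X, y) in enumerate(zip(X_tasks, y_tasks)))
        w0 = np.linalg.solve(A, b)
    return w0, V

# Toy usage: 5 "scenes", 40 time-contiguous snippets each, 12 low-level
# audio-visual features, one continuous valence rating per snippet.
rng = np.random.default_rng(0)
true_w0 = rng.normal(size=12)
X_tasks = [rng.normal(size=(40, 12)) for _ in range(5)]
y_tasks = [X @ (true_w0 + 0.1 * rng.normal(size=12)) + 0.05 * rng.normal(size=40)
           for X in X_tasks]
w0, V = mtl_regression(X_tasks, y_tasks)
print(round(float(np.corrcoef(w0, true_w0)[0, 1]), 3))  # near 1.0: shared pattern recovered
```

In this sketch the shared component w0 plays the role of the consistent cross-scene pattern the abstract refers to, while the per-scene deviations v_t absorb scene-specific effects; a larger lam_task shrinks the model toward a single regressor pooled over all scenes.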
2014
Proceedings of the International ACM Workshop on Crowdsourcing for Multimedia
New York
ACM
9781450331289
A Multi-task Learning Framework for Time-continuous Emotion Estimation from Crowd Annotations / Khomami Abadi, Mojtaba; Abad, Azad; Subramanian, Ramanathan; Rostamzadeh, Negar; Ricci, E.; Varadarajan, J.; Sebe, Niculae. - (2014), pp. 17-23. (3rd International ACM Workshop on Crowdsourcing for Multimedia, CrowdMM 2014, Orlando, 5 November 2014) [10.1145/2660114.2660126].

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/97417

Citations
  • PMC: not available
  • Scopus: 3
  • Web of Science: not available
  • OpenAlex: 8