Music emotion recognition: intention of composers-performers versus perception of musicians, non-musicians, and listening machines / Turchet, L.; Pauwels, J. - In: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 2329-9290. - 30:(2021), pp. 305-316. [10.1109/TASLP.2021.3138709]
Music emotion recognition: intention of composers-performers versus perception of musicians, non-musicians, and listening machines
Turchet, L.; Pauwels, J.
2021-01-01
Abstract
This paper investigates to what extent state-of-the-art machine learning methods are effective at classifying emotions in the context of individual musical instruments, and how their performance compares with that of musically trained and untrained listeners. To address these questions, we created a novel dataset of 391 classical and acoustic guitar excerpts annotated along four emotions (aggressiveness, relaxation, happiness, and sadness) at three emotion intensity levels (low, medium, high), according to the intended emotion of 30 professional guitarists acting as both composers and performers. A first experiment investigated listeners' perception, involving 8 professional guitarists and 8 non-musicians. Results showed that the emotions intended by a composer-performer are not always well recognized by listeners, and in general not with the same intensity. Listeners' identification accuracy was proportional to the intensity with which an emotion was expressed. Emotions were better recognized by musicians than by listeners with no musical background, and such differences between the two groups were found across the intensity levels of the intended emotions. A second experiment investigated machine listening performance based on transfer learning methods. To compare machine and human identification accuracies fairly, we derived a fifth, 'ambivalent' category from the machine listening output categories (i.e., excerpts rated with more than one predominant emotion). Results showed that the machine perception of emotions matched or even exceeded musicians' performance for all emotions except 'relaxation'. The differences between the intended and human-perceived emotions, as well as those due to musical training, suggest that devices or applications involving a music emotion recognition system should take into account the characteristics of the users (in particular their musical expertise) as well as their roles (e.g., composers, performers, listeners). For developers, this translates into the use of datasets annotated by different categories of annotators whose role and musical expertise match the characteristics of the end users. Such results are particularly relevant to the creation of emotionally-aware smart musical instruments.
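As a minimal sketch of how an 'ambivalent' category could be derived from per-emotion classifier outputs, the Python snippet below maps four emotion scores to a single label and falls back to 'ambivalent' when more than one emotion is near-predominant. The score format, the margin threshold, and the function name are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only: the score format and the margin threshold
# are assumptions, not details taken from the paper.
EMOTIONS = ("aggressiveness", "relaxation", "happiness", "sadness")

def label_excerpt(scores, margin=0.05):
    """Map four per-emotion scores to a single label, adding a fifth
    'ambivalent' class when more than one emotion is (near-)predominant."""
    top = max(scores)
    # Emotions scoring within `margin` of the maximum count as predominant.
    predominant = [e for e, s in zip(EMOTIONS, scores) if top - s <= margin]
    return predominant[0] if len(predominant) == 1 else "ambivalent"

# Usage: relaxation and happiness are both near-predominant -> 'ambivalent'
print(label_excerpt([0.10, 0.42, 0.44, 0.04]))  # ambivalent
print(label_excerpt([0.70, 0.10, 0.15, 0.05]))  # aggressiveness
```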