Investigation of small group social interactions using deep visual activity-based nonverbal features / Beyan, C.; Shahid, M.; Murino, V. - (2018), pp. 311-319. (Presented at the 26th ACM Multimedia Conference, MM 2018, held in Seoul in 2018) [10.1145/3240508.3240685].
Investigation of small group social interactions using deep visual activity-based nonverbal features
Beyan C.; Shahid M.; Murino V.
2018-01-01
Abstract
Understanding small group face-to-face interactions is a prominent research problem in social psychology, and its automatic analysis has recently become popular in social computing. It is mainly investigated in terms of nonverbal behaviors, as they are one of the main facets of communication. Among the many multimodal nonverbal cues, visual activity is an important one, and its good performance can be crucial when, for instance, audio sensors are missing. The existing visual activity-based nonverbal features, which are all hand-crafted, perform well enough for some applications but poorly for others. Given these observations, we argue that more robust feature representations, learned from the data itself, are needed. To this end, we propose a novel method composed of optical flow computation, deep neural network-based feature learning, feature encoding, and classification. Additionally, a comprehensive comparison of different feature encoding techniques is presented. The proposed method is tested on three research topics that can be perceived during small group interactions, i.e., meetings: i) emergent leader detection, ii) emergent leadership style prediction, and iii) high/low extraversion classification. The proposed method shows (significantly) better results not only compared to state-of-the-art visual activity-based nonverbal features, but also when those features are combined with other audio-based and video-based nonverbal features.
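The four-stage pipeline named in the abstract (optical flow computation, deep feature learning, feature encoding, classification) can be illustrated schematically. The sketch below is not the authors' implementation: it uses stand-in components under labeled assumptions — absolute frame differencing in place of a real dense optical flow method, a fixed random projection with ReLU in place of the learned deep network, and mean pooling as the simplest of the feature encodings the paper compares.

```python
import numpy as np

def optical_flow_proxy(frames):
    # Stand-in for dense optical flow: per-pixel absolute temporal
    # differences, capturing only the magnitude of visual activity.
    return [np.abs(f2 - f1) for f1, f2 in zip(frames[:-1], frames[1:])]

def deep_features(flow_map, W):
    # Stand-in for a learned deep network: a single fixed random
    # projection followed by ReLU, mapping each flow map to a
    # fixed-length per-frame descriptor.
    return np.maximum(flow_map.reshape(-1) @ W, 0.0)

def encode(descriptors):
    # Feature encoding step: mean-pool per-frame descriptors into one
    # clip-level vector. The paper compares richer encodings; this is
    # only the simplest placeholder. The result would then be fed to a
    # classifier for, e.g., emergent leader detection.
    return np.mean(np.stack(descriptors), axis=0)

rng = np.random.default_rng(0)
frames = [rng.random((8, 8)) for _ in range(5)]   # toy 8x8 "video"
W = rng.standard_normal((64, 16))                 # hypothetical projection

flows = optical_flow_proxy(frames)
clip_vec = encode([deep_features(f, W) for f in flows])
print(clip_vec.shape)  # (16,)
```

With 5 frames the proxy produces 4 flow maps, each reduced to a 16-dimensional descriptor and pooled into a single clip-level vector; in the actual method, such per-subject vectors would be classified into the target social labels.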