
Prior expectations guide multisensory integration during face-to-face communication

Giulia Mazzi; Ambra Ferrari; Maria Laura Mencaroni; Chiara Valzolgher; Mirko Tommasini; Francesco Pavani; Stefania Benetti
2025-01-01

Abstract

Face-to-face communication relies on the seamless integration of multisensory signals, including voice, gaze, and head movements, to convey meaning effectively. This poses a fundamental computational challenge: optimally binding signals sharing the same communicative intention (e.g. looking at the addressee while speaking) and segregating unrelated signals (e.g. looking away while coughing), all within the rapid turn-taking dynamics of conversation. Critically, the computational mechanisms underlying this extraordinary feat remain largely unknown. Here, we cast face-to-face communication as a Bayesian Causal Inference problem to formally test whether prior expectations arbitrate between the integration and segregation of vocal and bodily signals. Moreover, we asked whether there is a stronger prior tendency to integrate audiovisual signals that convey the same communicative intention, thus carrying a crossmodal pragmatic correspondence. In a spatial localization task, participants watched audiovisual clips of a speaker where the audio (voice) and the video (bodily cues) were sampled either from congruent positions or at increasing spatial disparities. Crucially, we manipulated the pragmatic correspondence of the signals: in a communicative condition, the speaker addressed the participant with their head, gaze and speech; in a non-communicative condition, the speaker kept their head down and produced a meaningless vocalization. We measured audiovisual integration through the ventriloquist effect, which quantifies how much the perceived audio position is shifted towards the video position. Bayesian Causal Inference outperformed competing models in explaining participants’ behaviour, demonstrating that prior expectations guide multisensory integration during face-to-face communication. Remarkably, participants showed a stronger prior tendency to integrate vocal and bodily information when signals conveyed congruent communicative intent, suggesting that pragmatic correspondences enhance multisensory integration. Collectively, our findings provide novel and compelling evidence that face-to-face communication is shaped by deeply ingrained expectations about how multisensory signals should be structured and interpreted.
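
The abstract refers to Bayesian Causal Inference without spelling out the computation. As a point of reference only, the sketch below implements the standard model-averaging formulation (Körding et al., 2007, PLoS ONE): the observer computes the posterior probability that voice and video share a common cause and weighs a fused and a segregated estimate of the voice location accordingly. All parameter names and values are illustrative assumptions; they are not the values, priors, or exact model variant fitted in the present study.

import numpy as np

def bci_auditory_estimate(xA, xV, p_common=0.5, sigA=8.0, sigV=2.0,
                          sigP=15.0, muP=0.0):
    """Model-averaged estimate of the voice (auditory) location in degrees,
    given noisy auditory (xA) and visual (xV) measurements."""
    vA, vV, vP = sigA**2, sigV**2, sigP**2

    # Likelihood of the two measurements under a common cause (C = 1)
    varC1 = vA * vV + vA * vP + vV * vP
    like_c1 = np.exp(-0.5 * ((xA - xV)**2 * vP + (xA - muP)**2 * vV
                             + (xV - muP)**2 * vA) / varC1) \
              / (2 * np.pi * np.sqrt(varC1))

    # Likelihood under independent causes (C = 2)
    varA2, varV2 = vA + vP, vV + vP
    like_c2 = np.exp(-0.5 * ((xA - muP)**2 / varA2 + (xV - muP)**2 / varV2)) \
              / (2 * np.pi * np.sqrt(varA2 * varV2))

    # Posterior probability of a common cause, given the binding prior p_common
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Auditory estimates under forced fusion (C = 1) and full segregation (C = 2)
    s_fused = (xA / vA + xV / vV + muP / vP) / (1 / vA + 1 / vV + 1 / vP)
    s_segr = (xA / vA + muP / vP) / (1 / vA + 1 / vP)

    # Model averaging: weight the two estimates by the causal posterior
    return post_c1 * s_fused + (1 - post_c1) * s_segr

# Example: voice presented at +10 deg, video at 0 deg. A larger binding prior
# (p_common) pulls the perceived voice further towards the video, i.e. a larger
# ventriloquist effect.
for pc in (0.3, 0.8):
    est = bci_auditory_estimate(xA=10.0, xV=0.0, p_common=pc)
    print(f"p_common = {pc:.1f}: perceived voice at {est:5.2f} deg "
          f"(shift towards video = {10.0 - est:4.2f} deg)")

In this scheme, the stronger prior tendency to bind communicative signals reported in the abstract would correspond to a larger fitted p_common in the communicative than in the non-communicative condition, yielding a larger shift of the perceived voice towards the video at any given spatial disparity.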
2025
bioRxiv
Prior expectations guide multisensory integration during face-to-face communication / Mazzi, Giulia; Ferrari, Ambra; Mencaroni, Maria Laura; Valzolgher, Chiara; Tommasini, Mirko; Pavani, Francesco; Benetti, Stefania. - ELECTRONIC. - 2025:(2025). [10.1101/2025.02.19.638980]
Files in this record:

File: Mazzi&Benetti_2025_bioRxiv_Prior_expectations_2025.02.19.638980v1.full.pdf
Access: open access
Type: Non-refereed preprint
Licence: Creative Commons
Size: 964.56 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/448452
