S-VVAD: Visual Voice Activity Detection by Motion Segmentation

IRIS

S-VVAD: Visual Voice Activity Detection by Motion Segmentation / Shahid, Muhammad; Beyan, Cigdem; Murino, Vittorio. - (2021), pp. 2331-2340. (WACV Online 5-9 January 2021) [10.1109/WACV48630.2021.00238].

Shahid, Muhammad; Beyan, Cigdem; Murino, Vittorio

2021-01-01

Abstract

We address the challenging Voice Activity Detection (VAD) problem, which determines “Who is Speaking and When?” in audiovisual recordings. Typical audio-based VAD systems can be ineffective in the presence of ambient noise or noise variations. Moreover, for technical or privacy reasons, audio might not always be available. In such cases, using the video modality to perform VAD is desirable. Almost all existing visual VAD methods rely on body part detection, e.g., face, lips, or hands. In contrast, we propose a novel visual VAD method that operates directly on the entire video frame, without explicitly needing to detect a person or his/her body parts. Our method, named S-VVAD, learns body motion cues associated with speech activity within a weakly supervised segmentation framework. Therefore, it not only detects the speakers and non-speakers but also simultaneously localizes their positions in the image. It is an end-to-end, person-independent pipeline that requires neither prior knowledge nor pre-processing. S-VVAD performs well in various challenging conditions and achieves state-of-the-art results on multiple datasets. Moreover, its better generalization capability is confirmed in cross-dataset and person-independent scenarios.

Date of publication: 2021
Proceedings title: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2021
Place of publication: New York
Publisher: IEEE
ISBN: 978-1-6654-0477-8; 978-1-6654-4640-2
Scopus identifier: 2-s2.0-85110840017
WOS identifier: WOS:000693397600034
All authors: Shahid, Muhammad; Beyan, Cigdem; Murino, Vittorio
Citation: S-VVAD: Visual Voice Activity Detection by Motion Segmentation / Shahid, Muhammad; Beyan, Cigdem; Murino, Vittorio. - (2021), pp. 2331-2340. (WACV Online 5-9 January 2021) [10.1109/WACV48630.2021.00238].
Appears in types: 04.1 Paper in Proceedings

Files in this item:

File	Access	Type	License	Size	Format
S-VVAD_Visual_Voice_Activity_Detection_by_Motion_Segmentation.pdf	Archive managers only	Publisher’s layout	All rights reserved	6.64 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/304317

Warning: the data displayed have not been validated by the university.
