Comparisons of visual activity primitives for voice activity detection