Perception, language, and gesture: towards a natural human-computer interaction
De Angeli, Antonella
1999-01-01
Abstract
Multimodal systems extract and convey meaning through different I/O interfaces, such as voice, writing, gestures, gaze movements, and facial expressions. By increasing the communication bandwidth between humans and computers, current technology has the potential to introduce a major shift in the usability of future systems: interaction becomes more natural, flexible, and robust. The visual/spatial domain is the ideal ground for multimodal systems, since referring to objects in space is greatly simplified by the synergistic use of natural language and gestures. Despite the importance of visual cues for resolving ambiguities, traditional multimodal interfaces are blind: references are resolved mainly by considering the dialogue context. To match spontaneous user behaviour, we propose an architecture in which gesture recognition depends on anthropomorphic perceptual features. The ecological validity of our approach is confirmed by results from a 'Wizard of Oz' experiment in which users communicated with a simulated multimodal system and moved groups of objects into appropriate boxes. Speech was mediated by a microphone and gestures by an electronic pen. Visual-field organisation was manipulated according to Gestalt principles, yielding two conditions: high vs. low group salience. Results showed that both gesture trajectories and linguistic behaviour are influenced by perceptual-field organisation.
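The abstract does not specify how perceptual grouping feeds into gesture recognition, so the Python sketch below is purely illustrative: it clusters on-screen objects by the Gestalt proximity principle and resolves an ambiguous pen trajectory to the perceptual group it most plausibly selects. All names, thresholds, and the single-linkage clustering choice are assumptions for illustration, not the paper's architecture.

from itertools import combinations

# Hypothetical scene: each object is a 2-D point on the display.
Point = tuple[float, float]

def dist(a: Point, b: Point) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def proximity_groups(objects: list[Point], threshold: float) -> list[set[int]]:
    """Cluster objects by the Gestalt proximity principle: two objects
    share a perceptual group if they lie within `threshold` of each
    other (single-linkage clustering via union-find)."""
    parent = list(range(len(objects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(objects)), 2):
        if dist(objects[i], objects[j]) <= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(objects)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def resolve_gesture(trajectory: list[Point], objects: list[Point],
                    threshold: float, radius: float) -> set[int]:
    """Map a pen trajectory to a referent: score each perceptual group
    by the fraction of its members passed near (within `radius` of)
    the trajectory, and return the best-scoring group."""
    near = {i for i, obj in enumerate(objects)
            if any(dist(p, obj) <= radius for p in trajectory)}
    groups = proximity_groups(objects, threshold)
    return max(groups, key=lambda g: len(g & near) / len(g))

# Toy usage: two proximity groups; a rough stroke over the left
# cluster selects all of it, even members the pen did not touch.
objs = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
stroke = [(-1, -1), (2, -1), (2, 2), (-1, 2)]
print(resolve_gesture(stroke, objs, threshold=2.0, radius=2.5))  # {0, 1, 2}

The point of the sketch is the design choice the abstract argues for: the recogniser reasons over perceptually salient groups rather than individual pixels or dialogue context alone, so an imprecise trajectory can still be mapped to the group a user perceives as a unit.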