Integration and synchronization of input modes during multimodal human-computer interaction
De Angeli, Antonella
1997-01-01
Abstract
Our ability to develop robust multimodal systems will depend on knowledge of the natural integration patterns that typify people's combined use of different input modes. To provide a foundation for theory and design, the present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system. Task analysis revealed that multimodal interaction occurred most frequently during spatial location commands, and with intermediate frequency during selection commands. In addition, microanalysis of input signals identified sequential, simultaneous, point-and-speak, and compound integration patterns, as well as data on the temporal precedence of modes and on inter-modal lags. In synchronizing input streams, the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence. Linguistic analysis also revealed that the spoken and written modes consistently supplied complementary rather than redundant semantic information. One long-term goal of this research is the development of predictive models of natural modality integration to guide the design of emerging multimodal architectures.
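The abstract's distinction between sequential and simultaneous integration, and its notion of inter-modal lag, can be made concrete with a small sketch. The following is a minimal illustration, not drawn from the paper itself: the names (Signal, integration_pattern, intermodal_lag) are hypothetical, and it assumes each input signal arrives with start and end timestamps in seconds.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """A timestamped input signal from one modality (times in seconds)."""
    mode: str    # e.g. "pen" or "speech"
    start: float
    end: float


def integration_pattern(pen: Signal, speech: Signal) -> str:
    """Classify a pen+speech pair by temporal overlap.

    "simultaneous": the two signals overlap in time.
    "sequential":   one signal ends before the other begins,
                    leaving a nonzero inter-modal lag between them.
    """
    if pen.end < speech.start or speech.end < pen.start:
        return "sequential"
    return "simultaneous"


def intermodal_lag(pen: Signal, speech: Signal) -> float:
    """Signed onset lag; negative means pen input preceded speech."""
    return pen.start - speech.start
```

For example, a pen gesture spanning 0.0-0.4 s followed by speech spanning 0.9-1.6 s would be classified as sequential with an onset lag of -0.9 s, i.e. pen preceding speech, which is the precedence pattern the abstract reports as a major theme.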