Acting on a visual world: The role of perception in multimodal HCI