The present paper addresses the study of cross-linguistic and cross-modal iconicity within a deep learning framework. An LSTM-based Recurrent Neural Network is trained to associate the phonetic representation of a concrete word, encoded as a sequence of feature vectors, to the visual representation of its referent, expressed as an HCNN-transformed image. The processing network is then tested, without further training, in a language that does not appear in the training set and belongs to a different language family. The performance of the model is evaluated through a comparison with a randomized baseline; we show that such an imaginative network is capable of extracting language-independent generalizations in the mapping from linguistic sounds to visual features, providing empirical support for the hypothesis of a universal sound-symbolic substrate underlying all languages.
Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World? / de Varda, A. G.; Strapparava, C.. - In: IJCAI. - ISSN 1045-0823. - (2021), pp. 643-649. (Intervento presentato al convegno 30th International Joint Conference on Artificial Intelligence, IJCAI 2021 tenutosi a can nel 2021).
Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?
Strapparava C.
2021-01-01
Abstract
The present paper addresses the study of cross-linguistic and cross-modal iconicity within a deep learning framework. An LSTM-based Recurrent Neural Network is trained to associate the phonetic representation of a concrete word, encoded as a sequence of feature vectors, to the visual representation of its referent, expressed as an HCNN-transformed image. The processing network is then tested, without further training, in a language that does not appear in the training set and belongs to a different language family. The performance of the model is evaluated through a comparison with a randomized baseline; we show that such an imaginative network is capable of extracting language-independent generalizations in the mapping from linguistic sounds to visual features, providing empirical support for the hypothesis of a universal sound-symbolic substrate underlying all languages.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione