This paper studies cross-linguistic phonosemantic correspondences within a deep learning framework. An LSTM-based recurrent neural network is trained to map the phonetic representation of a word, encoded as a sequence of feature vectors, to its semantic representation in a multilingual, cross-family vector space. The network is then tested, without further training, on a language that does not appear in the training set and belongs to a different language family. Its performance is evaluated against a monolingual, mono-family upper bound and a randomized baseline. Finally, the distribution of phonosemantic properties in the lexicon is examined in relation to several (psycho)linguistic variables, revealing a link between lexical non-arbitrariness and semantic, syntactic, pragmatic, and developmental factors.
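As a rough illustration of the setup the abstract describes, the sketch below runs the forward pass of a small, untrained LSTM that reads a word as a sequence of phonetic feature vectors and projects its final hidden state into a semantic vector, which is then compared with a target vector by cosine similarity. It is not the authors' implementation: the class name `TinyLSTMEncoder`, the three-dimensional feature encoding, the layer sizes, the random weights, and the target vector are all hypothetical, and training and the cosine-based comparison are assumptions made only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class TinyLSTMEncoder:
    """Hypothetical, untrained LSTM: phonetic feature sequence -> semantic vector."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, n_features, n_hidden, n_semantic, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.weights = {
            gate: rand_matrix(n_hidden, n_features + n_hidden, rng)
            for gate in self.GATES
        }
        self.biases = {gate: [0.0] * n_hidden for gate in self.GATES}
        # Linear projection from the final hidden state to the semantic space.
        self.projection = rand_matrix(n_semantic, n_hidden, rng)

    def encode(self, phones):
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for x in phones:
            z = list(x) + h
            pre = {
                gate: [a + b for a, b in zip(matvec(self.weights[gate], z),
                                             self.biases[gate])]
                for gate in self.GATES
            }
            i_gate = [sigmoid(v) for v in pre["input"]]
            f_gate = [sigmoid(v) for v in pre["forget"]]
            o_gate = [sigmoid(v) for v in pre["output"]]
            cand = [math.tanh(v) for v in pre["candidate"]]
            # Standard LSTM update: forget part of the old cell, add gated candidate.
            c = [f * c_old + i * g for f, c_old, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(c_new) for o, c_new in zip(o_gate, c)]
        return matvec(self.projection, h)


# Made-up binary features (e.g. voiced, nasal, vowel) for a three-phone word.
word = [[1, 0, 0], [1, 0, 1], [1, 1, 0]]
# Made-up target vector standing in for the word's semantic embedding.
target = [0.2, -0.1, 0.4, 0.0, 0.3]

encoder = TinyLSTMEncoder(n_features=3, n_hidden=8, n_semantic=5)
predicted = encoder.encode(word)
score = cosine(predicted, target)
print(len(predicted), round(score, 3))
```

In a setup like the one in the abstract, such a score would be computed on words from a held-out language and compared with the scores obtained under a randomized baseline; with the untrained weights used here, the value itself carries no meaning.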
A Layered Bridge from Sound to Meaning: Investigating Cross-linguistic Phonosemantic Correspondences / de Varda, Andrea; Strapparava, Carlo. - 43:(2021), pp. 1029-1035. (Paper presented at the 43rd Annual Meeting of the Cognitive Science Society (CogSci 2021), held in Vienna, Austria, 26 – 29 July 2021).
A Layered Bridge from Sound to Meaning: Investigating Cross-linguistic Phonosemantic Correspondences
Carlo Strapparava
2021-01-01