STaRS.sys: designing and building a commonsense-knowledge enriched wordnet for therapeutic purposes

Lebani, Gianluca E.

doi:10.15168/11572_367678

This thesis investigates the possibility of exploiting human language resources and knowledge extraction techniques to build STaRS.sys, a software system designed to support therapists in the rehabilitation of Italian anomic patients. After an introductory section reviewing classification, assessment, and remediation methods for naming disorders, we analyze the current trends in the exploitation of computers for the rehabilitation of language disorders. Starting from an analysis of the needs of speech therapists in their daily work with aphasic patients, the requirements for the STaRS.sys application are defined, and a number of possible uses are identified. To implement these functionalities, STaRS.sys needs to be based on a lexical knowledge base encoding, in an explicit and computationally tractable way, at least the kind of semantic knowledge contained in the so-called feature norms. As a backbone for the development of this semantic resource we chose the Italian MultiWordNet lexicon, derived from the original Princeton WordNet. We show that the WordNet model is relatively well suited to our needs, but that an extension of its semantic model is nevertheless needed. Starting from the assumption that the kinds composing the feature type classifications used for encoding feature norms can be mapped onto semantic relations in a WordNet-like semantic network, we identified a set of 25 semantic relations that can cover all the information contained in these datasets. To demonstrate the feasibility of our proposal, we first asked a group of therapists to use our feature type classification to classify a set of 300 features. The analysis of the inter-coder agreement shows that the proposed classification can be used reliably by speech therapists.
Subsequently, we collected a new set of Italian feature norms for 50 concrete concepts and analyzed the issues raised by the attempt to encode them into a version of MultiWordNet extended to include the new set of relations. This analysis shows that, in addition to extending the relation set, a number of further modifications are needed, for instance to be able to encode negation, quantification, or the strength of a relation: information that, as we will show, is not well represented in the existing feature norms either. After defining an extended version of MultiWordNet (sMWN), suitable for encoding the information contained in feature norms, we deal with the issue of automatically extracting such semantic information from corpora. We applied to an Italian corpus a state-of-the-art machine-learning-based method for the extraction of common-sense conceptual knowledge from corpora, previously applied to English. We tried a number of modifications and extensions of the original algorithm, with the aim of improving its accuracy. Results and limitations are presented and analyzed, and possible future improvements are discussed.

STaRS.sys: designing and building a commonsense-knowledge enriched wordnet for therapeutic purposes / Lebani, Gianluca E.. - (2012), pp. 1-173. [10.15168/11572_367678]