STaRS.sys: designing and building a commonsense-knowledge enriched wordnet for therapeutic purposes / Lebani, Gianluca E.. - (2012), pp. 1-173.

STaRS.sys: designing and building a commonsense-knowledge enriched wordnet for therapeutic purposes

Lebani, Gianluca E.
2012-01-01

Abstract

This thesis investigates the possibility of exploiting human language resources and knowledge extraction techniques to build STaRS.sys, a software system designed to support therapists in the rehabilitation of Italian anomic patients. After an introductory section reviewing classification, assessment, and remediation methods for naming disorders, we analyze current trends in the use of computers for the rehabilitation of language disorders. Starting from an analysis of the needs of speech therapists in their daily work with aphasic patients, we define the requirements for the STaRS.sys application and identify a number of possible uses. To implement these functionalities, STaRS.sys needs to be based on a lexical knowledge base that encodes, in an explicit and computationally tractable way, at least the kind of semantic knowledge contained in the so-called feature norms. As a backbone for the development of this semantic resource we chose the Italian MultiWordNet lexicon, derived from the original Princeton WordNet. We show that the WordNet model is relatively well suited to our needs, but that an extension of its semantic model is nevertheless needed. Starting from the assumption that the categories of the feature type classifications used for encoding feature norms can be mapped onto semantic relations in a WordNet-like semantic network, we identified a set of 25 semantic relations that can cover all the information contained in these datasets. To demonstrate the feasibility of our proposal, we first asked a group of therapists to use our feature type classification to classify a set of 300 features. The analysis of the inter-coder agreement shows that the proposed classification can be used reliably by speech therapists.
Subsequently, we collected a new set of Italian feature norms for 50 concrete concepts and analyzed the issues raised by the attempt to encode them into a version of MultiWordNet extended to include the new set of relations. This analysis shows that, in addition to extending the relation set, a number of further modifications are needed, for instance to encode negation, quantification, or the strength of a relation: information that, as we show, is not well represented in the existing feature norms either. After defining an extended version of MultiWordNet (sMWN) suitable for encoding the information contained in feature norms, we deal with the automatic extraction of such semantic information from corpora. We applied to an Italian corpus a state-of-the-art machine-learning-based method for the extraction of common-sense conceptual knowledge, previously applied to English. We tried a number of modifications and extensions of the original algorithm, with the aim of improving its accuracy. Results and limitations are presented and analyzed, and possible future improvements are discussed.
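
The kind of record that an extended relation set with negation, quantification, and relation strength implies can be sketched as follows. This is a minimal illustration only, not the sMWN schema: the class name `FeatureRelation`, the relation labels `is_able_to` and `has_part`, the quantifier values, the 0.0 to 1.0 strength scale, and the example concepts are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureRelation:
    """One concept-feature link in a WordNet-like network (hypothetical schema)."""
    source: str                       # concept, e.g. a lemma standing in for a synset
    relation: str                     # relation label; the names used below are illustrative
    target: str                       # feature value
    negated: bool = False             # True for "does NOT ..." features
    quantifier: Optional[str] = None  # e.g. "most", "some"; None if unquantified
    strength: float = 1.0             # assumed scale: 0.0 (weak) to 1.0 (strong)

    def __post_init__(self):
        # Reject strengths outside the assumed scale.
        if not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must lie in [0.0, 1.0]")


def describe(rel: FeatureRelation) -> str:
    """Render a relation as a readable string for inspection."""
    neg = "not " if rel.negated else ""
    quant = f"[{rel.quantifier}] " if rel.quantifier else ""
    return f"{quant}{rel.source} {neg}{rel.relation} {rel.target} ({rel.strength:.2f})"


# Invented examples: a negated ability and a quantified, weighted part-of link.
penguin_fly = FeatureRelation("pinguino", "is_able_to", "volare", negated=True)
dog_legs = FeatureRelation("cane", "has_part", "zampa", quantifier="most", strength=0.9)
print(describe(penguin_fly))
print(describe(dog_legs))
```

Keeping negation, quantifier, and strength as separate optional fields means a plain WordNet-style triple remains the default case, and the extra information is attached only where a feature norm actually supplies it.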
2012
XXIV
2011-2012
Scienze della Cogn e della Form (cess.4/11/12)
Cognitive and Brain Sciences
Pianta, Emanuele
no
English
Sector INF/01 - Computer Science
Files in this item:
File: lebani_thesis.pdf
Access: open access
Type: Doctoral Thesis
License: All rights reserved
Size: 3.83 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/367678