
Adaptive Personality Recognition from Text / Celli, Fabio. - (2012), pp. 1-120.

Adaptive Personality Recognition from Text

Celli, Fabio
2012-01-01

Abstract

We address the issue of domain adaptation for automatic Personality Recognition from Text (PRT). The PRT task consists of classifying the personality traits of authors, given pieces of text they wrote. The purpose of our work is to improve current approaches to PRT in order to extract personality information from social network sites, which is a challenging task. We argue that current approaches, based on supervised learning, have several limitations when adapted to the social network domain, mainly due to 1) difficulties in data annotation, 2) overfitting, 3) lack of domain adaptability and 4) multilinguality issues. We propose and test a new approach to PRT, which we call Adaptive Personality Recognition (APR). We argue that this new approach solves domain adaptability problems and is suitable for application to social network sites. We start with an introduction that covers the background knowledge required for understanding PRT. It includes topics such as personality, the Big5 factor model, the sets of correlations between language features and personality traits, and a brief survey of learning approaches, which also covers feature selection and domain adaptation. We also provide an overview of the state of the art in PRT and outline the problems we see in applying PRT to the social network domain. Our APR approach is based on 1) an external model: a set of features/correlations between language and Big5 personality traits (taken from the literature); 2) an adaptive strategy that makes the model fit the distribution of the features in the dataset at hand before generating personality hypotheses; 3) an evaluation strategy that compares all the hypotheses generated for each single text of each author, computing confidence scores.
This allows domain adaptation, semi-supervised learning and the automatic extraction of patterns associated with personality traits, which can be added to the initial correlation set, thus combining top-down and bottom-up approaches. The main contributions of our approach to research in the field of PRT are: 1) the possibility of running top-down PRT from models taken from the literature, adapting them to new datasets; 2) the definition of a small, language-independent and resource-free feature/correlation set, tested on Italian and English; 3) the possibility of integrating top-down and bottom-up PRT strategies, allowing the initial feature/correlation set to be enriched from the dataset at hand; 4) the development of a system for APR that does not require a large labeled dataset for training, only a small one for testing, minimizing the data annotation problem. Finally, we describe some applications of APR to the analysis of personality in online social network sites, reporting results and findings. We argue that the APR approach is useful for Social Network Analysis, social marketing, opinion mining, sentiment analysis, mood detection and related fields.
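The three components of APR listed in the abstract (an external correlation set, an adaptive strategy, and an evaluation strategy with confidence scores) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the features, the correlation signs in `CORRELATIONS`, the `y`/`n`/`o` labels, the functions `extract_features`, `text_hypothesis` and `recognize`, and the confidence formula are hypothetical and are not the system or the feature set described in the thesis. The sketch adapts to a dataset by comparing each feature with its mean over that same dataset, then aggregates per-text hypotheses into a per-author label.

```python
from statistics import mean

# HYPOTHETICAL feature/correlation set: feature -> {trait: sign of correlation}.
# These features and signs are illustrative placeholders, not the set from the thesis.
CORRELATIONS = {
    "exclamations": {"extraversion": +1},
    "type_token_ratio": {"openness": +1},
    "avg_word_len": {"openness": +1, "extraversion": -1},
}
TRAITS = ["extraversion", "openness"]


def extract_features(text):
    """Surface features that need no language-specific resources."""
    words = text.split()
    n = max(len(words), 1)  # avoid division by zero on empty texts
    return {
        "exclamations": text.count("!") / n,
        "type_token_ratio": len({w.lower() for w in words}) / n,
        "avg_word_len": sum(len(w) for w in words) / n,
    }


def text_hypothesis(feats, means):
    """Label each trait 'y' (high), 'n' (low) or 'o' (no evidence) for one text.

    Adaptive step: a feature counts as high or low relative to its mean over the
    dataset at hand, not relative to a threshold learned on another corpus.
    """
    hyp = {}
    for trait in TRAITS:
        score = 0
        for feat, links in CORRELATIONS.items():
            if trait not in links or feats[feat] == means[feat]:
                continue  # feature irrelevant to this trait, or exactly average
            direction = 1 if feats[feat] > means[feat] else -1
            score += direction * links[trait]
        hyp[trait] = "y" if score > 0 else "n" if score < 0 else "o"
    return hyp


def recognize(corpus):
    """corpus maps author -> list of texts; returns author -> trait -> (label, confidence)."""
    feats = {a: [extract_features(t) for t in texts] for a, texts in corpus.items()}
    pooled = [f for fs in feats.values() for f in fs]
    # Feature means over the whole dataset at hand (the adaptation step).
    means = {k: mean(f[k] for f in pooled) for k in CORRELATIONS}
    result = {}
    for author, fs in feats.items():
        hyps = [text_hypothesis(f, means) for f in fs]
        result[author] = {}
        for trait in TRAITS:
            y = sum(h[trait] == "y" for h in hyps)
            n = sum(h[trait] == "n" for h in hyps)
            label = "y" if y > n else "n" if n > y else "o"
            # Confidence: share of the author's texts that agree with the label
            # (0.0 when the y and n votes tie).
            conf = max(y, n) / len(hyps) if label != "o" else 0.0
            result[author][trait] = (label, conf)
    return result
```

Because thresholds are recomputed from whatever corpus is passed to `recognize`, the same correlation set can be reused on a new dataset without retraining, which is the kind of domain adaptation the abstract argues for; the per-author confidence rewards consistency across an author's texts rather than any single text.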
2012
XXIV
2011-2012
Scienze della Cognizione e della Formazione (ceased 4/11/12)
Cognitive and Brain Sciences
Poesio, Massimo
no
English
Sector ING-INF/05 - Information Processing Systems
Sector M-PSI/05 - Social Psychology
Files in this record:
celli_phdthesis.pdf (open access)
Type: Doctoral Thesis
License: All rights reserved
Size: 1.12 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369178