Deception Detection in Italian Court testimonies / Fornaciari, Tommaso. - (2012), pp. 1-105.

Deception Detection in Italian Court testimonies

Fornaciari, Tommaso
2012-01-01

Abstract

Effective methods for evaluating the reliability of statements made by witnesses and defendants in hearings would be extremely valuable for decision-making in court and other legal settings. In recent years, methods relying on stylometric techniques have proven the most successful for this task, but few of them have been tested on language collected in real-life situations of high-stakes deception, so their usefulness outside laboratory conditions has yet to be properly assessed. DeCour (DEception in COURt corpus) was built to train models that can discriminate, from a stylometric point of view, between sincere and deceptive statements. DeCour is a collection of hearings held in four Italian courts in which the speakers lie in front of the judge. These hearings later became the subject of specific criminal proceedings for calumny or false testimony, in which the deceptiveness of the defendant's statements was ascertained. Thanks to the final court judgment, which identifies the lies that were told, each utterance in the corpus has been annotated as true, uncertain or false, according to its degree of truthfulness. Since the judgment of deceptiveness follows a judicial inquiry, the annotation could be carried out with a greater degree of confidence than ever before. This is the first Italian corpus of deceptive texts that does not rely on ‘mock’ lies produced in laboratory conditions but was collected in a natural environment. In this dissertation we replicated methods used in previous studies but never before applied to high-stakes data, and we tested new methods. Among the best-known proposals in this direction are those of Pennebaker and colleagues, who employed their lexicon, the Linguistic Inquiry and Word Count (LIWC), to analyze texts or transcriptions of spoken language in which deception could have been used, but which were collected in an artificial way.
In our experiments, we trained machine learning models relying both on lexical features belonging to LIWC and on surface features. The surface features were selected either by calculating their Information Gain or simply according to how frequently they appear in the texts. We also considered the effect of a number of variables, including the degree of certainty with which the utterances were annotated as truthful or not, and the homogeneity of the dataset. In particular, false utterances were classified either against only the utterances annotated as true, or against the utterances annotated as true and as uncertain together. Moreover, we analysed subsets of DeCour in which the statements were made by homogeneous categories of speakers, e.g. speakers of the same gender, age or native language. Our results suggest that deception detection accuracy clearly above chance level can be obtained with real-life data as well.
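
The Information Gain selection and the two label configurations mentioned in the abstract can be illustrated with a minimal sketch. It assumes each utterance is reduced to a set of tokens and scores a token by how much knowing its presence or absence reduces the entropy of the class labels. The function names and the toy utterances are hypothetical and are not taken from the thesis or from DeCour.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(utterances, labels, token):
    """Entropy reduction obtained by splitting on presence of `token`."""
    present = [y for toks, y in zip(utterances, labels) if token in toks]
    absent = [y for toks, y in zip(utterances, labels) if token not in toks]
    total = len(labels)
    conditional = ((len(present) / total) * entropy(present)
                   + (len(absent) / total) * entropy(absent))
    return entropy(labels) - conditional


def build_task(utterances, labels, include_uncertain):
    """Build a false-vs-rest task, dropping or keeping uncertain utterances."""
    kept = {"true", "false"} | ({"uncertain"} if include_uncertain else set())
    pairs = [(u, "false" if y == "false" else "not_false")
             for u, y in zip(utterances, labels) if y in kept]
    return [u for u, _ in pairs], [y for _, y in pairs]


def rank_tokens(utterances, labels):
    """Return (gain, token) pairs sorted by decreasing Information Gain."""
    vocabulary = set().union(*utterances)
    scored = [(information_gain(utterances, labels, t), t) for t in vocabulary]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical toy data: token sets with the three annotation labels.
utterances = [
    {"i", "never", "saw", "him"},
    {"i", "saw", "him", "yesterday"},
    {"never", "met", "him"},
    {"we", "met", "him"},
    {"maybe", "him"},
]
labels = ["false", "true", "false", "true", "uncertain"]

# Configuration 1: false vs. true only (uncertain utterances dropped).
strict_u, strict_y = build_task(utterances, labels, include_uncertain=False)
# Configuration 2: false vs. true and uncertain together.
broad_u, broad_y = build_task(utterances, labels, include_uncertain=True)

print(rank_tokens(strict_u, strict_y)[:3])
print(rank_tokens(broad_u, broad_y)[:3])
```

In a real setting, the top-ranked tokens would be passed as features to a classifier. The sketch stops at the ranking step because the abstract does not specify which learning algorithm was used.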
2012
XXV
2011-2012
CIMEC (29/10/12-)
Cognitive and Brain Sciences
Poesio, Massimo
no
English
Sector L-LIN/01 - Glottology and Linguistics
Sector M-PSI/01 - General Psychology
Files in this item:
File: tfthesis.pdf
Access: open access
Type: Doctoral Thesis
License: All rights reserved
Size: 1.04 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369179