A Corpus-based Approach to Philological Issues / Boschetti, Federico. - (2010), pp. 1-105.

A Corpus-based Approach to Philological Issues

Boschetti, Federico
2010-01-01

Abstract

This work applies techniques developed in corpus linguistics to a collection of ancient Greek texts, taking into account not only the canonical text established by modern editors but also the variant readings recorded in the critical apparatus or in repertories of conjectures. The dissertation is divided into three connected parts: construction, mapping, and analysis of the corpus. The first part is devoted to corpus construction and focuses on techniques for improving OCR accuracy on classical critical editions. This task is challenging because critical editions are multilingual, the set of characters to recognize is large, and the paper quality of editions printed in past centuries is variable. Three OCR engines are applied to the same texts, and a Bayesian classifier, combined with a dedicated spell-checker, selects the most probable output. The improvement is shown to be significant and, in the best cases, exceeds 3%. The second part is devoted to aligning the contents extracted from critical apparatuses and repertories of conjectures with the reference text. A parser has been developed to classify the chunks of information (verse number, Greek word sequences, textual operation, and the scholar who suggested the conjecture). The alignment algorithms used to find the precise position of each conjecture in its context are illustrated in detail. The third part is devoted to the study of the semantic spaces of ancient Greek. The chapter focuses on the specific features of the corpus, which is morphologically complex, literary (both poetry and prose), and diachronic (from the 8th century B.C. to the 15th century A.D.). Word senses in documents belonging to different genres are explored, and diachronic change of meaning is observed. Finally, a couple of meaningful conjectures extracted in the first part are analysed, evaluating the most interesting reciprocal relations in the semantic space.
2010
XXII
2009-2010
Scienze della Cogn e della Form (cess.4/11/12)
Cognitive and Brain Sciences
Baroni, Marco
no
English
Sector INF/01 - Computer Science
Sector L-FIL-LET/05 - Classical Philology
Sector L-LIN/01 - Glottology and Linguistics
Files in this item:
File: thesis.pdf
Access: open access
Type: Doctoral Thesis
License: All rights reserved
Size: 2.85 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369215