Sull'utilità e i problemi di un'edizione lemmatizzata. Un caso esempio offerto da Roman de Horn, King Horn e Horn Childe and Maiden Rimnild / Gottardi, Pierandrea. - In: FILOLOGIA GERMANICA. - ISSN 2036-8992. - Print. - 14 (2022), pp. 141-169.
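Sull'utilità e i problemi di un'edizione lemmatizzata. Un caso esempio offerto da Roman de Horn, King Horn e Horn Childe and Maiden Rimnild [On the usefulness and the problems of a lemmatized edition. An example case offered by Roman de Horn, King Horn and Horn Childe and Maiden Rimnild]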
Pierandrea Gottardi (first author)
2022-01-01
Abstract
This paper describes the methods used to create a lemmatized edition of all the witnesses of Roman de Horn, King Horn and Horn Childe and Maiden Rimnild. These are three closely related versions of the medieval story of Horn: the first is in Insular French, while the second and third are in Middle English. The lemmatized edition is part of a doctoral project devoted to a qualitative and quantitative analysis of their style, and its final goal is to provide a searchable database for stylistic investigations. The paper first introduces the three versions and then explains the reasons for a digital approach. It goes on to discuss the problems posed by lemmatization and the solutions adopted. The resulting protocol balances the need for a qualitative, controlled lemmatization against the practical need for a partially automated pipeline. A program was written with the Natural Language Toolkit (NLTK) module of Python to edit and lemmatize each witness. A private MySQL database containing all the lemmas from the Middle English Dictionary and the Anglo-Norman Dictionary was developed, and it sped up the process of lemmatization. In addition, a set of corpora of Insular French and Middle English texts was compiled to serve as a control group for the stylistic analysis. Creating this control group required a distinct, automated lemmatization protocol, which resulted in a specific kind of digital output. The paper ends with a brief example of a stylistic investigation that uses the corpus of lemmatized texts.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.