Metadata are fundamental for the indexing, browsing, and retrieval of cultural heritage resources in digital repositories. Since manual control of metadata quality may not be feasible, especially for large collections, this Ph.D. thesis focuses on the problem of automatic metadata quality assessment. Taking the Metadata Quality Framework developed by Thomas Bruce and Diane Hillmann as the main reference, we propose to evaluate metadata according to three aspects. The first is metadata Completeness, approached as a statistical analysis: we compute the ratio of filled elements with respect to the metadata schema, taking into account its structure as well as the specific topic of a collection. The second is metadata Accuracy of the textual description of a given cultural heritage object, approached as a binary classification problem: we determine whether the field contains a high-quality or low-quality description, measured as the compliance of the textual content with the description rules in the guidelines used to create the metadata. The last aspect is metadata Coherence, where we investigate the feasibility of using high-quality metadata at the source while the metadata are being created. We assess the Coherence of the subject element by recommending the three most likely subjects of a resource, based on an analysis of its iconography. Applying this methodology to the Italian digital library "Cultura Italia", we found overall that it is indeed possible to evaluate metadata quality automatically. However, despite the promising results, our methods should also be tested on a wider range of digital repositories to obtain a more detailed picture of automatic metadata quality evaluation.
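As a minimal sketch of the Completeness measure described above, the ratio of filled elements can be computed per record against a schema. The function name `completeness`, the element names, and the per-element weights are hypothetical illustrations, not taken from the thesis. The weights are only one assumed way of reflecting schema structure or collection topic.

```python
from typing import Mapping, Optional


def completeness(record: Mapping[str, Optional[str]],
                 weights: Mapping[str, float]) -> float:
    """Weighted share of schema elements that carry a non-empty value.

    `weights` maps each schema element to its (assumed) importance;
    equal weights reduce this to a plain filled/total ratio.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("schema weights must not sum to zero")
    # An element counts as filled only if its value is non-empty after trimming.
    filled = sum(
        weight for element, weight in weights.items()
        if (record.get(element) or "").strip()
    )
    return filled / total


# Hypothetical schema weights and record, for illustration only.
weights = {"title": 2.0, "creator": 1.0, "subject": 1.0, "description": 1.0}
record = {"title": "Madonna col Bambino", "creator": "", "subject": "religious art"}
score = completeness(record, weights)
```

Here `title` and `subject` are filled, while `creator` is empty and `description` is missing, so the score is the filled weight (3.0) divided by the total weight (5.0).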
Metadata Quality Evaluation in Cultural Heritage Domain / Lorenzini, Matteo. - (2022 Feb 15), pp. 1-129. [10.15168/11572_330448]
Metadata Quality Evaluation in Cultural Heritage Domain
Lorenzini, Matteo
2022-02-15
File | Size | Format
---|---|---
tesi_iris.pdf (open access; Type: Doctoral Thesis; License: Creative Commons) | 6.35 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.