A Semantics-driven Methodology for High-quality Image Annotation

Diao, Xiaolei

doi:10.15168/11572_449959

In recent years, the field of computer vision has achieved significant advancements, largely driven by high-quality datasets. However, current models struggle to provide the responses that humans expect, especially when addressing complex computer vision tasks. One primary reason for this limitation is the systemic flaws in ground-truth object recognition benchmark datasets. Our basic tenet is that these flaws are rooted in the many-to-many mappings that exist between the visual information encoded in images and the intended semantics of the labels annotating them. The net consequence is that the current annotation process is largely under-specified, leaving too much freedom to the subjective judgment of annotators. In this study, we propose vTelos, an integrated methodology for constructing image datasets that combines natural language processing, knowledge representation, and computer vision techniques. The primary objective of vTelos is to make the intended annotation semantics explicit, thus minimizing the number and role of subjective choices. The methodology introduces two key roles, the Classificationist and the Classifier. The Classificationist ensures semantic consistency within the dataset by defining a classification hierarchy and visual properties. The Classifier, in turn, performs image annotation tasks based on the hierarchy defined by the Classificationist and on iterative refinement processes. For large-scale datasets, the methodology incorporates crowdsourcing techniques to enhance annotation efficiency while maintaining high quality. To validate the feasibility of the proposed methodology, we constructed an image dataset named vTelos-img under the guidance of vTelos. This dataset was subjected to a multidimensional evaluation framework, including quality assessment, annotation efficiency analysis, model performance comparison, and ablation studies of core components. The results comprehensively demonstrate the scientific rigor and practical utility of the vTelos methodology. Experimental findings reveal that vTelos significantly improves dataset quality, optimizes annotation workflows, and enhances the performance of machine learning models. This study provides a robust foundation for future data-driven computer vision tasks and offers a novel perspective on high-quality dataset construction and annotation methodologies.

A Semantics-driven Methodology for High-quality Image Annotation / Diao, Xiaolei. - (2025 Apr 02), pp. 1-124. [10.15168/11572_449959]

2025-04-02



Defended on: 2 Apr 2025
Cycle: XXXVI
Academic year: 2023-2024
Department: Ingegneria e Scienza dell'Informaz (cess.4/11/12)
Doctoral programme: Informatica e telecomunicazioni (fino a.a. 2020-21, 36° ciclo)
Unitn internal supervisor: Giunchiglia, Fausto
Bi-nationally supervised doctoral thesis: no
DOI: https://dx.doi.org/10.15168/11572_449959
Language: English
Appears in types: 08.1 Tesi di dottorato (Doctoral Thesis)

Files in this item:

File	Access	Type	License	Size	Format
phd_unitn_Diao_Xiaolei.pdf	Open access	Doctoral Thesis	All rights reserved	18.3 MB	Adobe PDF