A Semantics-driven Methodology for High-quality Image Annotation / Diao, Xiaolei. - (2025 Apr 02), pp. 1-124.
A Semantics-driven Methodology for High-quality Image Annotation
Diao, Xiaolei
2025-04-02
Abstract
In recent years, computer vision has advanced significantly, largely driven by high-quality datasets. However, current models struggle to provide the responses that humans expect, especially on complex computer vision tasks. One primary reason for this limitation lies in systemic flaws in ground-truth object recognition benchmark datasets. Our basic tenet is that these flaws are rooted in the many-to-many mappings that exist between the visual information encoded in images and the intended semantics of the labels annotating them. The net consequence is that the current annotation process is largely under-specified, leaving too much freedom to the subjective judgment of annotators. In this study, we propose an integrated methodology for constructing image datasets, referred to as vTelos, which combines natural language processing, knowledge representation, and computer vision techniques. The primary objective of vTelos is to make the intended annotation semantics explicit, thus minimizing the number and role of subjective choices. The methodology introduces two key roles, the Classificationist and the Classifier. The Classificationist ensures semantic consistency within the dataset by defining a classification hierarchy and visual properties. The Classifier, in turn, performs image annotation tasks based on the hierarchy defined by the Classificationist and on iterative refinement processes. For large-scale datasets, the methodology incorporates crowdsourcing techniques to enhance annotation efficiency while maintaining high quality. To validate the feasibility of the proposed methodology, we constructed an image dataset named vTelos-img under the guidance of vTelos. This dataset was subjected to a multidimensional evaluation framework, including quality assessment, annotation efficiency analysis, model performance comparison, and ablation studies of core components.
The results comprehensively demonstrate the scientific rigor and practical utility of the vTelos methodology. Experimental findings reveal that vTelos significantly improves dataset quality, optimizes annotation workflows, and enhances the performance of machine learning models. This study provides a robust foundation for future data-driven computer vision tasks and offers a novel perspective on high-quality dataset construction and annotation methodologies.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
phd_unitn_Diao_Xiaolei.pdf | Open access | Doctoral thesis | All rights reserved | 18.3 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.