Unveiling Open-set Noise: Theoretical Insights into Label Noise / Feng, Chen; Sebe, Nicu; Tzimiropoulos, Georgios; Rodrigues, Miguel R. D.; Patras, Ioannis. - (2025), pp. 3290-3299. (ACM Multimedia, Dublin, October 2025) [10.1145/3746027.3755040].
Unveiling Open-set Noise: Theoretical Insights into Label Noise
Sebe, Nicu;
2025-01-01
Abstract
Learning with Noisy Labels (LNL) reduces reliance on high-quality labeled data but often overlooks open-set noise, in which noisy samples belong to unknown classes, as opposed to closed-set noise, in which they belong to known categories. This paper advances LNL by reformulating the problem to incorporate open-set noise through a complete noise transition matrix, which enables a theoretical comparison of how open-set and closed-set noise affect classification error rates. Our analysis reveals that open-set noise increases the error rate less than closed-set noise does, and that its 'hard' (semantically similar to inliers) and 'easy' (semantically dissimilar) variants have distinct effects. We evaluate entropy-based detection, finding it effective only for easy open-set noise, and propose solutions that leverage vision-language models and self-supervised learning to address hard open-set noise. For empirical validation, we introduce CIFAR100-O, ImageNet-O, and a WebVision open-set test set, enabling robust benchmarking of LNL methods under open-set noise. Because classification accuracy alone does not fully capture model robustness, we advocate out-of-distribution (OOD) detection as a complementary metric. Our theoretical and empirical results highlight the unique challenges of open-set noise and offer new tools and evaluation frameworks for improving LNL robustness in real-world scenarios.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 3746027.3755040.pdf | Open access | Publisher's version (publisher's layout) | Creative Commons | 1.84 MB | Adobe PDF |
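To make "entropy-based detection" concrete, here is a minimal, generic sketch of the idea: compute the Shannon entropy of a classifier's softmax output for each training sample and flag high-entropy samples as open-set noise candidates. This is not the paper's implementation. The function names `prediction_entropy` and `flag_open_set`, the 50%-of-maximum-entropy threshold, and the toy probabilities are all illustrative assumptions. The third toy row shows why such a detector can miss 'hard' open-set samples: a confident (low-entropy) prediction is never flagged, whatever its true class.

```python
import numpy as np


def prediction_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (in nats) of each row of an (N, C) probability matrix."""
    # Clip to avoid log(0); rows are assumed to already sum to 1.
    probs = np.clip(probs, eps, 1.0)
    return -(probs * np.log(probs)).sum(axis=1)


def flag_open_set(probs: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask: True where entropy exceeds a (hypothetical) threshold."""
    return prediction_entropy(probs) > threshold


if __name__ == "__main__":
    # Toy softmax outputs over 4 known classes (illustrative values only).
    probs = np.array([
        [0.97, 0.01, 0.01, 0.01],  # confident: looks like an inlier
        [0.25, 0.25, 0.25, 0.25],  # uniform: an 'easy' open-set candidate
        [0.90, 0.05, 0.03, 0.02],  # confident: a 'hard' open-set sample would look like this
    ])
    max_entropy = np.log(probs.shape[1])  # entropy of the uniform distribution
    mask = flag_open_set(probs, threshold=0.5 * max_entropy)
    print(mask)
```

Normalizing the threshold by the maximum entropy log C keeps it comparable across label spaces of different sizes. In practice the threshold would need tuning, and samples whose predictions are confidently wrong (the hard case) require other signals, such as the vision-language and self-supervised features mentioned in the abstract.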
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.



