Hyperbolic Vision Transformers: Combining Improvements in Metric Learning / Ermolov, Aleksandr; Mirvakhabova, Leyla; Khrulkov, Valentin; Sebe, Nicu; Oseledets, Ivan. - 2022-:(2022), pp. 7399-7409. (Paper presented at the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, held in New Orleans in 2022) [10.1109/CVPR52688.2022.00726].

Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

Ermolov, Aleksandr; Sebe, Nicu
2022-01-01

Abstract

Metric learning aims to learn a highly discriminative model in which the embeddings of similar classes lie close together under the chosen metric and the embeddings of dissimilar classes are pushed apart. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations; usually, the Euclidean distance is used. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer whose output embeddings are mapped to hyperbolic space. These embeddings are directly optimized using a modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets, achieving new state-of-the-art performance. The source code is available at https://github.com/htdt/hyp_metric.
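The abstract does not spell out how embeddings are mapped to hyperbolic space or how the loss is formed. The following is a minimal sketch only, assuming the Poincaré ball model with curvature parameter c, the exponential map at the origin as the projection, and a pairwise cross-entropy over negative hyperbolic distances scaled by a temperature tau. All function names and the default values of c and tau are illustrative assumptions, not taken from the repository linked above, and plain Python lists stand in for encoder outputs.

```python
import math


def expmap0(v, c=1.0):
    """Exponential map at the origin: Euclidean vector -> Poincare ball (curvature -c)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sc = math.sqrt(c)
    scale = math.tanh(sc * norm) / (sc * norm)
    return [scale * x for x in v]


def mobius_add(x, y, c=1.0):
    """Mobius addition of two points inside the Poincare ball."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / denom
            for a, b in zip(x, y)]


def hyp_dist(x, y, c=1.0):
    """Geodesic distance between two points of the Poincare ball."""
    sc = math.sqrt(c)
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    # Clamp the argument so atanh stays finite near the ball boundary.
    return 2 / sc * math.atanh(min(sc * norm, 1 - 1e-7))


def pairwise_ce_loss(anchor, positive, negatives, c=1.0, tau=0.2):
    """Cross-entropy that treats the positive as the correct 'class' among candidates.

    Logits are negative hyperbolic distances divided by the temperature tau
    (an assumed form; the paper's exact modification is not given in the abstract).
    """
    logits = [-hyp_dist(anchor, z, c) / tau for z in [positive] + negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Hypothetical 2-D "encoder outputs" for one anchor, one positive, one negative.
anchor = expmap0([1.0, 0.0])
positive = expmap0([0.9, 0.1])
negative = expmap0([-1.0, 0.0])
loss = pairwise_ce_loss(anchor, positive, [negative])
```

In this sketch the loss shrinks as the positive moves closer to the anchor than the negatives do, which is the behaviour the abstract describes for similar and dissimilar classes.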
2022
IEEE/CVF Conference on Computer Vision and Pattern Recognition
Piscataway, NJ USA
IEEE
978-1-6654-6946-3
Ermolov, Aleksandr; Mirvakhabova, Leyla; Khrulkov, Valentin; Sebe, Nicu; Oseledets, Ivan
Files in this item:
Ermolov_Hyperbolic_Vision_Transformers_Combining_Improvements_in_Metric_Learning_CVPR_2022_paper.pdf

Access: open access
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 2.77 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/361297
Citations
  • PubMed Central: not available
  • Scopus: 43
  • Web of Science (ISI): 26