This work is devoted to self-supervised representation learning (SSL). We consider both contrastive and non-contrastive methods and present a new loss function for SSL based on feature whitening. Our solution is conceptually simple and competitive with other methods. Self-supervised representations are beneficial for most areas of deep learning, and reinforcement learning is of particular interest because SSL can compensate for the sparsity of the training signal. We present two methods from this area. The first tackles the partial observability providing the agent with a history, represented with temporal alignment, and improves performance in most Atari environments. The second addresses the exploration problem. The method employs a world model of the SSL latent space, and the prediction error of this model indicates novel states required to explore. It shows strong performance on exploration-hard benchmarks, especially on the notorious Montezuma's Revenge. Finally, we consider the metric learning problem, which has much in common with SSL approaches. We present a new method based on hyperbolic embeddings, vision transformers and contrastive loss. We demonstrate the advantage of hyperbolic space over the widely used Euclidean space for metric learning. The method outperforms the current state-of-the-art by a significant margin.

Self-supervised Representation Learning in Computer Vision and Reinforcement Learning / Ermolov, Aleksandr. - (2022 Dec 06), pp. 1-160. [10.15168/11572_360781]

Self-supervised Representation Learning in Computer Vision and Reinforcement Learning

Ermolov, Aleksandr
2022-12-06

Abstract

This work is devoted to self-supervised representation learning (SSL). We consider both contrastive and non-contrastive methods and present a new loss function for SSL based on feature whitening. Our solution is conceptually simple and competitive with other methods. Self-supervised representations are beneficial for most areas of deep learning, and reinforcement learning is of particular interest because SSL can compensate for the sparsity of the training signal. We present two methods from this area. The first tackles the partial observability providing the agent with a history, represented with temporal alignment, and improves performance in most Atari environments. The second addresses the exploration problem. The method employs a world model of the SSL latent space, and the prediction error of this model indicates novel states required to explore. It shows strong performance on exploration-hard benchmarks, especially on the notorious Montezuma's Revenge. Finally, we consider the metric learning problem, which has much in common with SSL approaches. We present a new method based on hyperbolic embeddings, vision transformers and contrastive loss. We demonstrate the advantage of hyperbolic space over the widely used Euclidean space for metric learning. The method outperforms the current state-of-the-art by a significant margin.
6-dic-2022
XXXIV
2021-2022
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Sebe, Niculae
no
Inglese
File in questo prodotto:
File Dimensione Formato  
phd_unitn_ermolov.pdf

accesso aperto

Tipologia: Tesi di dottorato (Doctoral Thesis)
Licenza: Creative commons
Dimensione 16.71 MB
Formato Adobe PDF
16.71 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/360781
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact