
From One to Many Lorikeets: Discovering Image Analogies in the CLIP Space / Xing, Songlong; Peruzzo, Elia; Sangineto, Enver; Sebe, Nicu. - 15309 LNCS:(2024), pp. 383-399. (Paper presented at the 27th International Conference on Pattern Recognition, ICPR 2024, held in Kolkata in 2024) [10.1007/978-3-031-78189-6_25].

From One to Many Lorikeets: Discovering Image Analogies in the CLIP Space

Xing, Songlong; Peruzzo, Elia; Sangineto, Enver; Sebe, Nicu
2024-01-01

Abstract

Drawing analogies between two pairs of entities in the form A:B::C:D (i.e., A is to B as C is to D) is a hallmark of human intelligence, as evidenced by extensive findings in cognitive science over the past decades. In recent years, this property has been observed well beyond cognitive science; notable examples are the word2vec and GloVe models in natural language processing. Recent research in computer vision has also found analogical structure in the feature space of a pretrained ConvNet feature extractor. However, analogy mining in the semantic space of recent strong foundation models such as CLIP remains understudied, despite their successful application to a wide range of downstream tasks. In this work, we show that CLIP possesses a similar ability for analogical reasoning in its latent space, and we propose a novel strategy to extract analogies between pairs of images in the CLIP space. We compute the difference vectors between all pairs of images that belong to the same class in the CLIP space, and employ k-means clustering to group these difference vectors into clusters irrespective of their classes. This procedure yields cluster centroids representative of class-agnostic semantic analogies between images. Through extensive analysis, we show that the property of drawing analogies between images also exists in the CLIP space, and that these analogies are interpretable by humans through a visualisation of the learned clusters.
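The extraction procedure summarised in the abstract (pairwise difference vectors within each class, pooled and clustered with k-means) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embeddings are random stand-ins for actual CLIP image features, and the array names, embedding dimension, and number of clusters are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for CLIP image embeddings: 60 images of dimension 512,
# split into 3 classes of 20 images each. In the actual method these
# would come from a pretrained CLIP image encoder.
embeddings = rng.normal(size=(60, 512)).astype(np.float32)
labels = np.repeat([0, 1, 2], 20)

# Difference vectors between all ordered pairs of distinct images
# that share a class label.
diffs = []
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    for i in idx:
        for j in idx:
            if i != j:
                diffs.append(embeddings[i] - embeddings[j])
diffs = np.stack(diffs)  # shape: (3 * 20 * 19, 512) = (1140, 512)

# Pool the difference vectors from all classes and cluster them;
# each centroid is a candidate class-agnostic analogy direction.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(diffs)
centroids = kmeans.cluster_centers_  # shape: (8, 512)
```

A centroid can then be added to the embedding of a new image and the result decoded or retrieved against, to test whether the direction transfers the same semantic change across classes.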
2024
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Heidelberg
Springer Science and Business Media Deutschland GmbH
9783031781889
9783031781896
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/11572/442611
Warning: the displayed data have not been validated by the university.

Citations
  • Scopus: 0