Joint Representation Learning and Keypoint Detection for Cross-View Geo-Localization

In this paper, we study cross-view geo-localization, the problem of matching images captured from different viewpoints. The key to this task is learning a discriminative, viewpoint-invariant visual representation. Inspired by how the human visual system mines local patterns, we propose a new framework, RK-Net, that jointly learns the discriminative Representation and detects salient Keypoints with a single Network. Specifically, we introduce a Unit Subtraction Attention Module (USAM) that automatically discovers representative keypoints from feature maps and draws attention to salient regions. USAM contains very few learnable parameters but yields a significant performance improvement and can be easily plugged into different networks. Extensive experiments demonstrate two points. (1) By incorporating USAM, RK-Net enables end-to-end joint learning without requiring extra annotations. Representation learning and keypoint detection are two highly related tasks: representation learning aids keypoint detection, and keypoint detection, in turn, strengthens the model's robustness to large appearance changes caused by viewpoint variations. (2) USAM is easy to implement and can be integrated with existing methods, further improving state-of-the-art performance. We achieve competitive geo-localization accuracy on three challenging datasets, i.e., University-1652, CVUSA, and CVACT. Our code is available at https://github.com/AggMan96/RK-Net.

Joint Representation Learning and Keypoint Detection for Cross-View Geo-Localization / Lin, J.; Zheng, Z.; Zhong, Z.; Luo, Z.; Li, S.; Yang, Y.; Sebe, N. - In: IEEE TRANSACTIONS ON IMAGE PROCESSING. - ISSN 1057-7149. - 31 (2022), pp. 3780-3792. [10.1109/TIP.2022.3175601]

Lin, J.; Zheng, Z.; Zhong, Z.; Luo, Z.; Li, S.; Yang, Y.; Sebe, N.

2022-01-01

Date of publication: 2022
Journal title: IEEE TRANSACTIONS ON IMAGE PROCESSING
DOI: https://dx.doi.org/10.1109/TIP.2022.3175601
PubMed identifier: 35604972
Scopus identifier: 2-s2.0-85130811940
WOS identifier: WOS:000805798900003
Appears in the categories: 03.1 Journal article (Articolo su rivista)

Files in this record:

File	Access	Type	License	Size	Format
JointRepresentation2-TIP22.pdf	Archive managers only	Refereed author’s manuscript (post-print)	Other type of license	10.24 MB	Adobe PDF
Joint_Representation_Learning_and_Keypoint_Detection_for_Cross-View_Geo-Localization.pdf	Archive managers only	Publisher’s layout	All rights reserved	6.56 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/361028
