

Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation

Ren, Bin; Tang, Hao; Sebe, Nicu
2021-01-01

Abstract

Previous cross-view image translation methods that directly adopt a simple encoder-decoder or U-Net structure struggle to generate good images at the target view, especially when the views differ drastically and the scene undergoes severe deformation. To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and a refined pixel-level loss in the second stage. In the first stage, the CrossMLP sub-network learns the latent transformation cues between the image code and the semantic map code via our novel cross MLP-Mixer blocks, and coarse results are then generated progressively under the guidance of those cues. In the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem of the cross-view translation task in a much simpler fashion, leading to better network optimization. Extensive experimental results on the Dayton [vo2016localizing] and CVUSA [workman2015wide] datasets show that our method generates significantly better results than state-of-the-art methods. The source code, data, and trained models will be made available.
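The abstract describes cross MLP-Mixer blocks that exchange information between an image code and a semantic map code via token- and channel-mixing MLPs. Below is a minimal PyTorch sketch of what such a block could look like; the class name CrossMLPBlock, the fusion layer, the hidden sizes, and the mixing order are assumptions based only on the abstract and the standard MLP-Mixer design, not the authors' released implementation.

```python
# Hypothetical sketch of a cross MLP-Mixer block; names and layout are
# assumptions inferred from the abstract, not the paper's actual code.
import torch
import torch.nn as nn


class MlpLayer(nn.Module):
    """Standard two-layer MLP used for both token- and channel-mixing."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)


class CrossMLPBlock(nn.Module):
    """Mixes an image-code stream with a semantic-map-code stream.

    The semantic code is fused into the image code (a stand-in for the
    'latent transformation cues' mentioned in the abstract), then token
    mixing operates across patches and channel mixing across features.
    """
    def __init__(self, num_tokens, dim, token_hidden=256, channel_hidden=1024):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # simple cross-stream fusion (assumption)
        self.norm1 = nn.LayerNorm(dim)
        self.token_mix = MlpLayer(num_tokens, token_hidden)
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mix = MlpLayer(dim, channel_hidden)

    def forward(self, img_code, sem_code):
        # img_code, sem_code: (batch, num_tokens, dim)
        x = self.fuse(torch.cat([img_code, sem_code], dim=-1))  # inject semantic cues
        y = self.norm1(x).transpose(1, 2)                       # (batch, dim, num_tokens)
        x = x + self.token_mix(y).transpose(1, 2)               # token mixing across patches
        x = x + self.channel_mix(self.norm2(x))                 # channel mixing across features
        return x
```

In the cascaded design the abstract describes, several such blocks would be stacked so that coarse target-view results are produced progressively under the guidance of the learned cues; the exact cascade and the second-stage refined pixel-level loss are detailed in the paper.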
2021
British Machine Vision Conference (BMVC’21)
Durham, UK
British Machine Vision Association, BMVA
Ren, Bin; Tang, Hao; Sebe, Nicu
Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation / Ren, Bin; Tang, Hao; Sebe, Nicu. - (2021), pp. 1-14. (Paper presented at the 32nd British Machine Vision Conference, BMVC 2021, held online, 22nd-25th November 2021).
Files in this record:
File: 0141.pdf (open access)
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 5.47 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/326198
Citations
  • PubMed Central: ND
  • Scopus: 11
  • Web of Science (ISI): ND
  • OpenAlex: ND