Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation / Ren, Bin; Tang, Hao; Sebe, Nicu. - (2021), pp. 1-14. (Paper presented at the 32nd British Machine Vision Conference, BMVC 2021, held online, 22nd-25th November 2021).
Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation
Ren, Bin; Tang, Hao; Sebe, Nicu
2021-01-01
Abstract
Previous cross-view image translation methods that directly adopt a simple encoder-decoder or U-Net structure struggle to generate good images at the target view, especially for drastically different views and cases of severe deformation. To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and a refined pixel-level loss in the second stage. In the first stage, the CrossMLP sub-network learns the latent transformation cues between the image code and the semantic map code via our novel cross MLP-Mixer blocks; the coarse results are then generated progressively under the guidance of those cues. In the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem in the cross-view translation task in a much simpler fashion, for better network optimization. Extensive experimental results on the Dayton [vo2016localizing] and CVUSA [workman2015wide] datasets show that our method generates significantly better results than state-of-the-art methods. The source code, data, and trained models will be made available.
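The abstract describes the cross MLP-Mixer block only at a high level. As a rough illustration of how such a block could couple an image code with a semantic-map code, here is a minimal PyTorch sketch. The module names, the token/channel mixing layout (borrowed from the original MLP-Mixer design), and the concatenation-based cross step are assumptions for illustration only, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class Mlp(nn.Module):
    """Standard two-layer MLP as used in MLP-Mixer."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class CrossMLPMixerBlock(nn.Module):
    """Hypothetical cross MLP-Mixer block: the image-code and semantic-map-code
    token streams are each mixed along tokens and channels, then exchange
    information through a shared cross-stream MLP. Illustrative sketch only."""
    def __init__(self, num_tokens, dim, token_hidden=256, channel_hidden=512):
        super().__init__()
        self.norm_tok = nn.LayerNorm(dim)
        self.token_mlp = Mlp(num_tokens, token_hidden)   # mixes across tokens
        self.norm_ch = nn.LayerNorm(dim)
        self.channel_mlp = Mlp(dim, channel_hidden)      # mixes across channels
        self.norm_cross = nn.LayerNorm(2 * dim)
        self.cross_mlp = nn.Linear(2 * dim, dim)         # fuses the two streams

    def mix(self, x):
        # Token mixing on the transposed (batch, dim, tokens) view, residual add.
        x = x + self.token_mlp(self.norm_tok(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing: per-token MLP over feature channels, residual add.
        return x + self.channel_mlp(self.norm_ch(x))

    def forward(self, img_code, sem_code):
        # img_code, sem_code: (batch, num_tokens, dim)
        img_code, sem_code = self.mix(img_code), self.mix(sem_code)
        # Cross step: a residual update computed from both streams, playing the
        # role of the "latent transformation cues" mentioned in the abstract.
        cue = self.cross_mlp(self.norm_cross(torch.cat([img_code, sem_code], dim=-1)))
        return img_code + cue, sem_code + cue

if __name__ == "__main__":
    block = CrossMLPMixerBlock(num_tokens=64, dim=128)
    img = torch.randn(2, 64, 128)   # flattened image feature tokens
    sem = torch.randn(2, 64, 128)   # flattened semantic-map feature tokens
    out_img, out_sem = block(img, sem)
    print(out_img.shape, out_sem.shape)  # torch.Size([2, 64, 128]) twice
```

In the paper's cascaded design, several such blocks would presumably be stacked so that the coarse result is refined progressively; the cascading and generation stages are omitted from this sketch.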
File | Access | Type | License | Size | Format
---|---|---|---|---|---
0141.pdf | Open access | Publisher's layout (editorial version) | All rights reserved | 5.47 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.