Learning Unsupervised Depth Estimation, from Stereo to Monocular Images / Pilzer, Andrea. - (2020 Jun 22), pp. 1-105. [10.15168/11572_268252]
Learning Unsupervised Depth Estimation, from Stereo to Monocular Images
Pilzer, Andrea
2020-06-22
Abstract
To interact with the real world, humans must perform several tasks, such as object detection, pose estimation, motion estimation, and distance estimation. These tasks are all part of scene understanding and are fundamental problems in computer vision. Depth estimation has received unprecedented attention from the research community in recent years, owing to growing interest in its practical applications (e.g., robotics and autonomous driving) and to the performance improvements achieved with deep learning. In fact, its applications have expanded from more traditional areas such as robotics to new fields such as autonomous driving, augmented reality devices, and smartphone applications. This is due to several factors. First, training data became more widely available as increasingly large datasets were collected. Second, deep learning frameworks running on graphics cards greatly increased data processing capabilities, allowing higher-precision deep convolutional networks (ConvNets) to be trained. Third, researchers applied unsupervised optimization objectives to ConvNets, overcoming the hurdle of collecting expensive ground truth and fully exploiting the abundance of images available in datasets. This thesis presents several proposals for unsupervised depth estimation and examines their benefits: (i) learning from resynthesized data, (ii) adversarial learning, (iii) coupling generator and discriminator losses for collaborative training, and (iv) the self-improvement ability of the learned model. For the first two points, we developed a binocular stereo unsupervised depth estimation model that uses reconstructed data as an additional self-constraint during training. In addition, adversarial learning improves the quality of the reconstructions, further increasing the performance of the model. The third point is inspired by scene understanding as a structured task. A generator and a discriminator that join their efforts in a structured way improve the quality of the estimations. This idea may sound counterintuitive when cast in the general framework of adversarial learning, but our experiments demonstrate the effectiveness of the proposed approach. Finally, self-improvement is inspired by estimation refinement, a widespread practice in dense reconstruction tasks like depth estimation. We devise a monocular unsupervised depth estimation approach that measures the reconstruction errors in an unsupervised way to produce a refinement of the depth predictions. Furthermore, we apply knowledge distillation to improve the student ConvNet with the knowledge of the teacher ConvNet, which has access to the errors.

File | Size | Format | |
---|---|---|---|
thesis_pilzer_final_compressed.pdf (open access). Description: Main article. Type: Doctoral Thesis. License: All rights reserved | 5.76 MB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.