Robust and Efficient Deep Visual Learning

Document type: Dissertation
Date: 2020-12-16
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Informatik
Advisor: Gehler, Peter Vincent (Dr.)
Day of Oral Examination: 2020-12-02
DDC Classification: 004 - Data processing and computer science
Keywords: Deep learning, Maschinelles Lernen, Dimension 3, Avatar <Informatik>
Other Keywords:
machine learning
computer vision
computer graphics
License: Publishing license including print on demand


The past decade was marked by significant progress in artificial intelligence and statistical learning. However, the most impressive modern models come in the form of computationally expensive black boxes, and most of them cannot reason robustly about the confidence of their predictions. The ability to quantify model uncertainty and recognize failure scenarios is crucial when incorporating such models into complex decision-making pipelines, e.g. autonomous driving or medical image analysis systems. It is also important to keep the computational cost of these models low. This thesis studies and develops these two desired properties of deep learning models, robustness and efficiency, in three specific areas of computer vision. First, we investigate deep probabilistic models that allow uncertainty quantification, i.e. models that "know what they do not know". Here, we propose a novel model for angular regression that allows probabilistic object pose estimation from 2D images. We also show how the general deep density estimation paradigm can be adapted to two other real-world applications, ball trajectory prediction and brain imaging. Next, we turn to 3D shape analysis and rendering. We propose a method for efficiently encoding 3D point clouds, a type of data that is hard to handle with conventional learning algorithms because it is unordered. We show that simple neural networks that take this encoding as input can match the performance of state-of-the-art methods on various point cloud processing tasks while using orders of magnitude fewer floating-point operations. Finally, we explore the emerging field of neural rendering and develop a framework that connects classic deformable 3D body models with modern image-to-image translation neural networks. This combination allows efficient, controllable, photorealistic rendering of human avatars, with flexible camera control and the ability to change the body pose and shape appearance. The thesis concludes with a discussion of the presented methods, including their current limitations and future research directions.
