Efficient Processing and Learning on Unstructured Data

Citable link (URI): http://hdl.handle.net/10900/162897
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1628979
http://dx.doi.org/10.15496/publikation-104228
Document type: Dissertation
Date of publication: 2025-03-12
Original publication: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of the University of Tuebingen's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Computer Science
Reviewer: Lensch, Hendrik P. A. (Prof. Dr.)
Date of oral examination: 2025-02-07
DDC classification: 004 - Computer science
Keywords: Deep Learning, Machine Learning, Computer Science, Computer Vision, Computer Graphics
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Inferring knowledge by visually perceiving the world is the fundamental goal of computer vision. The primary input types are images and videos, particularly in the era of deep learning. Such structured data representations are defined on a fixed grid, which implicitly assigns positional information to each data point. Hence, finding neighbors is a simple lookup, but data must be defined everywhere at the same resolution. Unstructured data representations such as point clouds, feature spaces, or graphs, on the other hand, have no such underlying structure. However, they can efficiently embed information from multiple sources into a common representation, leveraging knowledge and resolutions that are otherwise impossible to achieve from a single source. In particular, higher-dimensional data is infeasible to represent in a dense, structured way. Unfortunately, losing the implicit grid structure renders many algorithms fundamental to computer vision unusable. Above all, the advances in vision-based deep learning depend heavily on highly efficient frameworks whose implementations and routines are defined only on structured data. This thesis highlights the importance of unstructured data representations and presents efficient processing and learning techniques in this context. Four projects form the core of this work, each demonstrating a different aspect: 1) A high-precision 3D reconstruction scheme with active illumination. 2) A framework to enable fully-convolutional deep learning on point cloud data. 3) An end-to-end Fisher Vector embedding for object recognition. 4) A superfast proximity graph construction and query scheme for nearest neighbor search on GPUs. It is shown that efficient processing and learning on unstructured data is possible when algorithms are designed from the ground up in a massively parallelizable fashion and domain knowledge is incorporated. Generally, high efficiency is achieved by scaling up simple but robust and effective methods on GPUs. The presented projects are evaluated on large-scale, real-world data to showcase their robustness, performance, and applicability.
