Robust Out-of-Distribution Detection in Deep Classifiers




Citable link (URI): http://hdl.handle.net/10900/141438
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1414389
http://dx.doi.org/10.15496/publikation-82785
Document type: Dissertation
Date of publication: 2023-05-25
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Computer Science
Reviewer: Hein, Matthias (Prof. Dr.)
Date of oral examination: 2023-04-24
DDC classification: 004 - Computer Science
License: http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=en

Abstract:

Over the past decade, deep learning has gone from a fringe discipline of computer science to a major driver of innovation across a large number of industries. The deployment of such rapidly developing technology in safety-critical applications necessitates the careful study and mitigation of potential failure modes. Indeed, many deep learning models are overconfident in their predictions, are unable to flag out-of-distribution examples that are clearly unrelated to the task they were trained on, and are vulnerable to adversarial perturbations, where a small change in the input leads to a large change in the model's prediction. In this dissertation, we study the relation between these issues in deep-learning-based vision classifiers.

First, we benchmark various methods that have been proposed to enable deep learning models to detect out-of-distribution examples, and we show that a classifier's predictive confidence is well suited for this task if the classifier has had access to a large and diverse out-distribution at train time. We theoretically investigate how different out-of-distribution detection methods are related and show that several seemingly different approaches actually model the same core quantities.

In the second part, we study the adversarial robustness of a classifier's confidence on out-of-distribution data. Concretely, we show that several previous techniques for adversarial robustness can be combined to create a model that inherits each method's strengths while significantly reducing their respective drawbacks. In addition, we demonstrate that enforcing adversarially robust low confidence on out-of-distribution data enhances the inherent interpretability of the model by imbuing the classifier with certain generative properties that can be used to query the model for counterfactual explanations of its decisions.

In the third part of this dissertation, we study the problem of issuing mathematically provable certificates for the adversarial robustness of a model's confidence on out-of-distribution data. We develop two different approaches to this problem and show that they have complementary strengths and weaknesses. The first method is easy to train, places no restrictions on the classifier's architecture, and provably ensures that the classifier has low confidence on data very far away from the in-distribution. However, it only provides guarantees for very specific types of adversarial perturbations and only for data that is very easy to distinguish from the in-distribution. The second approach works for more commonly studied sets of adversarial perturbations and on much more challenging out-distribution data, but places heavy restrictions on the usable architecture and thus on the achievable accuracy. It also does not guarantee low confidence on asymptotically far-away data. In the final chapter of this dissertation, we show how ideas from both of these techniques can be combined in a way that preserves all of their strengths while inheriting none of their weaknesses. Thus, this thesis outlines how to develop high-performing classifiers that provably know when they do not know.
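To make the two central notions of the abstract concrete, the following is a minimal, illustrative sketch (not taken from the dissertation itself): it scores inputs by the classifier's predictive confidence (maximum softmax probability), which the first part of the thesis evaluates as an out-of-distribution detector, and it probes how a small adversarial perturbation of an out-of-distribution input can inflate that confidence, which motivates the robustness and certification work in the later parts. The model and data here are hypothetical placeholders; the attack is a generic projected-gradient ascent on confidence, not the specific procedure used in the thesis.

```python
"""Sketch: confidence-based OOD scoring and an adversarial probe of that confidence.
All names (model, x_ood, eps, ...) are illustrative assumptions, not the thesis' setup."""
import torch
import torch.nn.functional as F


def confidence_score(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability; low values are treated as evidence that x is OOD."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    return probs.max(dim=1).values


def maximize_confidence(model, x, eps=8 / 255, steps=10, step_size=2 / 255):
    """Projected gradient ascent on the confidence of an input within an
    L-infinity ball of radius eps: a simple check of whether low confidence
    on out-of-distribution data is adversarially robust."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        conf = F.softmax(model(x_adv), dim=1).max(dim=1).values.sum()
        grad = torch.autograd.grad(conf, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep a valid image range
    return x_adv.detach()


if __name__ == "__main__":
    # Toy stand-ins: a random linear "classifier" and random "OOD" images.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x_ood = torch.rand(4, 3, 32, 32)

    print("clean confidence:      ", confidence_score(model, x_ood))
    x_adv = maximize_confidence(model, x_ood)
    print("adversarial confidence:", confidence_score(model, x_adv))
```

In this toy setting the gap between the clean and the adversarial confidence illustrates the failure mode the dissertation targets: a detector that relies on confidence is only trustworthy if that confidence stays low under such perturbations, ideally with a provable certificate.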
