Certified Adversarial Robustness With Domain Constraints


Citable link (URI): http://hdl.handle.net/10900/161636
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1616366
http://dx.doi.org/10.15496/publikation-102968
Document type: Dissertation
Date of publication: 2025-02-06
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science)
Department: Computer Science
Advisor: Hein, Matthias (Prof. Dr.)
Date of oral examination: 2025-01-24
DDC classification: 004 - Computer science
Keywords: Machine learning
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Deep learning has become a dominant technique in machine learning and, with only a little exaggeration, a synonym for machine learning itself. Despite its strong performance on tasks ranging from image classification to text generation, the underlying mechanism remains poorly understood and deep-learning systems are black boxes, which gives rise to undesirable properties such as the presence of adversarial examples. An adversarial example is a tiny modification of an input (usually demonstrated for images) that is imperceptible to humans and does not change the semantics of the input, yet fools the classifier into changing its output to some absurd value. This is a major concern in safety-critical applications such as autonomous driving, where the consequences of such adversarial manipulations are potentially catastrophic. In this thesis, we continue the effort to mitigate this problem. Namely, we first observe that the problem is not unique to deep learning but also affects simpler classifiers, such as nearest prototype classifiers. For these, we derive rigorous mathematical guarantees on robustness and provide tractable lower bounds for it. Despite using simpler models, this allows us to establish state-of-the-art results on a popular benchmark. We then focus on randomized smoothing, a method for certifying the robustness of a classifier to adversarial perturbations. In simple terms, randomized smoothing adds noise to the input many times and outputs the majority vote over the classifier's predictions on the noisy inputs. We present three separate contributions to this topic. First, we show that the standard implementation of this procedure does not actually provide the claimed guarantees due to floating-point errors, and we develop a fix. Second, we improve the performance of the technique in a certain setting. Third, we speed up the certification procedure.
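To make the "add noise many times and take the majority vote" description concrete, the following is a minimal sketch of the prediction step of randomized smoothing under isotropic Gaussian noise. It is not the certification procedure from the thesis (which additionally requires statistically sound bounds on the vote counts); the function name, the noise level sigma, and the toy classifier are illustrative assumptions.

```python
import numpy as np

def smoothed_predict(classifier, x, sigma=0.25, n_samples=1000, rng=None):
    """Majority vote of `classifier` over Gaussian-noised copies of input `x`.

    Illustrative sketch only: `classifier` maps a single input array to an
    integer class label, and `sigma` is the standard deviation of the noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = {}
    for _ in range(n_samples):
        noisy = x + rng.normal(scale=sigma, size=x.shape)  # perturb the input
        label = classifier(noisy)                          # classify the noisy copy
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)                       # most frequent class wins

# Toy usage: a hypothetical linear "classifier" on a 2-D input.
toy_classifier = lambda z: int(z.sum() > 0)
print(smoothed_predict(toy_classifier, np.array([0.3, -0.1])))
```

A real certification pipeline would replace the raw majority vote with confidence bounds on the class probabilities estimated from the samples, which is exactly where the floating-point and runtime issues addressed in the thesis arise.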
