Abstract:
Deep learning has become a dominant technique in machine learning and, with only a little
exaggeration, a synonym for machine learning itself. Despite its strong performance on
tasks ranging from image classification to text generation, the underlying mechanism remains
poorly understood and deep-learning systems are effectively black boxes, which leads to
undesirable properties such as the existence of adversarial examples. An adversarial example is a
tiny modification of an input (usually demonstrated for images) that is imperceptible to humans
and does not change the semantics of the input, yet it fools the classifier into changing its
output to some absurd value. This is a major concern in safety-critical applications such as
autonomous driving, where the consequences of such adversarial manipulations are potentially
catastrophic. In this thesis, we continue the effort to mitigate this problem.
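For concreteness, one common formalization from the literature (not necessarily the exact
definition used later in the thesis) treats an adversarial example for a classifier $f$ at an
input $x$ as a point $x'$ satisfying
\[
\|x' - x\| \le \varepsilon \quad \text{and} \quad f(x') \neq f(x),
\]
where $\varepsilon > 0$ is a perturbation budget chosen small enough that $x$ and $x'$ are
indistinguishable to a human observer.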
First, we observe that the problem is present not only for deep learning but also for simpler
classifiers, such as nearest prototype classifiers. For these classifiers, we derive rigorous
mathematical guarantees on robustness and provide tractable lower bounds for it. Despite the
simplicity of the models, this allowed us to establish state-of-the-art results on a popular benchmark.
Later, we focus on randomized smoothing, a method for certifying the robustness of a
classifier to adversarial perturbations. In simple terms, randomized smoothing adds noise to the
input many times and outputs the majority vote over the classifier's predictions on the noisy inputs.
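As a minimal illustrative sketch of this majority-vote prediction (the function name
`smoothed_predict`, the classifier `f`, the noise level `sigma`, and the sample count `n` are
placeholder names, not the exact interface used in the thesis):

```python
import numpy as np

def smoothed_predict(f, x, sigma=0.25, n=1000, rng=None):
    """Majority-vote prediction of a classifier f smoothed with Gaussian noise.

    f     : function mapping an input array to a class label (int)
    x     : input array (e.g., an image)
    sigma : standard deviation of the Gaussian noise
    n     : number of noisy samples used for the vote
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = {}
    for _ in range(n):
        # Perturb the input with isotropic Gaussian noise and classify it.
        label = f(x + sigma * rng.standard_normal(x.shape))
        votes[label] = votes.get(label, 0) + 1
    # Return the most frequent label among the noisy predictions.
    return max(votes, key=votes.get)
```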
We present three separate contributions to this topic. First, we show that the standard
implementation of this procedure does not actually provide the claimed guarantees due to
floating-point errors, and we develop a fix. Second, we improve the performance of the technique
in a certain setting. Third, we speed up the certification procedure.