Abstract:
In recent years, the field of deep learning has witnessed remarkable progress, driving the integration of these technologies into a wide range of real-world applications. This growing adoption of Deep Neural Networks (DNNs) is underpinned by an increasing confidence in their reliability and ability to generalize effectively across diverse tasks.
However, the ability of these models to generalize to previously unseen inputs is poorly understood and often lacking, especially in smaller models.
Generalization either has to be built into the architecture or it has to be learned from data.
The former is limited by what we as developers can imagine; the latter is very costly and also limited by the quality of the available data.
Therefore, it is desirable to consider systems that already possess generalizing properties, especially at the representational level, and to transfer those properties to a target system.
Since such a transfer only makes sense when the two systems have different architectures, the only option is to perform it on the basis of functional activity, i.e., to use functional transfer methods.
However, although there are numerous such methods proposed in the literature, it is unclear which of those are actually useful for transferring generalization.
With this thesis, we therefore aim to understand how generalization manifests in learning systems and how it can be transferred specifically to artificial neural networks.
We approach both questions by applying functional transfer in two application areas, each with a different source of generalizing representations.
In the first case, we use the recorded neuronal activity from the visual cortex of a macaque monkey as the teacher and successfully transfer generalization properties to a standard DNN student.
This is achieved by training the DNN to predict the monkey's neuronal responses jointly with performing the actual classification task.
We also introduce the attention readout architecture, a novel method for predicting neuron responses from DNN representations. This approach surpasses the current state-of-the-art in predicting responses in macaque area V4.
This enables not only a deeper understanding of the visual cortex through more accurate in-silico experiments but also better functional transfer from neuronal activity in the future.
In the second case, the teacher is a DNN with properties, either learned or built in, that allow it to generalize better and that are therefore desirable to transfer to the student DNN.
For this setting, we first investigate fundamental transfer abilities for clearly defined invariances in a small, controlled environment, using both theoretical and empirical methods.
The results show that established transfer methods cannot reliably transfer even simple invariances.
Aiming to close this gap, we propose Orbit, a novel method for capturing and transferring invariance from teacher to student, and demonstrate that it successfully solves this problem in our controlled environment.
Building on the insights gained from Orbit and the corresponding analysis, we further propose a general framework with "Hard Augmentations for Robust Distillation (HARD)" that extends existing functional transfer methods to handle invariances and other generalizing properties.
We demonstrate its effectiveness beyond small-scale examples by outperforming state-of-the-art transfer methods on several tasks.
Overall, we take a step towards understanding how generalization works and offer substantial insight into its transfer through functional methods.
We reveal for the first time that traditional functional transfer methods are insufficient for transferring generalization, and we offer three new methods: Orbit, which successfully transfers invariance from artificial neural networks; Co-Training, which transfers robustness from neuronal data; and HARD, a more general method that can be applied in any setting.