Why Machine Learning Models Fail: A Benchmarking Perspective

dc.contributor.advisor Bethge, Matthias (Prof. Dr.)
dc.contributor.author Michaelis, Claudio
dc.date.accessioned 2024-04-10T09:00:32Z
dc.date.available 2024-04-10T09:00:32Z
dc.date.issued 2024-04-10
dc.identifier.uri http://hdl.handle.net/10900/152732
dc.identifier.uri http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1527324 de_DE
dc.identifier.uri http://dx.doi.org/10.15496/publikation-94071
dc.description.abstract Over the last few years, machine performance on object recognition, language understanding and other capabilities that we associate with human intelligence has rapidly improved. One central element of this progress is machine learning models that learn the solution to a task directly from data. The other is benchmarks that use data to quantitatively measure model performance. In combination, they form a virtuous cycle in which models can be optimized directly for benchmark performance. But while the resulting models perform very well on their benchmarks, they often fail unexpectedly outside the controlled setting: innocuous changes such as image noise, rain or the wrong background can lead to incorrect predictions. In this dissertation, I argue that to understand these failures, it is necessary to understand the relationship between benchmark performance and the desired capability. To support this argument, I study benchmarks in two ways. In the first part, I investigate how to learn and evaluate a new capability. To this end, I introduce one-shot object detection and define several benchmarks to analyze what makes this task hard for machine learning models and what is needed to solve it. I find that CNNs struggle to separate individual objects in cluttered environments, and that one-shot recognition of novel categories becomes challenging with real-world objects. I then investigate what makes one-shot generalization difficult in real-world scenes and identify the number of categories in the training dataset as the central factor. Using this insight, I show that excellent one-shot generalization can be achieved by training on broader datasets. These results highlight how much benchmark design influences what is measured, and that limitations of benchmarks can be mistaken for limitations of the models developed with them. In the second part, I broaden the view and analyze the connection between model failures in different areas of machine learning. I find that many of these failures can be explained by shortcut learning: models exploiting a mismatch between a benchmark and its associated capability. Shortcut solutions use superficial cues that work very well within the training domain but are unrelated to the capability. This demonstrates that good benchmark performance is not sufficient to prove that a model has acquired the associated capability, and that results have to be interpreted carefully. Taken together, these findings call into question the common practice of evaluating models on a single benchmark, or at most a few. Rather, my results indicate that to anticipate model failures, it is essential to measure broadly, and that to avoid them, it is necessary to verify that models acquire the desired capability. This will require investment in better data, new benchmarks and other complementary forms of evaluation, but it provides the basis for further progress towards powerful, reliable and safe models. en
dc.language.iso en de_DE
dc.publisher Universität Tübingen de_DE
dc.rights ubt-podno de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en en
dc.subject.classification Maschinelles Lernen, Maschinelles Sehen de_DE
dc.subject.ddc 004 de_DE
dc.subject.other machine learning en
dc.subject.other deep learning en
dc.subject.other computer vision en
dc.subject.other benchmarking en
dc.title Why Machine Learning Models Fail: A Benchmarking Perspective en
dc.type PhDThesis de_DE
dcterms.dateAccepted 2023-12-19
utue.publikation.fachbereich Informatik de_DE
utue.publikation.fakultaet 7 Mathematisch-Naturwissenschaftliche Fakultät de_DE
utue.publikation.noppn yes de_DE
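
The abstract describes shortcut learning as models exploiting superficial cues that predict the label within the training domain but are unrelated to the intended capability. Below is a minimal illustrative sketch of that failure mode, not taken from the thesis: it uses made-up synthetic features (a weak "true" signal and a strong spurious "shortcut" cue) and a plain NumPy logistic regression to show how i.i.d. benchmark accuracy can look excellent while accuracy drops towards chance once the shortcut no longer correlates with the label.

import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shortcut_correlated):
    # Feature 0: weak "true" signal tied to the capability we actually care about.
    # Feature 1: a "shortcut" cue that predicts the label only in the training domain.
    y = rng.integers(0, 2, size=n)
    signal = (y - 0.5) * 0.5 + rng.normal(0.0, 1.0, n)
    if shortcut_correlated:
        shortcut = (y - 0.5) * 4.0 + rng.normal(0.0, 0.5, n)
    else:
        shortcut = rng.normal(0.0, 2.0, n)  # cue carries no label information
    return np.column_stack([signal, shortcut]), y

def train_logreg(X, y, lr=0.1, steps=2000):
    # Plain logistic regression fitted by gradient descent (no ML library needed).
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return np.mean(((X @ w + b) > 0).astype(int) == y)

X_train, y_train = make_data(2000, shortcut_correlated=True)
X_iid, y_iid = make_data(2000, shortcut_correlated=True)     # same distribution as the benchmark
X_ood, y_ood = make_data(2000, shortcut_correlated=False)    # shortcut removed

w, b = train_logreg(X_train, y_train)
print("learned weights (signal, shortcut):", np.round(w, 2))            # the shortcut dominates
print("i.i.d. benchmark accuracy:", accuracy(w, b, X_iid, y_iid))        # looks excellent
print("accuracy without the shortcut:", accuracy(w, b, X_ood, y_ood))    # degrades towards chance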
