Abstract:
Neural networks often excel when their inputs closely match the data on which they were trained, yet they frequently fail when inputs differ even slightly from that data. This issue, known as distribution shift, remains a significant challenge when deploying machine learning models in practical applications such as medical imaging and autonomous driving. Traditional methods for addressing distribution shift typically require additional training or data collection, which may not be feasible for models that are already deployed. This thesis explores alternative strategies for enhancing the robustness of already trained models to distribution shifts.
The first part of this work introduces a benchmark specifically designed to evaluate test-time adaptation (TTA) methods under prolonged and varied distribution shifts. Using this benchmark, we demonstrate that while existing TTA techniques initially improve performance, their performance often degrades as adaptation continues. We also propose a simple baseline method that consistently outperforms the other tested methods, maintaining high performance throughout prolonged adaptation.
Building on these insights, the second part analyzes the underlying mechanisms of entropy-based loss functions commonly employed in TTA. We show that entropy minimization initially clusters embeddings of similar images together, thus increasing accuracy. However, continued entropy minimization eventually drives input image embeddings further away from training embeddings, thereby reducing accuracy. Leveraging this insight, we propose Weighted Flips (WF), a novel method capable of predicting model accuracy on arbitrary image sets without the need for labeled data.
The final part of this work extends the principles of TTA to language models (LMs), focusing on the task of literature recommendation. We propose a benchmark that evaluates LMs on their ability to identify academic papers given a short description that references them. Our benchmark demonstrates that LMs are unable to perform this task effectively. We therefore propose a simple agent that allows LMs to search for and read relevant papers, significantly improving their performance.