Abstract:
As machine learning systems become increasingly complex and autonomous, the integration of uncertainty quantification becomes crucial, especially in high-stakes domains like healthcare and autonomous driving, where ambiguity can lead to severe consequences. By offering a clear gauge of prediction confidence, uncertainty quantification supports informed decision-making and risk management.
Within the realm of healthcare, where diagnostic procedures often depend on var- ious imaging modalities, modern machine-learning methods are being harnessed to aid diagnosis. Current advancements in generative machine learning explore the synthesis of different medical imaging modalities, predominantly through image-to-image translations. Our work demonstrates that integrating aleatoric uncertainty in Generative Adversarial Networks (GANs) for these translation tasks can amplify interpretability and accuracy. Consequently, this empowers healthcare professionals with better diagnostic and treatment decisions, thus enhancing patient outcomes.
In the context of autonomous driving and similar applications, ensuring resilience to unforeseen perturbations is vital. Traditional deterministic models may falter when confronted with new situations, constituting a safety hazard. We address this by implementing a probabilistic approach to dense computer vision tasks and utilizing the Likelihood Annealing technique for uncertainty estimation. These methods amplify the robustness to unexpected situations and provide a calibrated uncertainty measure, contributing to the development of safer autonomous systems.
While creating new probabilistic machine learning solutions for vital applications is a key research area, it’s equally significant to develop methods that leverage large-scale pretrained models. These deterministic models can be adapted to estimate uncertainties in a cost-efficient manner regarding data, computation, and other resources, a direction we explore in this thesis. The work presented herein addresses this issue within the context of current computer vision systems, including large-scale vision language models crucial for enabling intelligent multimodal systems.