Abstract:
Diagnosing and treating cancer is a challenging endeavor in which surgeons and pathologists work closely together to ensure an accurate and comprehensive assessment of the disease. Advances in medical technology are leading to new diagnostic and therapeutic methods, such as minimally invasive surgery and computer-assisted pathology. These methodologies, however, require intensive training of physicians and pose their own challenges. Learning-based approaches can assist physicians in this regard, making procedures more efficient and thus helping to improve cancer treatment. A major obstacle, however, is the limited availability of training data. This work aims to develop new methods that support surgeons and pathologists in their work and achieve the best possible results despite this scarcity of data. The focus lies on applications in the context of bladder and breast cancer diagnostics.
A primary emphasis of this work is the prediction of depth maps in the context of cystoscopic examinations. In this minimally invasive procedure, the surgeon inspects the bladder through a monocular endoscope. The monocular view limits the surgeon's spatial perception and makes it difficult to capture the bladder wall in its entirety. In addition, it is not possible to acquire ground truth information – a prerequisite for supervised learning-based approaches. As a solution, a three-step approach is presented. Its basis is a virtual cystoscopy environment for the acquisition of synthetic data, including ground-truth depth maps. Subsequently, a network is trained on this synthetic data set using a supervised learning strategy. In a third step, the knowledge encoded in the network is transferred to real images by means of adversarial domain adaptation. This approach shows promising results, which pave the way for image-guided surgery.
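To make the third step concrete, the following is a minimal, illustrative sketch in PyTorch, not the implementation used in the thesis: an ADDA-style adversarial adaptation in which a target encoder for real cystoscopy frames is trained to fool a discriminator that separates synthetic-domain from real-domain features. All module names, architectures, and hyperparameters (DepthNet, Discriminator, learning rates, image sizes) are hypothetical.

```python
# Illustrative sketch (not the thesis implementation): ADDA-style adversarial
# domain adaptation of a monocular depth network from synthetic to real images.
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Hypothetical depth estimator: encoder features -> per-pixel depth."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        feats = self.encoder(x)
        return self.decoder(feats), feats

class Discriminator(nn.Module):
    """Distinguishes synthetic-domain features from real-domain features."""
    def __init__(self, c=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, f):
        return self.net(f)

# Step 2: supervised pre-training on synthetic images with ground-truth depth.
source_net = DepthNet()
sup_opt = torch.optim.Adam(source_net.parameters(), lr=1e-4)
syn_img, syn_depth = torch.rand(4, 3, 64, 64), torch.rand(4, 1, 64, 64)
pred, _ = source_net(syn_img)
sup_loss = nn.functional.l1_loss(pred, syn_depth)
sup_opt.zero_grad(); sup_loss.backward(); sup_opt.step()

# Step 3: adversarial adaptation - a target encoder learns to produce features
# the discriminator cannot tell apart from synthetic-domain features.
target_net = DepthNet()
target_net.load_state_dict(source_net.state_dict())  # initialise from source
disc = Discriminator()
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
t_opt = torch.optim.Adam(target_net.encoder.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
real_img = torch.rand(4, 3, 64, 64)  # unlabelled real cystoscopy frames

with torch.no_grad():
    _, src_feats = source_net(syn_img)   # source encoder stays frozen
_, tgt_feats = target_net(real_img)

# Discriminator update: synthetic features -> 1, real features -> 0.
d_loss = bce(disc(src_feats), torch.ones(4, 1)) + \
         bce(disc(tgt_feats.detach()), torch.zeros(4, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Target-encoder update: fool the discriminator (inverted labels).
g_loss = bce(disc(tgt_feats), torch.ones(4, 1))
t_opt.zero_grad(); g_loss.backward(); t_opt.step()
```

In such a setup the adapted target encoder can be paired with the decoder pre-trained on synthetic data, so that depth prediction on real frames benefits from the supervision available only in the virtual environment.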
In the further course of this work, the focus shifts to histopathological image analysis, the cornerstone of cancer diagnostics, which is based on gigapixel images of digitized tissue sections. Pixel-precise annotation of such large images is extremely costly, whereas global ground truth labels, such as the disease grade, are readily available because they are acquired as part of clinical routine. The second part of the thesis therefore focuses on making these global labels usable and presents a framework based on multiple instance learning, which combines dynamic meta-embedding with an architecture trained by self-distillation. This design exhibits great potential for assisted cancer diagnosis and makes it possible to capture relevant sub-cellular features from a single diagnostic label at the patient level, thereby harnessing large amounts of data with low annotation effort.
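As an illustration of the multiple instance learning setup, below is a minimal sketch, again in PyTorch and again not the thesis implementation: an attention-pooling MIL head that maps a bag of patch embeddings from one whole-slide image to a single slide-level prediction. The embedding dimension, the frozen backbone producing the embeddings (a self-distilled feature extractor such as DINO is one common choice), and all other specifics are assumptions for illustration; the dynamic meta-embedding and self-distillation components of the thesis are not reproduced here.

```python
# Illustrative sketch (not the thesis implementation): attention-based multiple
# instance learning, where one slide-level label supervises a whole bag of
# patch embeddings extracted from a gigapixel whole-slide image.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Scores each patch embedding, pools to one bag vector, classifies."""
    def __init__(self, emb_dim=384, hidden=128, n_classes=2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(emb_dim, n_classes)

    def forward(self, bag):                            # bag: (n_patches, emb_dim)
        a = torch.softmax(self.attention(bag), dim=0)  # (n_patches, 1)
        z = (a * bag).sum(dim=0)                       # attention-weighted pooling
        return self.classifier(z), a                   # slide logits, patch weights

# One slide = one bag: patch embeddings from a frozen feature extractor
# (hypothetical sizes; in practice thousands of patches per slide).
bag = torch.randn(1000, 384)        # 1000 patches, 384-dim embeddings
label = torch.tensor(1)             # single diagnostic label per slide/patient

model = AttentionMIL()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
logits, attn = model(bag)
loss = nn.functional.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0))
opt.zero_grad(); loss.backward(); opt.step()
# The attention weights `attn` indicate which patches drove the prediction,
# yielding a form of weak localisation without pixel-level annotations.
```

The appeal of this formulation is that the only supervision required is the diagnostic label already recorded in clinical routine, while the attention mechanism still exposes which tissue regions the model considered relevant.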