Deep Multimodal Learning for Breast Cancer Detection
Breast cancer is the most common cancer in women worldwide, but survival rates are high when it is diagnosed early through annual screening. This comes at the cost of a large number of radiologist hours spent evaluating mammograms. Artificial intelligence (AI) systems are being developed to improve the efficiency and effectiveness of the triage process. Deep learning (DL) has been hugely successful in a wide range of computer vision applications, including medical image analysis, but training deep convolutional neural networks (CNNs) typically requires large labeled datasets. This is an obstacle to using deep CNNs with medical imaging, since obtaining large amounts of data and labels is particularly expensive. Hence, CNNs are the state-of-the-art approach for breast cancer detection, but at the same time they pose demanding data requirements. This project brings together a group of scientists with backgrounds in AI and medical imaging to advance the application of state-of-the-art CNNs for breast cancer detection in realistic scenarios. We aim to relax the current data and label requirements for training DL networks through a new set of methods at the intersection of multiscale learning, self-supervised learning, and multimodal learning. First, we ground our approach in multiscale learning, which has been shown to be a data-efficient way to train neural networks for natural image generation and to improve classification performance. Our plan is to use each image at several resolutions to learn features at both a coarse and a detailed level, giving a richer perspective of the data while simultaneously augmenting the dataset without additional samples. Our hypothesis is that by learning at different resolutions, the network faces an increasingly difficult problem as the resolution grows, and this regularizes the learning process and prevents overfitting while using the same number of original examples.
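The multi-resolution training idea above can be sketched as follows. This is a minimal illustration assuming PyTorch; `TinyClassifier`, the layer sizes, and the scale factors are hypothetical placeholders, not the project's actual networks or settings.

```python
# Sketch: multi-resolution training as implicit augmentation/regularization.
# Assumes PyTorch; the toy model and scales are illustrative, not the real design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyClassifier(nn.Module):
    """Resolution-agnostic CNN: global pooling lets it accept any input size."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        h = self.features(x)
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)  # works at every resolution
        return self.head(h)

def multiscale_loss(model, images, labels, scales=(0.25, 0.5, 1.0)):
    """Average the classification loss over coarse-to-fine copies of one batch."""
    losses = []
    for s in scales:
        x = images if s == 1.0 else F.interpolate(
            images, scale_factor=s, mode="bilinear", align_corners=False)
        losses.append(F.cross_entropy(model(x), labels))
    return torch.stack(losses).mean()
```

Each batch is seen at several resolutions, so the effective training set grows without collecting new samples, which is the regularization effect hypothesized above.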
We will look at multiscale learning from two perspectives: Task 2 will follow a generative approach, and Task 3 a discriminative approach; both are unsupervised. In Task 2 we will test the hypothesis that by learning to generate new images that look like the true dataset images, the network will learn good representations that are useful for a downstream task. The multiscale approach is embedded in the generation process, as the model learns to generate images of increasingly higher resolution. In Task 3 we will test the hypothesis that by learning a metric space that minimizes the distance between patches that contain lesions and maximizes the distance between patches that have nothing in common, the network will learn features that are useful for a downstream task. Next, we observe that combining 2D mammography with 3D tomosynthesis increases accuracy and decreases false-positive recalls, but takes much longer to analyse. Task 4 will develop a method that applies 3D convolutions to tomosynthesis volumes. Tasks 5 and 6 will explore multimodal learning to train a network that exploits the complementary information of these modalities. Task 5 will investigate how to combine information from multiple modalities within the architecture of the model: by combining them at the input and first layer (early fusion), at intermediate layers of the model (slow fusion), or at the last feature representations (late fusion). Finally, Task 6 will investigate the following research question: can we transfer knowledge from a network trained on 3D tomosynthesis to a network learning from 2D mammography? This approach is inspired by the teacher-student framework of knowledge distillation. Having pairs of <3D tomosynthesis, 2D mammography> images with their labels, we can train a network that uses the first modality to predict the labels of the second.
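To make the fusion options and the use of 3D convolutions concrete, here is a minimal late-fusion sketch assuming PyTorch; the encoder depths, channel counts, and class count are placeholders, not the project's design, and early or slow fusion would move the `torch.cat` to the input or to an intermediate layer instead.

```python
# Sketch: late fusion of a 2D mammography encoder and a 3D tomosynthesis encoder.
# Assumes PyTorch; all sizes are illustrative placeholders.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Each modality gets its own encoder; features are concatenated at the end."""
    def __init__(self, feat_dim=32, num_classes=2):
        super().__init__()
        # 2D branch for mammography images (N, 1, H, W)
        self.enc_2d = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # 3D branch for tomosynthesis volumes (N, 1, D, H, W), cf. Task 4
        self.enc_3d = nn.Sequential(
            nn.Conv3d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, mammo_2d, tomo_3d):
        # Late fusion: concatenate the final feature representations
        fused = torch.cat([self.enc_2d(mammo_2d), self.enc_3d(tomo_3d)], dim=1)
        return self.head(fused)
```

The same two encoders could instead share layers earlier in the network, which is exactly the early/slow/late design axis Task 5 will explore.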
After this, we can use these predictions as an additional target to transfer knowledge across modalities. The 2D mammography network will learn from two supervision signals instead of just the ground truth. This has been shown to be effective in two ways: first, by regularizing the learning, the network can learn with less data, which is our main goal; second, it improves the performance of the second network by distilling knowledge from the other modality. Addressing the data limitations of CNNs reduces the costs of data acquisition and labeling, leading to: 1. faster development of new CNN methods; 2. cheaper deployment of CNNs in real applications; 3. the democratization of access to CNNs for smaller health units; 4. faster adaptation of an existing system to a new reality, e.g. a new imaging sensor or the emergence of a new disease. Integrating deep CNNs into medical diagnostic processes can have a great impact on society. This proposal is a step forward in that direction.
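The two-supervision-signal objective described above (Task 6) can be sketched as a standard distillation loss, assuming PyTorch; the temperature `T` and the weight `alpha` are illustrative defaults, not tuned values from the project.

```python
# Sketch: cross-modal distillation loss. The 2D "student" matches both the
# ground-truth labels and the soft predictions of a 3D "teacher" network.
# Assumes PyTorch; T and alpha are illustrative defaults.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with a soft KL term from the teacher."""
    # Signal 1: the usual supervised loss on the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    # Signal 2: match the teacher's temperature-softened predictions
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft
```

Here `teacher_logits` would come from the network trained on the paired 3D tomosynthesis, so the 2D mammography network receives the ground truth and the teacher's predictions as its two supervision signals.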