
Thesis Format
Integrated Article
Degree
Master of Engineering Science
Program
Electrical and Computer Engineering
Supervisor
Yang, Yimin
Abstract
Breast cancer is the most prevalent cancer among women globally, and early detection through mammography significantly reduces mortality. The Breast Imaging Reporting and Data System (BI-RADS) standardizes the classification of mammograms into normal, benign, or malignant categories, aiding diagnosis. However, existing vision-language models (VLMs) generalize poorly across diverse datasets because of variations in imaging equipment and protocols. This thesis introduces Multi-Distribution Mammogram Classification with Contrastive Language-Image Pre-training (MDMC-CLIP), a framework that extends the OpenAI CLIP model to improve accuracy and cross-dataset generalization in mammography. By combining multi-dataset training (INbreast, MIAS, VinDr-Mammo, KAU-BCMD, CMMD) with enriched prompts, MDMC-CLIP captures dataset-specific diversity, distinguishing subtle inter-dataset variations alongside the standard classifications. Latent prompt extraction further refines image descriptions, improving fine-grained understanding. Zero-shot experiments show that MDMC-CLIP outperforms baseline VLMs in accuracy and adapts better to unseen datasets. This research advances AI-driven mammography, offering a robust tool for early breast cancer detection with broad clinical potential.
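The abstract describes zero-shot classification with dataset-enriched prompts. The following minimal Python sketch illustrates that mechanism using OpenAI's off-the-shelf clip package; it is not the thesis's actual implementation, and the prompt wording, model variant (ViT-B/32), and file name mammogram.png are illustrative assumptions.

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Hypothetical dataset-aware prompts in the spirit of MDMC-CLIP's
    # enriched prompts; the exact templates in the thesis may differ.
    classes = ["normal", "benign", "malignant"]
    datasets = ["INbreast", "MIAS", "VinDr-Mammo", "KAU-BCMD", "CMMD"]
    prompts = [
        f"a mammogram from the {d} dataset showing {c} findings"
        for d in datasets for c in classes
    ]

    text_tokens = clip.tokenize(prompts).to(device)
    image = preprocess(Image.open("mammogram.png")).unsqueeze(0).to(device)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text_tokens)
        # Normalize, then rank every (dataset, class) prompt by similarity.
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        logits = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    best = logits.argmax(dim=-1).item()
    print("best prompt:", prompts[best])
    print("predicted class:", classes[best % len(classes)])

In the actual framework, trained MDMC-CLIP weights would replace the off-the-shelf model, and the enriched prompts with latent prompt extraction would replace these simple templates; ranking prompts jointly over dataset and class mirrors the abstract's point that the model distinguishes inter-dataset variation alongside the standard categories.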
Summary for Lay Audience
Breast cancer is the most common cancer affecting women worldwide, and detecting it early is crucial for survival. Mammography, an X-ray technique that creates detailed images of the breast, helps doctors identify whether tissue is normal, benign (non-cancerous), or malignant (cancerous). However, the images can vary depending on the equipment or techniques used in different hospitals, making it difficult for computer programs to analyze them accurately, especially when faced with new, unfamiliar data.

To address this, we developed a new method called MDMC-CLIP, which builds on a technology called a vision-language model. This technology combines the ability to understand images and text, allowing the model to learn from both. We trained MDMC-CLIP using mammogram images from various sources worldwide and added a specific text description to each image, such as "This mammogram from dataset A reveals the presence of malignant findings, characterized by one or more areas suggestive of cancerous growth in the breast tissue." These descriptions help the model recognize detailed differences and perform better.

Our tests, conducted on publicly available mammogram collections from different countries and time periods, showed that MDMC-CLIP outperforms other methods. It accurately identifies normal, benign, and malignant cases and adapts well to datasets it has not seen before. This improvement comes from using multiple datasets and tailored text descriptions, which help the model handle variations in image quality and patient characteristics.

This work could give doctors a more reliable tool for detecting breast cancer early. Early detection can lead to timely treatment, improving patient outcomes and potentially saving lives. Challenges remain, such as differences in image quality and the lack of certain image types in some datasets, which we plan to address in future research to make the tool ready for everyday clinical use.
Recommended Citation
Yan, Zhichen, "Enhancing Zero-Shot Learning with CLIP for Multi-Distribution Mammogram Classification" (2025). Electronic Thesis and Dissertation Repository. 10816.
https://ir.lib.uwo.ca/etd/10816
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.