
Enhancing Zero-Shot Learning with CLIP for Multi-Distribution Mammogram Classification
Abstract
Breast cancer is the most prevalent cancer among women globally, and early detection through mammography significantly reduces mortality. The Breast Imaging Reporting and Data System (BI-RADS) standardizes the classification of mammograms into normal, benign, and malignant categories, aiding diagnosis. However, existing vision-language models (VLMs) generalize poorly across diverse datasets because of variations in imaging. This thesis introduces the Multi-Distribution Mammogram Classification with Contrastive Language-Image Pre-training (MDMC-CLIP) framework, which extends the OpenAI CLIP model to improve accuracy and cross-dataset generalization in mammography. By combining multi-dataset training (INbreast, MIAS, VinDr-Mammo, KAU-BCMD, CMMD) with enriched prompts, MDMC-CLIP captures dataset-specific diversity, distinguishing subtle inter-dataset variations alongside the standard classifications. Latent prompt extraction refines image descriptions, improving fine-grained understanding. Zero-shot experiments show that MDMC-CLIP outperforms baseline VLMs in both accuracy and adaptability to unseen datasets. This research advances AI-driven mammography, offering a robust tool for early breast cancer detection with broad clinical potential.
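
To make the zero-shot setting concrete, the sketch below shows how CLIP scores a single mammogram against BI-RADS-style text prompts. This is a minimal sketch assuming the open-source OpenAI CLIP package (github.com/openai/CLIP); the prompt wording, model variant, and file name are illustrative placeholders and do not reproduce MDMC-CLIP's enriched or latent prompts.

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # BI-RADS-style class prompts; MDMC-CLIP enriches these with
    # dataset-specific descriptions (exact templates are not shown here).
    prompts = [
        "a mammogram showing normal breast tissue",
        "a mammogram showing a benign finding",
        "a mammogram showing a malignant finding",
    ]
    text = clip.tokenize(prompts).to(device)

    # "mammogram.png" is a placeholder path for a preprocessed mammogram.
    image = preprocess(Image.open("mammogram.png")).unsqueeze(0).to(device)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Cosine similarity between the image and each class prompt,
        # converted to class probabilities with a softmax.
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print(dict(zip(["normal", "benign", "malignant"], probs[0].tolist())))

Because classification reduces to similarity between image and prompt embeddings, the choice and richness of prompt text directly affects accuracy, which is why the framework's enriched, dataset-aware prompts matter in the multi-distribution setting.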