Radiology sits at the intersection of imaging technology and clinical decision-making, making it one of the most data-rich and AI-ready specialties in medicine. From chest X-rays to brain MRIs, from abdominal CTs to mammograms, the volume and variety of radiological imaging data are staggering. And thanks to a growing commitment to open science, an impressive collection of these datasets is now openly available to researchers worldwide, although many require registration, credentialed access, or a signed data use agreement.
This guide catalogs the major open radiology datasets available for AI research, organized by anatomical region and imaging modality. Each entry lists the modality, size, annotation type, and hosting source (download portal, GitHub repository, or challenge platform). It is designed to be a living reference for anyone building AI systems for radiology.
Why Radiology Leads in Open Data
Radiology was among the first medical specialties to embrace digital formats, with DICOM becoming the universal standard for medical imaging decades ago. This digital-native foundation, combined with the specialty's reliance on visual pattern recognition, has made radiology a leading testing ground for medical AI. Large-scale NIH-funded projects like The Cancer Imaging Archive (TCIA) and data-sharing mandates for federally funded research have further accelerated dataset availability.
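Because DICOM is a published standard, even a few lines of code can recognize a DICOM file. The sketch below relies only on one well-defined detail of the standard: DICOM Part 10 files begin with a 128-byte preamble followed by the four ASCII bytes `DICM`. The helper name `looks_like_dicom` is hypothetical, and the check does not parse any data elements; older files written without the Part 10 header will be reported as non-DICOM. For real parsing, a dedicated library such as pydicom is the usual choice.

```python
import io
from typing import BinaryIO

PREAMBLE_LEN = 128  # DICOM Part 10 files start with a 128-byte preamble...
MAGIC = b"DICM"     # ...followed by this 4-byte prefix

def looks_like_dicom(stream: BinaryIO) -> bool:
    """Hypothetical helper: True if the stream starts with the Part 10 preamble + 'DICM'.

    Only the file header is inspected; data elements are not parsed or validated.
    """
    header = stream.read(PREAMBLE_LEN + len(MAGIC))
    return len(header) == PREAMBLE_LEN + len(MAGIC) and header[PREAMBLE_LEN:] == MAGIC

# In-memory bytes stand in for files on disk; for a real file use open(path, "rb").
fake_dicom = io.BytesIO(b"\x00" * PREAMBLE_LEN + MAGIC + b"...rest of file...")
not_dicom = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 200)
print(looks_like_dicom(fake_dicom))  # True
print(looks_like_dicom(not_dicom))   # False
```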
Chest Radiology Datasets
Chest imaging — both X-ray and CT — represents the largest category of open radiology data, driven by the global burden of pulmonary disease and the COVID-19 pandemic.
| Dataset | Modality | Size | Annotations | Source |
|---|---|---|---|---|
| CheXpert | Chest X-ray | 224,316 images | 14 pathology labels with uncertainty modeling | Stanford ML Group |
| MIMIC-CXR-JPG | Chest X-ray | 377,110 images | Structured labels from the CheXpert and NegBio labelers (free-text reports are distributed in MIMIC-CXR) | PhysioNet |
| NIH ChestX-ray14 | Chest X-ray | 112,120 images | 14 disease labels, bounding boxes for 880 images | NIH Clinical Center |
| VinDr-CXR | Chest X-ray | 18,000 images | 22 lesion categories with bounding boxes from 17 radiologists | PhysioNet |
| PadChest | Chest X-ray | 160,868 images | 174 radiographic findings, 19 differential diagnoses | BIMCV |
| RSNA Pneumonia Detection | Chest X-ray | 30,000 images | Bounding boxes for pneumonia opacities | Kaggle |
| LUNA16 | Chest CT | 888 scans | Lung nodule locations from LIDC-IDRI | Grand Challenge |
| LIDC-IDRI | Chest CT | 1,018 scans | Multi-reader nodule annotations with malignancy ratings | TCIA |
| NLST (National Lung Screening Trial) | Low-dose CT | 75,000+ scans | Lung cancer screening outcomes | NCI CDAS |
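The "uncertainty modeling" in CheXpert refers to labels that the automated report labeler marks as uncertain. In the released CSV files, 1.0 means positive, 0.0 negative, -1.0 uncertain, and an empty cell means the finding was not mentioned; the CheXpert paper compared several policies for handling the uncertain value, including mapping it to positive (U-Ones) or negative (U-Zeros). The sketch below applies those two policies with the standard library. The column names follow CheXpert's public `train.csv`, but the sample rows are invented, the helper names are hypothetical, and treating blank labels as negative is a common convention rather than a requirement.

```python
import csv
import io

# Toy excerpt in the style of CheXpert's train.csv (paths and values are made up).
SAMPLE_CSV = """Path,Cardiomegaly,Edema,Pleural Effusion
patient00001/study1/view1_frontal.jpg,1.0,,-1.0
patient00002/study1/view1_frontal.jpg,-1.0,0.0,1.0
"""

def apply_policy(value: str, policy: str) -> int:
    """Map one raw CheXpert label to a binary target.

    policy="ones":  uncertain (-1) becomes positive (U-Ones)
    policy="zeros": uncertain (-1) becomes negative (U-Zeros)
    Blank (unmentioned) labels are treated as negative in this sketch.
    """
    if policy not in ("ones", "zeros"):
        raise ValueError(f"unknown policy: {policy}")
    if value == "":
        return 0
    v = float(value)
    if v == -1.0:
        return 1 if policy == "ones" else 0
    return int(v)

def load_targets(text: str, labels: list[str], policy: str) -> list[list[int]]:
    """Hypothetical loader: one multi-label target vector per CSV row."""
    reader = csv.DictReader(io.StringIO(text))
    return [[apply_policy(row[name], policy) for name in labels] for row in reader]

LABELS = ["Cardiomegaly", "Edema", "Pleural Effusion"]
print(load_targets(SAMPLE_CSV, LABELS, "ones"))   # [[1, 0, 1], [1, 0, 1]]
print(load_targets(SAMPLE_CSV, LABELS, "zeros"))  # [[1, 0, 0], [0, 0, 1]]
```

The choice of policy matters per finding: the CheXpert authors reported that the best-performing policy differed across pathologies, so it is worth evaluating more than one.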
Neuroradiology Datasets
Brain imaging datasets support research in tumor detection, stroke assessment, neurodegenerative disease tracking, and normal brain development.
| Dataset | Modality | Size | Annotations | Source |
|---|---|---|---|---|
| BraTS 2023/2024 | Multi-parametric MRI | 2,000+ cases | Glioma segmentation (enhancing, core, whole tumor, edema) | Synapse |
| ADNI | MRI, PET | 2,000+ subjects | Longitudinal Alzheimer’s imaging with cognitive scores | ADNI |
| CQ500 | Head CT | 491 scans | Intracranial hemorrhage, mass effect, midline shift, fractures | qure.ai |
| RSNA Intracranial Hemorrhage | Head CT | 25,000+ exams | 5 hemorrhage subtypes + normal | Kaggle |
| ATLAS (Anatomical Tracings of Lesions After Stroke) | T1w MRI | 1,271 scans | Manual stroke lesion tracings | NITRC |
| OpenNeuro | MRI, EEG, MEG | 900+ datasets | BIDS-formatted neuroscience datasets | openneuro.org |
| HCP (Human Connectome Project) | MRI (structural, functional, diffusion) | 1,200 subjects | High-resolution brain connectivity maps | humanconnectome.org |
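The BraTS row lists several tumor sub-regions; in practice, BraTS models are typically scored on three nested regions built from the voxel labels: enhancing tumor (ET), tumor core (TC = necrotic core + ET), and whole tumor (WT = TC + edema). The sketch below derives those masks. The label values are an assumption to check against the release you download: necrotic core = 1 and edema = 2 throughout, with enhancing tumor = 4 in BraTS 2021 and earlier and 3 in BraTS 2023. A flat Python list stands in for a real NIfTI volume to keep the example dependency-free, and `brats_regions` is a hypothetical helper.

```python
# Hypothetical helper: derive the nested BraTS evaluation regions from a label map.
# Assumed label values (verify per release): 1 = necrotic core, 2 = edema,
# enhancing tumor = 3 (BraTS 2023) or 4 (BraTS 2021 and earlier).
NECROTIC, EDEMA = 1, 2

def brats_regions(labels: list[int], enhancing: int = 3) -> dict[str, list[bool]]:
    """Return voxel-wise masks for enhancing tumor (ET), tumor core (TC), whole tumor (WT)."""
    et = [v == enhancing for v in labels]
    tc = [v in (NECROTIC, enhancing) for v in labels]         # TC = necrotic core + ET
    wt = [v in (NECROTIC, EDEMA, enhancing) for v in labels]  # WT = TC + edema
    return {"ET": et, "TC": tc, "WT": wt}

# Flattened toy "volume" of 6 voxels: background, edema, necrosis, ET, ET, edema.
toy = [0, 2, 1, 3, 3, 2]
regions = brats_regions(toy)
print({name: sum(mask) for name, mask in regions.items()})  # {'ET': 2, 'TC': 3, 'WT': 5}
```

With NumPy arrays the same logic becomes `np.isin(volume, [...])`, but the region definitions are what matter: each region is a superset of the one before it.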
Abdominal Radiology Datasets
| Dataset | Modality | Size | Annotations | Source |
|---|---|---|---|---|
| TotalSegmentator | CT | 1,228 scans | 117 anatomical structures (v2; v1 had 1,204 scans and 104 structures) | GitHub |
| AbdomenAtlas 1.1 | CT | 9,262 volumes | 25 organs + tumors | GitHub |
| AMOS 2022 | CT + MRI | 500 CT + 100 MRI | 15 abdominal organs | Grand Challenge |
| LiTS | CT | 201 scans | Liver + liver tumor segmentation | CodaLab |
| KiTS23 | CT | 599 scans | Kidney + kidney tumor segmentation | GitHub |
| BTCV (Beyond the Cranial Vault) | CT | 50 scans | 13 abdominal organ segmentations | Synapse |
| WORD | CT | 150 scans | 16 abdominal organ segmentations | GitHub |
Mammography and Breast Imaging
| Dataset | Modality | Size | Annotations | Source |
|---|---|---|---|---|
| VinDr-Mammo | Mammography | 5,000 exams (20,000 images) | BI-RADS assessment + findings with bounding boxes | PhysioNet |
| CBIS-DDSM | Mammography | 2,620 scans | Mass and calcification annotations with pathology-confirmed labels | TCIA |
| INbreast | Full-field digital mammography | 410 images | Contour annotations for masses, calcifications, and distortions | INESC Porto |
| RSNA Screening Mammography | Mammography | 54,706 images | Cancer detection labels | Kaggle |
| Duke Breast Cancer MRI | Breast MRI (DCE) | 922 patients | Pre-operative MRI with clinical and genomic data | TCIA |
Musculoskeletal Radiology
| Dataset | Modality | Size | Annotations | Source |
|---|---|---|---|---|
| MURA | X-ray | 40,561 images | Normal/abnormal across 7 upper extremity types | Stanford ML Group |
| VerSe (2019 + 2020) | CT | 374 scans | Vertebra segmentation and labeling | GitHub |
| RSNA Cervical Spine Fracture | CT | 3,000+ scans | Fracture detection and localization | Kaggle |
| KneeXray (OAI) | X-ray | 36,369 images | Kellgren-Lawrence osteoarthritis grading | NIH OAI |
| fastMRI | MRI (knee + brain) | 10,000+ volumes | Raw k-space data for accelerated MRI reconstruction | NYU fastMRI |
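Raw k-space is what makes fastMRI useful for reconstruction research: an accelerated scan can be simulated retrospectively by discarding phase-encoding lines (columns) of fully sampled k-space, and a model is then trained to recover the image. The sketch below builds a random column mask that keeps a fully sampled low-frequency center and samples the outer columns at random so that roughly `num_cols / acceleration` columns survive. It mirrors the general idea of the random masks used in fastMRI experiments, but `cartesian_mask` is a hypothetical helper, not the fastMRI library's API, and the parameter values in the usage line are illustrative.

```python
import random
from typing import Optional

def cartesian_mask(num_cols: int, center_fraction: float, acceleration: int,
                   seed: Optional[int] = None) -> list[bool]:
    """Hypothetical helper: random column mask for retrospective Cartesian undersampling.

    Keeps a fully sampled block of low-frequency columns at the center of k-space and
    samples the remaining columns at random, so that about num_cols / acceleration
    columns are kept in expectation.
    """
    if not 0 < center_fraction < 1:
        raise ValueError("center_fraction must be in (0, 1)")
    rng = random.Random(seed)
    num_low = round(num_cols * center_fraction)
    # Per-column probability for the outer columns; if the center block alone already
    # exceeds the target, prob <= 0 and only the center is kept.
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    mask = [rng.random() < prob for _ in range(num_cols)]
    start = (num_cols - num_low + 1) // 2
    for i in range(start, start + num_low):
        mask[i] = True  # the center of k-space carries most image contrast, so keep it all
    return mask

mask = cartesian_mask(num_cols=368, center_fraction=0.08, acceleration=4, seed=0)
print(f"kept {sum(mask)} of {len(mask)} columns (target about {368 // 4})")
```

To simulate the accelerated acquisition, the columns where the mask is `False` are zeroed in k-space before the inverse Fourier transform; the resulting aliased image is the network's input.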
Nuclear Medicine and PET Datasets
| Dataset | Modality | Size | Annotations | Source |
|---|---|---|---|---|
| AutoPET | PET/CT | 1,014 studies | Whole-body tumor segmentation from FDG-PET/CT | Grand Challenge |
| HECKTOR | PET/CT | 882 patients | Head and neck tumor segmentation and outcome prediction | Grand Challenge |
Report Generation and Vision-Language Datasets
A rapidly growing category pairs radiology images with their associated reports, enabling AI systems that can generate or summarize radiological findings.
| Dataset | Size | Content | Key Features | Source |
|---|---|---|---|---|
| MIMIC-CXR Reports | 227,835 reports | Chest X-ray free-text reports | Largest radiology report dataset; paired with images | PhysioNet |
| IU X-Ray | 7,470 images / 3,955 reports | Chest X-ray images + reports | Indiana University dataset; frequently used for report generation | Open-i |
| RadNLI | 960 sentence pairs | Natural language inference for radiology | Entailment, contradiction, neutral labels for report sentences | PhysioNet |
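Report-generation work on MIMIC-CXR and IU X-Ray often targets specific report sections, most commonly the findings and the impression, so a typical preprocessing step splits each report on its section headers. The sketch below does this with a regular expression that treats an all-caps word or phrase followed by a colon at the start of a line as a header. The example report is invented and `split_sections` is a hypothetical helper; real reports vary in which headers appear and how consistently they are formatted, so production pipelines usually need more rules than this.

```python
import re

# Invented example report in the common "HEADER: text" layout.
REPORT = """EXAMINATION: CHEST (PA AND LAT)
INDICATION: Cough.
FINDINGS: The lungs are clear. No pleural effusion or pneumothorax.
Heart size is normal.
IMPRESSION: No acute cardiopulmonary process.
"""

# An all-caps header at the start of a line, followed by a colon.
HEADER = re.compile(r"^([A-Z][A-Z ]+):", re.MULTILINE)

def split_sections(report: str) -> dict[str, str]:
    """Hypothetical helper: split a report into {HEADER: text} at 'HEADER:' markers."""
    matches = list(HEADER.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        body = report[m.end():end]
        sections[m.group(1).strip()] = " ".join(body.split())  # collapse newlines/spaces
    return sections

sections = split_sections(REPORT)
print(sections["FINDINGS"])
# The lungs are clear. No pleural effusion or pneumothorax. Heart size is normal.
print(sections["IMPRESSION"])  # No acute cardiopulmonary process.
```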
Ultrasound Datasets
| Dataset | Focus | Size | Annotations | Source |
|---|---|---|---|---|
| BUSI (Breast Ultrasound Images) | Breast | 780 images | Normal, benign, malignant classification + segmentation masks | Cairo University |
| HC18 | Fetal head | 1,334 images | Head circumference measurement in 2D ultrasound | Grand Challenge |
| TN3K | Thyroid | 3,493 images | Thyroid nodule segmentation | GitHub |
| EchoNet-Dynamic | Cardiac | 10,030 videos | Ejection fraction + semantic segmentations | echonet.github.io |
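EchoNet-Dynamic's headline label, left ventricular ejection fraction, is a simple function of two volumes: EF (%) = (EDV − ESV) / EDV × 100, where EDV and ESV are the end-diastolic and end-systolic volumes. As we understand the dataset, its label file also reports both volumes, so recomputing EF from them can serve as a quick consistency check. The function name below is hypothetical and the volumes in the usage line are illustrative, not taken from the dataset.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Hypothetical helper: left ventricular ejection fraction in percent.

    EF = (EDV - ESV) / EDV * 100, with volumes in the same unit (e.g. mL).
    """
    if edv_ml <= 0 or esv_ml < 0 or esv_ml > edv_ml:
        raise ValueError("expected EDV > 0 and 0 <= ESV <= EDV")
    return (edv_ml - esv_ml) / edv_ml * 100.0

# Illustrative volumes only.
print(round(ejection_fraction(120.0, 50.0), 1))  # 58.3
```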
Open Source Radiology AI Frameworks
To work effectively with these datasets, several open source frameworks have become essential tools in the radiology AI researcher’s toolkit:
- MONAI — Medical Open Network for AI: comprehensive PyTorch framework for medical imaging
- MONAI Model Zoo — Pretrained models for common medical imaging tasks
- fastMRI — Tools for accelerated MRI reconstruction
- Microsoft Health Intelligence — ML toolbox for medical imaging
- nnU-Net — Self-configuring framework for medical image segmentation
- 3D Slicer — Open source platform for medical image informatics
Getting Started
For newcomers to radiology AI, the path depends on your clinical focus. For chest imaging, CheXpert and MIMIC-CXR provide the scale needed for robust model development. For segmentation tasks, TotalSegmentator offers dense whole-body CT labels, and the Medical Segmentation Decathlon provides ten benchmark tasks spanning organs, tumors, CT, and MRI. For neuroradiology, BraTS remains the standard benchmark for brain tumor segmentation, while the Human Connectome Project provides high-resolution structural, functional, and diffusion MRI for brain connectivity research.
Regardless of your focus area, we recommend using the MONAI framework and nnU-Net as starting points for model development — both handle much of the preprocessing, data loading, and training pipeline complexity that can slow down medical imaging research.
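As one concrete example of that pipeline support, nnU-Net expects raw data in a fixed layout: a folder such as `nnUNet_raw/DatasetXXX_Name/` containing `imagesTr/`, `labelsTr/`, and a `dataset.json` file. The sketch below builds that metadata and the expected file names, based on our reading of nnU-Net v2's dataset-format documentation (check the current docs before relying on it). The helper names, the `liver_000` case ID, and the liver/tumor label set are hypothetical; the example deliberately does not write to disk.

```python
import json

def nnunet_dataset_json(labels: dict[str, int], num_training: int,
                        channel_names: dict[str, str],
                        file_ending: str = ".nii.gz") -> dict:
    """Hypothetical helper: dataset.json contents in the nnU-Net v2 layout (as we read the docs)."""
    if labels.get("background") != 0:
        raise ValueError("nnU-Net's docs describe 'background' as label 0")
    return {
        "channel_names": channel_names,  # e.g. {"0": "CT"}; "CT" selects CT-style normalization
        "labels": labels,
        "numTraining": num_training,
        "file_ending": file_ending,
    }

def expected_files(case_id: str, num_channels: int,
                   file_ending: str = ".nii.gz") -> tuple[list[str], str]:
    """Hypothetical helper: image files carry a 4-digit channel suffix; the label file does not."""
    images = [f"imagesTr/{case_id}_{c:04d}{file_ending}" for c in range(num_channels)]
    return images, f"labelsTr/{case_id}{file_ending}"

# Illustrative LiTS-style liver/tumor task with a single CT channel.
meta = nnunet_dataset_json({"background": 0, "liver": 1, "tumor": 2},
                           num_training=131, channel_names={"0": "CT"})
print(json.dumps(meta, indent=2))
print(expected_files("liver_000", num_channels=1))
# (['imagesTr/liver_000_0000.nii.gz'], 'labelsTr/liver_000.nii.gz')
```

Writing `json.dumps(meta, indent=2)` to `dataset.json` in the dataset folder, and renaming images and labels to match `expected_files`, is usually most of the work needed before nnU-Net's own planning and preprocessing commands can take over.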
The open radiology dataset ecosystem is more robust than ever, and new datasets continue to emerge from major academic medical centers and government initiatives worldwide. Bookmark this page and check back regularly — we will keep it updated as new resources become available.

