Every Open Source Radiology Dataset You Need for AI Research in 2026

Radiology sits at the intersection of imaging technology and clinical decision-making, making it one of the most data-rich and AI-ready specialties in medicine. From chest X-rays to brain MRIs, from abdominal CTs to mammograms, the volume and variety of radiological imaging data are staggering. And thanks to a growing commitment to open science, an impressive collection of these datasets is now freely available to researchers worldwide.

This guide catalogs every major open source radiology dataset available for AI research, organized by anatomical region and imaging modality. Each entry includes direct links to download portals, GitHub repositories, and associated publications. It is intended as a living reference for anyone building AI systems for radiology.

Why Radiology Leads in Open Data

Radiology was among the first medical specialties to embrace digital formats, with DICOM becoming the universal standard for medical imaging decades ago. This digital-native foundation, combined with the visual pattern recognition demands of the specialty, has made radiology the testing ground for medical AI. Large-scale NIH-funded projects like The Cancer Imaging Archive (TCIA) and mandates for data sharing in federally funded research have further accelerated dataset availability.

Chest Radiology Datasets

Chest imaging — both X-ray and CT — represents the largest category of open radiology data, driven by the global burden of pulmonary disease and the COVID-19 pandemic.

Dataset | Modality | Size | Annotations | Source
CheXpert | Chest X-ray | 224,316 images | 14 pathology labels with uncertainty modeling | Stanford ML Group
MIMIC-CXR-JPG | Chest X-ray | 377,110 images | Free-text reports + CheXpert-style labels | PhysioNet
NIH ChestX-ray14 | Chest X-ray | 112,120 images | 14 disease labels, bounding boxes for 880 images | NIH Clinical Center
VinDr-CXR | Chest X-ray | 18,000 images | 22 lesion categories with bounding boxes from 17 radiologists | PhysioNet
PadChest | Chest X-ray | 160,868 images | 174 radiographic findings, 19 differential diagnoses | BIMCV
RSNA Pneumonia Detection | Chest X-ray | 30,000 images | Bounding boxes for pneumonia opacities | Kaggle
LUNA16 | Chest CT | 888 scans | Lung nodule locations from LIDC-IDRI | Grand Challenge
LIDC-IDRI | Chest CT | 1,018 scans | Multi-reader nodule annotations with malignancy ratings | TCIA
NLST (National Lung Screening Trial) | Low-dose CT | 75,000+ scans | Lung cancer screening outcomes | NCI CDAS
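
The "uncertainty modeling" in CheXpert-style labels means a finding can be marked uncertain rather than simply present or absent, and a training pipeline has to decide what to do with those cells. The following is a minimal sketch of three common policies (map uncertain to negative, map it to positive, or mask it out). It assumes the label CSV encodes positive as 1.0, negative as 0.0, uncertain as -1.0, and leaves unmentioned findings blank; the function name `resolve_label`, the policy names, and the choice to treat blanks as negative are illustrative assumptions, not part of any dataset's tooling.

```python
from typing import Optional

# Assumed cell encoding: "1.0" positive, "0.0" negative,
# "-1.0" uncertain, "" (blank) finding not mentioned in the report.
UNCERTAIN = -1.0


def resolve_label(raw: str, policy: str = "zeros") -> Optional[float]:
    """Map one raw CSV cell to a training target.

    Returns None when the cell should be masked out of the loss.
    """
    if raw.strip() == "":
        # Assumption: an unmentioned finding is treated as negative.
        return 0.0
    value = float(raw)
    if value != UNCERTAIN:
        return value
    if policy == "zeros":   # uncertain -> negative
        return 0.0
    if policy == "ones":    # uncertain -> positive
        return 1.0
    if policy == "ignore":  # uncertain -> excluded from the loss
        return None
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical row from a label file.
row = {"Atelectasis": "-1.0", "Cardiomegaly": "1.0", "Edema": ""}
targets = {name: resolve_label(cell, policy="ones") for name, cell in row.items()}
print(targets)  # {'Atelectasis': 1.0, 'Cardiomegaly': 1.0, 'Edema': 0.0}
```

Which policy works best tends to differ per finding, so it is worth treating the policy as a hyperparameter rather than fixing it once for all labels.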

Neuroradiology Datasets

Brain imaging datasets support research in tumor detection, stroke assessment, neurodegenerative disease tracking, and normal brain development.

Dataset | Modality | Size | Annotations | Source
BraTS 2023/2024 | Multi-parametric MRI | 2,000+ cases | Glioma segmentation (enhancing, core, whole tumor, edema) | Synapse
ADNI | MRI, PET | 2,000+ subjects | Longitudinal Alzheimer’s imaging with cognitive scores | ADNI
CQ500 | Head CT | 491 scans | Intracranial hemorrhage, mass effect, midline shift, fractures | qure.ai
RSNA Intracranial Hemorrhage | Head CT | 25,000+ exams | 5 hemorrhage subtypes + normal | Kaggle
ATLAS (Anatomical Tracings of Lesions After Stroke) | T1w MRI | 1,271 scans | Manual stroke lesion tracings | NITRC
OpenNeuro | MRI, EEG, MEG | 900+ datasets | BIDS-formatted neuroscience datasets | openneuro.org
HCP (Human Connectome Project) | MRI (structural, functional, diffusion) | 1,200 subjects | High-resolution brain connectivity maps | humanconnectome.org
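
BraTS-style glioma labels are usually evaluated as three nested regions (enhancing tumor, tumor core, whole tumor) rather than as the raw label values. The sketch below derives those regions from a flattened label map. The label values are assumptions: it takes 1 for necrotic/non-enhancing core, 2 for edema, and 3 for enhancing tumor, and some earlier releases use 4 for enhancing tumor, which is why that value is a parameter. The function name `brats_regions` is hypothetical.

```python
from typing import Dict, Iterable, List

NECROTIC = 1  # assumed label value for necrotic / non-enhancing core
EDEMA = 2     # assumed label value for peritumoral edema


def brats_regions(labels: Iterable[int], enhancing_label: int = 3) -> Dict[str, List[int]]:
    """Turn a flat list of voxel labels into three nested binary masks."""
    flat = list(labels)
    core_values = (NECROTIC, enhancing_label)
    whole_values = (NECROTIC, EDEMA, enhancing_label)
    return {
        "enhancing_tumor": [int(v == enhancing_label) for v in flat],
        "tumor_core": [int(v in core_values) for v in flat],
        "whole_tumor": [int(v in whole_values) for v in flat],
    }


# Toy 7-voxel label map (background, edema, edema, necrotic, enhancing x2, background).
voxels = [0, 2, 2, 1, 3, 3, 0]
regions = brats_regions(voxels)
print({name: sum(mask) for name, mask in regions.items()})
# {'enhancing_tumor': 2, 'tumor_core': 3, 'whole_tumor': 5}
```

Real volumes would be handled as 3D arrays with an imaging library; the flat list keeps the region logic visible without extra dependencies.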

Abdominal Radiology Datasets

Dataset | Modality | Size | Annotations | Source
TotalSegmentator | CT | 1,204 scans | 117 anatomical structures | GitHub
AbdomenAtlas 1.1 | CT | 9,262 volumes | 25 organs + tumors | GitHub
AMOS 2022 | CT + MRI | 500 CT + 100 MRI | 15 abdominal organs | Grand Challenge
LiTS | CT | 201 scans | Liver + liver tumor segmentation | CodaLab
KiTS23 | CT | 599 scans | Kidney + kidney tumor segmentation | GitHub
BTCV (Beyond the Cranial Vault) | CT | 50 scans | 13 abdominal organ segmentations | Synapse
WORD | CT | 150 scans | 16 abdominal organ segmentations | GitHub
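
Multi-organ segmentation benchmarks like these are commonly scored with a per-label Dice coefficient, 2|P ∩ T| / (|P| + |T|), where P and T are the predicted and reference voxel sets for one organ. The sketch below computes it on flattened label maps in plain Python. The function name `dice_score` and the convention of returning 1.0 when a label is absent from both maps are assumptions; individual challenges define that edge case differently.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice coefficient for one label over two flat label maps of equal length."""
    if len(pred) != len(truth):
        raise ValueError("prediction and reference must have the same length")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    denominator = pred_count + truth_count
    if denominator == 0:
        # Assumption: label absent in both maps counts as a perfect match.
        return 1.0
    return 2.0 * overlap / denominator


# Toy example with two hypothetical organ labels (1 and 2); 0 is background.
reference = [0, 1, 1, 1, 2, 2, 0, 0]
predicted = [0, 1, 1, 0, 2, 2, 2, 0]
for organ in (1, 2):
    print(organ, dice_score(predicted, reference, organ))
```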

Mammography and Breast Imaging

Dataset | Modality | Size | Annotations | Source
VinDr-Mammo | Mammography | 5,000 exams (20,000 images) | BI-RADS assessment + findings with bounding boxes | PhysioNet
CBIS-DDSM | Mammography | 2,620 scans | Mass and calcification annotations with pathology-confirmed labels | TCIA
INbreast | Full-field digital mammography | 410 images | Contour annotations for masses, calcifications, and distortions | INESC Porto
RSNA Screening Mammography | Mammography | 54,706 images | Cancer detection labels | Kaggle
Duke Breast Cancer MRI | Breast MRI (DCE) | 922 patients | Pre-operative MRI with clinical and genomic data | TCIA

Musculoskeletal Radiology

Dataset | Modality | Size | Annotations | Source
MURA | X-ray | 40,561 images | Normal/abnormal across 7 upper extremity types | Stanford ML Group
VerSe 2020 | CT | 374 scans | Vertebra segmentation and labeling | GitHub
RSNA Cervical Spine Fracture | CT | 3,000+ scans | Fracture detection and localization | Kaggle
KneeXray (OAI) | X-ray | 36,369 images | Kellgren-Lawrence osteoarthritis grading | NIH OAI
fastMRI | MRI (knee + brain) | 10,000+ volumes | Raw k-space data for accelerated MRI reconstruction | NYU fastMRI
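
Raw k-space data is useful for accelerated reconstruction research because acquisitions can be retrospectively undersampled: most k-space columns are dropped and a model learns to recover the image. As a rough illustration, the sketch below builds a random Cartesian column mask with a fully sampled low-frequency band. The function name `undersampling_mask`, its parameters, and the example values are assumptions made for this sketch and are not the fastMRI library's API.

```python
import random
from typing import List


def undersampling_mask(num_cols: int, acceleration: int,
                       center_fraction: float, seed: int = 0) -> List[bool]:
    """Return a per-column keep/drop mask for retrospective undersampling.

    The central band of columns is always kept; the remaining columns are
    kept at random so that roughly num_cols / acceleration survive overall.
    """
    num_center = round(num_cols * center_fraction)
    if not 0 < num_center < num_cols:
        raise ValueError("center_fraction must leave some columns outside the center")
    # Probability for the non-center columns, clamped at zero when the
    # center band alone already exceeds the sampling budget.
    keep_prob = max(0.0, (num_cols / acceleration - num_center) / (num_cols - num_center))
    rng = random.Random(seed)
    mask = [rng.random() < keep_prob for _ in range(num_cols)]
    start = (num_cols - num_center) // 2
    for col in range(start, start + num_center):
        mask[col] = True
    return mask


# Hypothetical settings: 320 columns, 4x acceleration, 8% center band.
mask = undersampling_mask(num_cols=320, acceleration=4, center_fraction=0.08)
print(f"kept {sum(mask)} of {len(mask)} columns")
```

Applying the mask means zeroing the dropped columns of each k-space slice before the inverse Fourier transform; fixing the seed keeps the sampling pattern reproducible across runs.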

Nuclear Medicine and PET Datasets

Dataset | Modality | Size | Annotations | Source
AutoPET | PET/CT | 1,014 studies | Whole-body tumor segmentation from FDG-PET/CT | Grand Challenge
HECKTOR | PET/CT | 882 patients | Head and neck tumor segmentation and outcome prediction | Grand Challenge

Report Generation and Vision-Language Datasets

A rapidly growing category pairs radiology images with their associated reports, enabling AI systems that can generate or summarize radiological findings.

Dataset | Size | Content | Key Features | Source
MIMIC-CXR Reports | 227,835 reports | Chest X-ray free-text reports | Largest radiology report dataset; paired with images | PhysioNet
IU X-Ray | 7,470 pairs | Chest X-ray images + reports | Indiana University dataset; frequently used for report generation | Open-i
RadNLI | 960 sentence pairs | Natural language inference for radiology | Entailment, contradiction, neutral labels for report sentences | PhysioNet

Ultrasound Datasets

Dataset | Focus | Size | Annotations | Source
BUSI (Breast Ultrasound Images) | Breast | 780 images | Normal, benign, malignant classification + segmentation masks | Cairo University
HC18 | Fetal head | 1,334 images | Head circumference measurement in 2D ultrasound | Grand Challenge
TN3K | Thyroid | 3,493 images | Thyroid nodule segmentation | GitHub
EchoNet-Dynamic | Cardiac | 10,030 videos | Ejection fraction + semantic segmentations | echonet.github.io
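
For readers new to cardiac ultrasound, the ejection fraction label is the percentage of blood the left ventricle ejects per beat, computed from the end-diastolic volume (EDV) and end-systolic volume (ESV) as EF = (EDV - ESV) / EDV × 100. A minimal sketch follows; the function name and the example volumes are hypothetical.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Left-ventricular ejection fraction in percent from volumes in millilitres."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Hypothetical volumes: 120 mL at end-diastole, 50 mL at end-systole.
print(round(ejection_fraction(120.0, 50.0), 1))  # 58.3
```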

Open Source Radiology AI Frameworks

To work effectively with these datasets, several open source frameworks have become essential tools in the radiology AI researcher’s toolkit:

  • MONAI — Medical Open Network for AI: comprehensive PyTorch framework for medical imaging
  • MONAI Model Zoo — Pretrained models for common medical imaging tasks
  • fastMRI — Tools for accelerated MRI reconstruction
  • Microsoft Health Intelligence — ML toolbox for medical imaging
  • nnU-Net — Self-configuring framework for medical image segmentation
  • 3D Slicer — Open source platform for medical image informatics

Getting Started

For newcomers to radiology AI, the path depends on your clinical focus. For chest imaging, CheXpert and MIMIC-CXR provide the scale needed for robust model development. For segmentation tasks, TotalSegmentator and the Medical Segmentation Decathlon offer comprehensive multi-organ benchmarks. For neuroradiology, BraTS remains the gold standard for tumor segmentation, while the Human Connectome Project provides unparalleled brain connectivity data.

Regardless of your focus area, we recommend using the MONAI framework and nnU-Net as starting points for model development — both handle much of the preprocessing, data loading, and training pipeline complexity that can slow down medical imaging research.
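
As an example of what getting a dataset into one of these frameworks can involve, nnU-Net (version 2) reads a small `dataset.json` that describes the input channels and label values. The sketch below builds such a description in Python. The key names (`channel_names`, `labels`, `numTraining`, `file_ending`) and the four-digit channel suffix in image filenames reflect the nnU-Net v2 dataset format as best understood here and should be checked against the current nnU-Net documentation; the helper names, case identifier, and counts are hypothetical.

```python
import json
from typing import Dict, List


def build_dataset_json(channel_names: List[str], label_names: List[str],
                       num_training: int, file_ending: str = ".nii.gz") -> Dict:
    """Assemble a dataset description in the (assumed) nnU-Net v2 layout."""
    labels = {"background": 0}
    for value, name in enumerate(label_names, start=1):
        labels[name] = value  # consecutive integers, background fixed at 0
    return {
        "channel_names": {str(i): name for i, name in enumerate(channel_names)},
        "labels": labels,
        "numTraining": num_training,
        "file_ending": file_ending,
    }


def image_filename(case_id: str, channel: int, file_ending: str = ".nii.gz") -> str:
    """Image file name with the four-digit channel suffix (assumed convention)."""
    return f"{case_id}_{channel:04d}{file_ending}"


# Hypothetical single-channel CT dataset with two foreground labels.
description = build_dataset_json(["CT"], ["liver", "tumor"], num_training=100)
print(json.dumps(description, indent=2))
print(image_filename("case_001", 0))  # case_001_0000.nii.gz
```

Writing this file by hand is easy to get wrong for multi-channel MRI, so generating it from the same list that drives file renaming keeps the two consistent.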

The open radiology dataset ecosystem is more robust than ever, and new datasets continue to emerge from major academic medical centers and government initiatives worldwide. Bookmark this page and check back regularly — we will keep it updated as new resources become available.