Every Open Source Radiology Dataset You Need for AI Research in 2026

Radiology sits at the intersection of imaging technology and clinical decision-making, making it one of the most data-rich and AI-ready specialties in medicine. From chest X-rays to brain MRIs, from abdominal CTs to mammograms, the volume and variety of radiological imaging data are staggering. And thanks to a growing commitment to open science, an impressive collection of these datasets is now freely available to researchers worldwide.

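This guide catalogs every major open source radiology dataset available for AI research, organized by anatomical region and imaging modality. Each entry lists the dataset's modality, size, annotation type, and the portal, repository, or challenge platform that hosts it. "Open" does not always mean unrestricted: several datasets, including MIMIC-CXR and ADNI, require registration, credentialing, or a signed data use agreement before download. This is designed to be a living reference for anyone building AI systems for radiology.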

Why Radiology Leads in Open Data

Radiology was among the first medical specialties to embrace digital formats, with DICOM becoming the universal standard for medical imaging decades ago. This digital-native foundation, combined with the visual pattern recognition demands of the specialty, has made radiology a leading testing ground for medical AI. Large-scale NIH-funded projects like The Cancer Imaging Archive (TCIA) and mandates for data sharing in federally funded research have further accelerated dataset availability.

Chest Radiology Datasets

Chest imaging — both X-ray and CT — represents the largest category of open radiology data, driven by the global burden of pulmonary disease and the COVID-19 pandemic.

Dataset	Modality	Size	Annotations	Source
CheXpert	Chest X-ray	224,316 images	14 pathology labels with uncertainty modeling	Stanford ML Group
MIMIC-CXR-JPG	Chest X-ray	377,110 images	14 labels extracted from reports by the CheXpert and NegBio labelers (free-text reports are distributed with MIMIC-CXR)	PhysioNet
NIH ChestX-ray14	Chest X-ray	112,120 images	14 disease labels, bounding boxes for 880 images	NIH Clinical Center
VinDr-CXR	Chest X-ray	18,000 images	22 lesion categories with bounding boxes from 17 radiologists	PhysioNet
PadChest	Chest X-ray	160,868 images	174 radiographic findings, 19 differential diagnoses	BIMCV
RSNA Pneumonia Detection	Chest X-ray	30,000 images	Bounding boxes for pneumonia opacities	Kaggle
LUNA16	Chest CT	888 scans	Lung nodule locations from LIDC-IDRI	Grand Challenge
LIDC-IDRI	Chest CT	1,018 scans	Multi-reader nodule annotations with malignancy ratings	TCIA
NLST (National Lung Screening Trial)	Low-dose CT	75,000+ scans	Lung cancer screening outcomes	NCI CDAS
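
CheXpert's uncertainty labels need a policy before they can be used as training targets. The CheXpert paper compared several, including mapping uncertain labels to negative (U-Zeros), to positive (U-Ones), or masking them out of the loss (U-Ignore). The sketch below assumes the label CSV encodes positive as 1.0, negative as 0.0, uncertain as -1.0, and leaves unmentioned findings blank; the column names in the example are illustrative, so check them against the CSV you actually download.

```python
import csv
import io
from typing import Optional

# Assumed CheXpert-style encoding: "1.0" positive, "0.0" negative,
# "-1.0" uncertain, "" (blank) = finding not mentioned in the report.
POLICIES = ("u_zeros", "u_ones", "u_ignore")


def map_label(raw: str, policy: str) -> Optional[float]:
    """Convert one raw label cell into a training target.

    Returns None when the label should be masked out of the loss.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    raw = raw.strip()
    if raw == "":
        return 0.0  # common simplification: unmentioned treated as negative
    value = float(raw)
    if value == -1.0:
        if policy == "u_zeros":
            return 0.0
        if policy == "u_ones":
            return 1.0
        return None  # u_ignore: exclude this label from the loss
    return value


def load_targets(csv_text: str, findings: list[str], policy: str):
    """Yield (image_path, targets) pairs, one per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield row["Path"], [map_label(row[f], policy) for f in findings]


if __name__ == "__main__":
    # Hypothetical two-row excerpt with two finding columns.
    sample = "Path,Edema,Atelectasis\nimg1.jpg,1.0,-1.0\nimg2.jpg,,0.0\n"
    for path, targets in load_targets(sample, ["Edema", "Atelectasis"], "u_ones"):
        print(path, targets)
```

Which policy works best can differ by finding, so it is worth evaluating more than one rather than fixing a single mapping up front.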

Neuroradiology Datasets

Brain imaging datasets support research in tumor detection, stroke assessment, neurodegenerative disease tracking, and normal brain development.

Dataset	Modality	Size	Annotations	Source
BraTS 2023/2024	Multi-parametric MRI	2,000+ cases	Glioma segmentation (enhancing, core, whole tumor, edema)	Synapse
ADNI	MRI, PET	2,000+ subjects	Longitudinal Alzheimer’s imaging with cognitive scores	ADNI
CQ500	Head CT	491 scans	Intracranial hemorrhage, mass effect, midline shift, fractures	qure.ai
RSNA Intracranial Hemorrhage	Head CT	25,000+ exams	5 hemorrhage subtypes + normal	Kaggle
ATLAS (Anatomical Tracings of Lesions After Stroke)	T1w MRI	1,271 scans	Manual stroke lesion tracings	NITRC
OpenNeuro	MRI, EEG, MEG	900+ datasets	BIDS-formatted neuroscience datasets	openneuro.org
HCP (Human Connectome Project)	MRI (structural, functional, diffusion)	1,200 subjects	High-resolution brain connectivity maps	humanconnectome.org
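
BraTS scores segmentations on nested tumor regions (whole tumor, tumor core, enhancing tumor) built from unions of the raw labels. The sketch below assumes the BraTS 2021 label values (1 = necrotic tumor core, 2 = peritumoral edema, 4 = enhancing tumor); BraTS 2023 relabels enhancing tumor as 3, so adjust ET_LABEL for the release you use. It computes Dice on flattened label lists to stay dependency-free; real pipelines would operate on NumPy arrays.

```python
# Assumed BraTS 2021 label values; BraTS 2023 uses 3 for enhancing tumor.
NCR_LABEL = 1  # necrotic tumor core
ED_LABEL = 2   # peritumoral edema
ET_LABEL = 4   # enhancing tumor

# Evaluation regions are unions of the raw labels.
REGIONS = {
    "whole_tumor": {NCR_LABEL, ED_LABEL, ET_LABEL},
    "tumor_core": {NCR_LABEL, ET_LABEL},
    "enhancing_tumor": {ET_LABEL},
}


def dice(pred: list[int], truth: list[int], labels: set[int]) -> float:
    """Dice overlap between two label maps, restricted to one region."""
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of voxels")
    p = [v in labels for v in pred]
    t = [v in labels for v in truth]
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Convention assumed here: a region empty in both maps scores 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


def region_scores(pred: list[int], truth: list[int]) -> dict[str, float]:
    """Dice for each BraTS evaluation region."""
    return {name: dice(pred, truth, labs) for name, labs in REGIONS.items()}


if __name__ == "__main__":
    truth = [0, 2, 2, 1, 4, 4, 0]
    pred = [0, 2, 1, 1, 4, 0, 0]
    print(region_scores(pred, truth))
```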

Abdominal Radiology Datasets

Dataset	Modality	Size	Annotations	Source
TotalSegmentator	CT	1,228 scans (v2)	117 anatomical structures	GitHub
AbdomenAtlas 1.1	CT	9,262 volumes	25 organs + tumors	GitHub
AMOS 2022	CT + MRI	500 CT + 100 MRI	15 abdominal organs	Grand Challenge
LiTS	CT	201 scans	Liver + liver tumor segmentation	CodaLab
KiTS23	CT	599 scans	Kidney, kidney tumor, and kidney cyst segmentation	GitHub
BTCV (Beyond the Cranial Vault)	CT	50 scans	13 abdominal organ segmentations	Synapse
WORD	CT	150 scans	16 abdominal organ segmentations	GitHub
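
Abdominal CT volumes store intensities in Hounsfield units (HU), and a common first preprocessing step is to clip to a soft-tissue window and rescale. The window below (level 40 HU, width 400 HU) is a typical abdominal choice, not a value prescribed by any dataset above; nnU-Net, for instance, derives its CT normalization from dataset intensity statistics instead. The sketch assumes voxel values are already in HU.

```python
def window_ct(hu_values: list[float], level: float = 40.0,
              width: float = 400.0) -> list[float]:
    """Clip HU values to [level - width/2, level + width/2], scale to [0, 1].

    Assumes the input is already in Hounsfield units (any DICOM
    RescaleSlope/RescaleIntercept has been applied beforehand).
    """
    if width <= 0:
        raise ValueError("window width must be positive")
    low = level - width / 2.0
    high = level + width / 2.0
    out = []
    for v in hu_values:
        clipped = min(max(v, low), high)
        out.append((clipped - low) / width)
    return out


if __name__ == "__main__":
    # Roughly: air, fat, water, enhancing soft tissue, dense bone.
    print(window_ct([-1000, -100, 0, 60, 1000]))
```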

Mammography and Breast Imaging

Dataset	Modality	Size	Annotations	Source
VinDr-Mammo	Mammography	5,000 exams (20,000 images)	BI-RADS assessment + findings with bounding boxes	PhysioNet
CBIS-DDSM	Mammography	2,620 scans	Mass and calcification annotations with pathology-confirmed labels	TCIA
INbreast	Full-field digital mammography	410 images	Contour annotations for masses, calcifications, and distortions	INESC Porto
RSNA Screening Mammography	Mammography	54,706 images	Cancer detection labels	Kaggle
Duke Breast Cancer MRI	Breast MRI (DCE)	922 patients	Pre-operative MRI with clinical and genomic data	TCIA
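
Screening mammography exams usually contain four standard views (CC and MLO of each breast), which is why VinDr-Mammo's 20,000 images come from only 5,000 exams. Splitting at the image level would leak the same patient into both training and test sets. A minimal sketch of a deterministic patient-level split, hashing a patient identifier so every image from one patient lands in the same split; the file names and ID field are hypothetical, so substitute whatever your dataset's metadata provides.

```python
import hashlib


def assign_split(patient_id: str, val_fraction: float = 0.1,
                 test_fraction: float = 0.1) -> str:
    """Deterministically map a patient ID to 'train', 'val', or 'test'.

    Hashing (rather than random shuffling) keeps the assignment stable
    across runs and machines, and all images sharing an ID get one split.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def split_images(images: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (image_path, patient_id) pairs into splits by patient."""
    splits: dict[str, list[str]] = {"train": [], "val": [], "test": []}
    for path, pid in images:
        splits[assign_split(pid)].append(path)
    return splits


if __name__ == "__main__":
    # Hypothetical file names: four views for each of two patients.
    demo = [(f"{pid}_{view}.png", pid) for pid in ("P001", "P002")
            for view in ("L_CC", "L_MLO", "R_CC", "R_MLO")]
    print(split_images(demo))
```

The same pattern applies to the chest X-ray datasets, where many patients contribute multiple studies.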

Musculoskeletal Radiology

Dataset	Modality	Size	Annotations	Source
MURA	X-ray	40,561 images	Normal/abnormal across 7 upper extremity types	Stanford ML Group
VerSe 2019/2020	CT	374 scans (both editions combined)	Vertebra segmentation and labeling	GitHub
RSNA Cervical Spine Fracture	CT	3,000+ scans	Fracture detection and localization	Kaggle
KneeXray (OAI)	X-ray	36,369 images	Kellgren-Lawrence osteoarthritis grading	NIH OAI
fastMRI	MRI (knee + brain)	10,000+ volumes	Raw k-space data for accelerated MRI reconstruction	NYU fastMRI
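
fastMRI distributes raw k-space so that reconstruction methods can be trained on retrospectively undersampled data. Acceleration is typically simulated by always keeping a band of low-frequency phase-encoding columns and randomly sampling the rest. The sketch below follows that idea; it is similar in spirit to the random mask in the fastMRI repository but is not that code, and center_fraction and acceleration are assumed parameter names. Applying the mask would zero the unsampled k-space columns before an inverse FFT.

```python
import random


def random_column_mask(num_cols: int, center_fraction: float = 0.08,
                       acceleration: int = 4, seed: int = 0) -> list[bool]:
    """Return a per-column k-space sampling mask (True = column acquired).

    A contiguous block of low-frequency columns is always kept; the other
    columns are kept at random so that, on average, about
    num_cols / acceleration columns are sampled in total.
    """
    rng = random.Random(seed)
    num_low = round(num_cols * center_fraction)
    # Probability for outer columns that yields the target expected total.
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    if not 0.0 <= prob <= 1.0:
        raise ValueError("center_fraction too large for this acceleration")
    mask = [rng.random() < prob for _ in range(num_cols)]
    # Force the centered low-frequency band on.
    start = (num_cols - num_low + 1) // 2
    for c in range(start, start + num_low):
        mask[c] = True
    return mask


if __name__ == "__main__":
    mask = random_column_mask(320, center_fraction=0.08, acceleration=4)
    print(f"{sum(mask)} of {len(mask)} columns sampled")
```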

Nuclear Medicine and PET Datasets

Dataset	Modality	Size	Annotations	Source
AutoPET	PET/CT	1,014 studies	Whole-body tumor segmentation from FDG-PET/CT	Grand Challenge
HECKTOR	PET/CT	882 patients	Head and neck tumor segmentation and outcome prediction	Grand Challenge
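
FDG-PET intensities are commonly expressed as standardized uptake values (SUV) before segmentation or measurement. Body-weight SUV divides the measured activity concentration by the injected dose per gram of body weight, with the dose decay-corrected to scan time. A sketch of that formula, assuming activity in Bq/mL, dose in Bq, weight in kg, and the fluorine-18 half-life of about 109.8 minutes; whether a given release already ships SUV-converted images varies, so check each dataset's documentation.

```python
F18_HALF_LIFE_S = 109.77 * 60  # fluorine-18 half-life, about 109.8 minutes


def suv_bw(activity_bq_per_ml: float, injected_dose_bq: float,
           weight_kg: float, seconds_since_injection: float,
           half_life_s: float = F18_HALF_LIFE_S) -> float:
    """Body-weight standardized uptake value.

    SUV = C / (D_decayed / W), with W in grams. Treating 1 g of tissue as
    1 mL makes the result dimensionless.
    """
    if injected_dose_bq <= 0 or weight_kg <= 0:
        raise ValueError("dose and weight must be positive")
    # Decay-correct the injected dose to the time of the scan.
    decayed_dose = injected_dose_bq * 2 ** (-seconds_since_injection / half_life_s)
    return activity_bq_per_ml / (decayed_dose / (weight_kg * 1000.0))


if __name__ == "__main__":
    # Hypothetical values: 370 MBq injected, 70 kg patient, scan 60 min later.
    print(round(suv_bw(10_000.0, 370e6, 70.0, 60 * 60), 2))
```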

Report Generation and Vision-Language Datasets

A rapidly growing category pairs radiology images with their associated reports, enabling AI systems that can generate or summarize radiological findings.

Dataset	Size	Content	Key Features	Source
MIMIC-CXR Reports	227,835 reports	Chest X-ray free-text reports	One of the largest public radiology report collections; paired with images	PhysioNet
IU X-Ray	7,470 images / 3,955 reports	Chest X-ray images + reports	Indiana University dataset; frequently used for report generation	Open-i
RadNLI	960 sentence pairs	Natural language inference for radiology	Entailment, contradiction, neutral labels for report sentences	PhysioNet
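
Report-generation work on MIMIC-CXR and IU X-Ray usually targets a specific report section, most often FINDINGS or IMPRESSION, rather than the whole free-text report. A rough regex sketch for pulling sections out, assuming headers appear at the start of a line as an upper-case label followed by a colon; real reports are messier (missing sections, inconsistent headers), so treat this as a starting point.

```python
import re

# Assumed header style: an upper-case section name at the start of a line,
# followed by a colon, e.g. "FINDINGS:" or "IMPRESSION:".
HEADER = re.compile(r"^\s*([A-Z][A-Z /]+):", re.MULTILINE)


def split_sections(report: str) -> dict[str, str]:
    """Map each upper-case section header to its whitespace-normalized text."""
    matches = list(HEADER.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next header or the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        body = report[m.end():end]
        sections[m.group(1).strip()] = " ".join(body.split())
    return sections


if __name__ == "__main__":
    report = """
    EXAMINATION: CHEST (PA AND LAT)
    FINDINGS: The lungs are clear. No pleural effusion.
    IMPRESSION: No acute cardiopulmonary process.
    """
    print(split_sections(report).get("IMPRESSION"))
```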

Ultrasound Datasets

Dataset	Focus	Size	Annotations	Source
BUSI (Breast Ultrasound Images)	Breast	780 images	Normal, benign, malignant classification + segmentation masks	Cairo University
HC18	Fetal head	1,334 images	Head circumference measurement in 2D ultrasound	Grand Challenge
TN3K	Thyroid	3,493 images	Thyroid nodule segmentation	GitHub
EchoNet-Dynamic	Cardiac	10,030 videos	Ejection fraction + semantic segmentations	echonet.github.io
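
EchoNet-Dynamic's ejection fraction label follows the standard definition: the share of left-ventricular end-diastolic volume (EDV) ejected by end systole (ESV). A small sketch, assuming you already have per-frame left-ventricular volume estimates, for example from segmentations; picking EDV and ESV as the maximum and minimum volume over a single beat is a simplification made here, not the dataset's labeling procedure.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected 0 <= ESV <= EDV")
    return (edv_ml - esv_ml) / edv_ml * 100.0


def ef_from_volume_curve(volumes_ml: list[float]) -> float:
    """Approximate EF from per-frame LV volumes covering one cardiac cycle.

    Simplification: EDV is taken as the largest volume, ESV as the smallest.
    """
    return ejection_fraction(max(volumes_ml), min(volumes_ml))


if __name__ == "__main__":
    # Hypothetical per-frame volumes (mL) over one beat.
    print(round(ef_from_volume_curve([110, 120, 90, 60, 50, 80]), 1))
```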

Open Source Radiology AI Frameworks

To work effectively with these datasets, several open source frameworks have become essential tools in the radiology AI researcher’s toolkit:

MONAI (Medical Open Network for AI): comprehensive PyTorch-based framework for medical imaging
MONAI Model Zoo: pretrained models for common medical imaging tasks
fastMRI: tools for accelerated MRI reconstruction
Microsoft hi-ml (Health Intelligence Machine Learning): toolbox for deep learning on medical imaging
nnU-Net: self-configuring framework for medical image segmentation
3D Slicer: open source platform for medical image informatics, processing, and visualization
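
nnU-Net expects each dataset in its own raw-data folder (for v2, named like Dataset001_Liver) containing imagesTr/, labelsTr/, and a dataset.json describing channels and labels. nnU-Net uses the channel name to choose an intensity normalization scheme, with "CT" selecting its CT-specific one. The sketch below writes a v2-style dataset.json with the standard library only; the dataset, case, and label names are hypothetical, and the nnU-Net documentation remains the authority on the exact schema.

```python
import json
from pathlib import Path


def write_dataset_json(dataset_dir: Path, channel_names: dict[str, str],
                       labels: dict[str, int],
                       file_ending: str = ".nii.gz") -> dict:
    """Write an nnU-Net v2-style dataset.json into dataset_dir.

    numTraining is counted from the label files present in labelsTr/.
    """
    if labels.get("background") != 0:
        raise ValueError("nnU-Net expects 'background' mapped to label 0")
    labels_dir = dataset_dir / "labelsTr"
    num_training = len(list(labels_dir.glob(f"*{file_ending}")))
    meta = {
        "channel_names": channel_names,
        "labels": labels,
        "numTraining": num_training,
        "file_ending": file_ending,
    }
    (dataset_dir / "dataset.json").write_text(json.dumps(meta, indent=2))
    return meta


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Dataset001_Liver"  # hypothetical dataset name
        (root / "labelsTr").mkdir(parents=True)
        (root / "imagesTr").mkdir()
        for case in ("liver_000", "liver_001"):
            # Images carry a 4-digit channel suffix; labels do not.
            (root / "imagesTr" / f"{case}_0000.nii.gz").touch()
            (root / "labelsTr" / f"{case}.nii.gz").touch()
        meta = write_dataset_json(root, {"0": "CT"},
                                  {"background": 0, "liver": 1, "tumor": 2})
        print(meta["numTraining"])
```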

Getting Started

For newcomers to radiology AI, the best starting point depends on your clinical focus. For chest imaging, CheXpert and MIMIC-CXR provide the scale needed for robust model development. For segmentation, TotalSegmentator and the Medical Segmentation Decathlon (10 tasks spanning CT and MRI) offer broad multi-organ benchmarks. For neuroradiology, BraTS remains the standard benchmark for brain tumor segmentation, while the Human Connectome Project provides high-resolution structural, functional, and diffusion MRI for studying brain connectivity.

Regardless of your focus area, we recommend MONAI as a general-purpose framework and nnU-Net as a strong segmentation baseline. Both handle much of the preprocessing, data loading, and training-pipeline complexity that can slow down medical imaging research.

The open radiology dataset ecosystem is more robust than ever, and new datasets continue to emerge from major academic medical centers and government initiatives worldwide. Bookmark this page and check back regularly — we will keep it updated as new resources become available.