Open Source Datasets for AI in Dermatology: A Complete Resource Guide

Dermatology has emerged as one of the most active frontiers for AI in healthcare, driven in large part by the visual nature of skin disease diagnosis. The field’s reliance on pattern recognition from images makes it a natural fit for deep learning, and the availability of open source datasets has catalyzed a surge of research. From melanoma detection to rare disease classification, publicly accessible dermatology datasets are enabling researchers and developers to build systems that could one day match or exceed expert-level diagnostic accuracy.

This guide catalogs the major open source dermatology datasets available today, with direct links to source data and code repositories. Whether you’re training a skin lesion classifier, building a dermoscopic segmentation model, or exploring multimodal dermatology AI, this is your starting point.

The Landscape of Dermatology AI Data

Skin imaging datasets broadly fall into three categories: clinical photographs (taken with standard cameras in clinical settings), dermoscopic images (captured with dermatoscopes, which combine magnification with polarized or non-polarized illumination), and histopathological images (microscopy slides of skin biopsies). Each modality presents different challenges for AI systems, and the best models increasingly combine information across modalities.

A critical challenge in dermatology AI is skin tone diversity. Many early datasets were heavily skewed toward lighter skin tones, leading to models that performed poorly on darker skin. Recent initiatives have begun addressing this gap, and we highlight datasets that contribute to more equitable AI development.

Skin Lesion Classification Datasets

These datasets support assigning skin lesions to diagnostic categories, the most common task in dermatology AI.

Dataset	Images	Classes	Image Type	Key Features	Source
ISIC Archive	150,000+	Multiple (varies)	Dermoscopic + Clinical	Largest public skin lesion archive; basis for the ISIC challenges, first held in 2016	isic-archive.com
HAM10000	10,015	7 diagnostic categories	Dermoscopic	Curated from two sites; includes actinic keratoses, basal cell carcinoma, benign keratosis, dermatofibroma, melanoma, nevi, vascular lesions	Harvard Dataverse
Fitzpatrick17k	16,577	114 conditions	Clinical photographs	Labeled with Fitzpatrick skin type (I-VI); addresses skin tone bias in dermatology AI	GitHub
PAD-UFES-20	2,298	6 skin lesion types	Clinical smartphone photos	Includes patient metadata (age, sex, body region); smartphone-captured for real-world performance	Mendeley Data
Derm7pt	2,000	Multiclass + 7-point checklist	Dermoscopic + Clinical pairs	Both dermoscopic and clinical images per lesion; 7-point checklist scoring for structured diagnosis	SFU
DermNet Dataset	23,000+	600+ conditions	Clinical photographs	Broadest condition coverage; images sourced from DermNet NZ	Kaggle
SD-198	6,584	198 skin disease categories	Clinical photographs	Fine-grained classification benchmark	GitHub
DDI (Diverse Dermatology Images)	656	78 conditions	Clinical photographs	Specifically curated for skin tone diversity; pathology-confirmed diagnoses	ddi-dataset.github.io

Dermoscopic Segmentation Datasets

Segmentation datasets provide pixel-level masks delineating lesion boundaries, enabling AI systems to precisely locate and measure skin lesions.

Dataset	Images	Annotation Type	Key Features	Source
ISIC 2018 Task 1	2,594	Lesion boundary segmentation masks	Part of ISIC Challenge; gold standard for lesion segmentation	ISIC Challenge
PH2	200	Lesion segmentation + dermoscopic structures	Expert annotations with asymmetry, border, color, dermoscopic structures	ADDI Project
DermIS/DermQuest	Varies	Clinical descriptions + segmentations	Historical atlas-style dataset	DermIS
ISIC 2017 Challenge	2,750	Segmentation + classification	Melanoma, seborrheic keratosis, benign nevi	ISIC Challenge
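
Lesion masks like those in the datasets above are usually compared against a model's output with overlap metrics. The sketch below computes the Dice coefficient and the Jaccard index (intersection over union) for two binary masks given as rows of 0/1 pixels. The function name `overlap_scores` and the toy masks are illustrative assumptions rather than part of any dataset's tooling, and each challenge defines its own official scoring rules.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels (hypothetical in-memory format)


def overlap_scores(pred: Mask, truth: Mask) -> tuple[float, float]:
    """Return (dice, jaccard) for two binary masks of the same shape."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel is lesion in both masks
            pred_area += p
            truth_area += t

    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty. Scoring this as perfect agreement is a
        # convention chosen for this sketch, not a universal rule.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + truth_area)
    jaccard = intersection / union
    return dice, jaccard


# Toy 2x3 masks: two pixels overlap, each mask has three lesion pixels.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
dice, jaccard = overlap_scores(predicted, reference)
print(f"Dice={dice:.3f}  Jaccard={jaccard:.3f}")
```

Dice weights the overlap more generously than Jaccard does, so the two numbers are not interchangeable when comparing results across papers.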

Skin Cancer Screening Datasets

Dataset	Images	Focus	Key Features	Source
BCN20000	19,424	8 diagnostic categories	Hospital Clínic de Barcelona dataset; demographically rich metadata	arXiv (Paper)
MClass-D / MClass-ND	100 / 100	Melanoma vs. nevi	Benchmarking sets used in human-vs-AI studies	skinclass.de
SIIM-ISIC Melanoma Classification	33,126	Melanoma detection	Kaggle competition dataset with patient metadata; one of the largest melanoma-specific datasets	Kaggle

Specialized Dermatology Datasets

Dataset	Images	Focus	Key Features	Source
SkinCon	3,230	48 clinical concept annotations	Concept-based annotations for explainable AI in dermatology	skincon-dataset.github.io
Monkeypox Skin Lesion Dataset	2,000+	Monkeypox vs. similar conditions	Created during 2022 outbreak; includes measles, chickenpox, cowpox comparisons	GitHub
Wound Imaging	1,335	Chronic wound classification	Diabetic foot ulcers, venous ulcers, pressure injuries	GitHub
SCIN (Skin Condition Image Network)	10,000+	Crowd-sourced skin conditions	Google Health initiative; diverse skin tones; self-reported conditions	GitHub

Multimodal and Text-Image Datasets

The latest generation of dermatology datasets pairs images with rich textual descriptions, enabling vision-language models and more sophisticated AI systems.

Dataset	Size	Modalities	Key Features	Source
SkinGPT-4 Training Data	52,929 image-text pairs	Dermoscopic images + diagnostic text	Used to train SkinGPT-4 vision-language model	GitHub
DermExpert	50,000+ pairs	Clinical images + expert descriptions	Expert-written descriptions for training diagnostic chatbots	GitHub

Addressing Bias: Skin Tone Diversity

One of the most important developments in dermatology AI has been the growing recognition that datasets must represent the full spectrum of human skin tones. Early datasets like HAM10000 were overwhelmingly composed of images from light-skinned individuals, leading to models that underperformed on darker skin. The Fitzpatrick17k and DDI datasets were explicitly created to address this gap, and the ISIC Archive has been actively expanding its diversity.

Researchers building dermatology AI systems should evaluate performance across Fitzpatrick skin types I through VI and report disaggregated metrics, meaning results computed separately for each skin type rather than only a single aggregate score. This is not just a technical concern: it is an ethical imperative that directly impacts clinical equity.
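
As a minimal sketch of disaggregated reporting, the function below computes accuracy separately for each Fitzpatrick skin type instead of pooling all images. The function name `accuracy_by_group`, the labels, and the six toy predictions are hypothetical. A real evaluation would likely also report sensitivity and specificity per group, along with the sample sizes shown here.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: (accuracy, n_samples)} computed separately per group."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)

    return {g: (correct[g] / total[g], total[g]) for g in sorted(total)}


# Hypothetical predictions, tagged with Fitzpatrick skin type (1-6).
y_true = ["mel", "nev", "mel", "nev", "mel", "nev"]
y_pred = ["mel", "nev", "nev", "nev", "nev", "mel"]
fitzpatrick = [1, 1, 3, 3, 6, 6]

for skin_type, (acc, n) in accuracy_by_group(y_true, y_pred, fitzpatrick).items():
    print(f"Fitzpatrick {skin_type}: accuracy={acc:.2f} (n={n})")
```

In this toy data the pooled accuracy is 0.50, which hides the fact that every type VI example is misclassified. Surfacing that kind of gap is the purpose of reporting per-group results.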

Model Repositories and Pretrained Weights

Several research groups have released pretrained models alongside their datasets, enabling rapid experimentation and transfer learning:

Google Derm Foundation — Foundation model for dermatology trained on diverse clinical images
UCSD AI4H Dermatology Models — Open-source dermatology classification models
CHARM (Clinical Histopathology And Real-world Melanoma) — Melanoma classification from histopathology

Getting Started with Dermatology AI

For newcomers, we recommend beginning with HAM10000 for classification tasks or the ISIC 2018 Task 1 dataset for segmentation. Both are well-documented, moderately sized, and have established baselines. The Fitzpatrick17k dataset is essential for anyone building systems intended for clinical deployment, as it enables fairness evaluation across skin tones.
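
One practical detail when starting with a dataset such as HAM10000 is that a single lesion can appear in more than one image, so a purely random split can put near-duplicates in both training and test sets. The sketch below splits by lesion instead. It assumes each metadata row carries `lesion_id` and `image_id` fields (as the HAM10000 metadata file does at the time of writing); the function name `split_by_lesion` and the toy rows are hypothetical.

```python
import random


def split_by_lesion(rows, test_fraction=0.2, seed=0):
    """Split metadata rows so that no lesion_id appears in both train and test."""
    lesion_ids = sorted({row["lesion_id"] for row in rows})
    random.Random(seed).shuffle(lesion_ids)  # seeded for a reproducible split

    n_test = max(1, round(len(lesion_ids) * test_fraction))
    test_lesions = set(lesion_ids[:n_test])

    train = [row for row in rows if row["lesion_id"] not in test_lesions]
    test = [row for row in rows if row["lesion_id"] in test_lesions]
    return train, test


# Toy rows standing in for metadata read with csv.DictReader.
rows = [
    {"lesion_id": "L1", "image_id": "img_001", "dx": "nv"},
    {"lesion_id": "L1", "image_id": "img_002", "dx": "nv"},
    {"lesion_id": "L2", "image_id": "img_003", "dx": "mel"},
    {"lesion_id": "L3", "image_id": "img_004", "dx": "bkl"},
    {"lesion_id": "L4", "image_id": "img_005", "dx": "nv"},
    {"lesion_id": "L5", "image_id": "img_006", "dx": "bcc"},
]
train_rows, test_rows = split_by_lesion(rows, test_fraction=0.2)
print(len(train_rows), "training images;", len(test_rows), "test images")
```

This version does not stratify by diagnosis, so rare classes may be missing from a small test set. scikit-learn's `GroupShuffleSplit` and `StratifiedGroupKFold` offer more complete alternatives.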

For production-grade melanoma screening systems, the SIIM-ISIC competition dataset provides the scale and metadata richness needed for robust model development. And for researchers exploring multimodal approaches, the SkinGPT-4 training data offers a starting point for vision-language model development in dermatology.

As the field continues to evolve, we expect to see more datasets incorporating 3D skin imaging, total body photography, and longitudinal monitoring data. The foundation for equitable, effective dermatology AI starts with the data — and these open resources are making that foundation stronger every year.