Dermatology has emerged as one of the most active frontiers for AI in healthcare, driven in large part by the visual nature of skin disease diagnosis. The field’s reliance on pattern recognition from images makes it a natural fit for deep learning, and the availability of open-source datasets has been a catalyst for an explosion of research. From melanoma detection to rare disease classification, publicly accessible dermatology datasets are enabling researchers and developers to build systems that may one day match or exceed expert-level diagnostic accuracy.
This guide catalogs the major open-source dermatology datasets available today and notes where each one's data and code are hosted. Whether you’re training a skin lesion classifier, building a dermoscopic segmentation model, or exploring multimodal dermatology AI, this is your starting point.
The Landscape of Dermatology AI Data
Skin imaging datasets broadly fall into three categories: clinical photographs (taken with standard cameras in clinical settings), dermoscopic images (captured with dermatoscopes, which combine magnification with polarized or non-polarized illumination to reveal structures below the skin surface), and histopathological images (microscopy slides of skin biopsies). Each modality presents different challenges for AI systems, and a growing number of models combine information across modalities.
A critical challenge in dermatology AI is skin tone diversity. Many early datasets were heavily skewed toward lighter skin tones, leading to models that performed poorly on darker skin. Recent initiatives have begun addressing this gap, and we highlight datasets that contribute to more equitable AI development.
Skin Lesion Classification Datasets
These datasets focus on assigning skin lesions to diagnostic categories, the most common task in dermatology AI.
| Dataset | Images | Classes | Image Type | Key Features | Source |
|---|---|---|---|---|---|
| ISIC Archive | 150,000+ | Multiple (varies) | Dermoscopic + Clinical | Largest public skin lesion archive; basis for annual challenges since 2016 | isic-archive.com |
| HAM10000 | 10,015 | 7 diagnostic categories | Dermoscopic | Curated from two sites; covers actinic keratoses and intraepithelial carcinoma, basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanoma, melanocytic nevi, vascular lesions | Harvard Dataverse |
| Fitzpatrick17k | 16,577 | 114 conditions | Clinical photographs | Labeled with Fitzpatrick skin type (I-VI); addresses skin tone bias in dermatology AI | GitHub |
| PAD-UFES-20 | 2,298 | 6 skin lesion types | Clinical smartphone photos | Includes patient metadata (age, sex, body region); smartphone-captured for real-world performance | Mendeley Data |
| Derm7pt | ~2,000 (1,011 lesions) | Multiclass + 7-point checklist | Dermoscopic + Clinical pairs | Paired dermoscopic and clinical images per lesion; 7-point checklist labels for structured diagnosis | SFU |
| DermNet Dataset | 23,000+ | 600+ conditions | Clinical photographs | Broadest condition coverage; images sourced from DermNet NZ | Kaggle |
| SD-198 | 6,584 | 198 skin disease categories | Clinical photographs | Fine-grained classification benchmark | GitHub |
| DDI (Diverse Dermatology Images) | 656 | 78 conditions | Clinical photographs | Specifically curated for skin tone diversity; pathology-confirmed diagnoses | ddi-dataset.github.io |
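Derm7pt's structured labels follow the 7-point checklist, which turns dermoscopic findings into a score: each of the three major criteria (atypical pigment network, blue-whitish veil, atypical vascular pattern) adds 2 points, each of the four minor criteria (irregular streaks, irregular pigmentation, irregular dots/globules, regression structures) adds 1, and a total of 3 or more is the classic cutoff for suspecting melanoma. The minimal Python sketch below encodes that rule. The criterion identifiers are hypothetical names chosen for illustration, not Derm7pt's actual column names, and mapping the dataset's per-criterion categories to present/absent findings is left to you.

```python
from __future__ import annotations

# 7-point checklist weights: major criteria score 2 points, minor criteria 1.
# These identifiers are hypothetical, not Derm7pt column names.
MAJOR_CRITERIA = frozenset({
    "atypical_pigment_network",
    "blue_whitish_veil",
    "atypical_vascular_pattern",
})
MINOR_CRITERIA = frozenset({
    "irregular_streaks",
    "irregular_pigmentation",
    "irregular_dots_globules",
    "regression_structures",
})


def seven_point_score(findings: set[str]) -> int:
    """Sum the checklist points for the criteria present in one lesion."""
    findings = set(findings)
    unknown = findings - MAJOR_CRITERIA - MINOR_CRITERIA
    if unknown:
        raise ValueError(f"unrecognized criteria: {sorted(unknown)}")
    major = len(MAJOR_CRITERIA & findings)
    minor = len(MINOR_CRITERIA & findings)
    return 2 * major + minor


def suggests_melanoma(findings: set[str], threshold: int = 3) -> bool:
    """Apply the classic cutoff: a score of 3 or more suggests melanoma."""
    return seven_point_score(findings) >= threshold


if __name__ == "__main__":
    lesion = {"blue_whitish_veil", "irregular_streaks"}
    print(seven_point_score(lesion), suggests_melanoma(lesion))  # 3 True
```

Keeping `threshold` as a parameter makes it easy to compare the original cutoff against lower, more sensitive cutoffs on the dataset's labels.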
Dermoscopic Segmentation Datasets
Segmentation datasets provide pixel-level masks delineating lesion boundaries, enabling AI systems to precisely locate and measure skin lesions.
| Dataset | Images | Annotation Type | Key Features | Source |
|---|---|---|---|---|
| ISIC 2018 Task 1 | 2,594 (training set) | Lesion boundary segmentation masks | Part of the ISIC 2018 Challenge; a widely used benchmark for lesion segmentation | ISIC Challenge |
| PH2 | 200 | Lesion segmentation + dermoscopic structures | Expert annotations with asymmetry, border, color, dermoscopic structures | ADDI Project |
| DermIS/DermQuest | Varies | Clinical descriptions + segmentations | Historical atlas-style dataset | DermIS |
| ISIC 2017 Challenge | 2,750 (train + validation + test) | Segmentation + classification | Melanoma, seborrheic keratosis, benign nevi | ISIC Challenge |
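Segmentation masks like these are usually scored by overlap with the expert mask. The sketch below computes the Jaccard index (intersection over union) and the Dice coefficient for two binary masks stored as nested lists, plus a thresholded Jaccard of the kind used to rank ISIC 2018 Task 1, where per-image scores below 0.65 count as zero. Treat the 0.65 default as something to confirm against the challenge's own evaluation description; the function names are our own, not from any challenge toolkit.

```python
from __future__ import annotations

from typing import Sequence

# A binary mask as nested rows: 1 (or any truthy value) = lesion, 0 = background.
Mask = Sequence[Sequence[int]]


def _overlap_counts(pred: Mask, truth: Mask) -> tuple[int, int, int]:
    """Return (intersection, predicted area, true area) in pixels."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    inter = pred_area = true_area = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            inter += int(p and t)
            pred_area += int(p)
            true_area += int(t)
    return inter, pred_area, true_area


def jaccard(pred: Mask, truth: Mask) -> float:
    """Intersection over union; defined as 1.0 when both masks are empty."""
    inter, a, b = _overlap_counts(pred, truth)
    union = a + b - inter
    return 1.0 if union == 0 else inter / union


def dice(pred: Mask, truth: Mask) -> float:
    """2 * intersection / (area_pred + area_true); 1.0 when both are empty."""
    inter, a, b = _overlap_counts(pred, truth)
    total = a + b
    return 1.0 if total == 0 else 2 * inter / total


def thresholded_jaccard(score: float, threshold: float = 0.65) -> float:
    """Zero out a per-image Jaccard score that falls below the threshold."""
    return score if score >= threshold else 0.0


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    j = jaccard(pred, truth)
    print(j, round(dice(pred, truth), 3), thresholded_jaccard(j))  # 0.5 0.667 0.0
```

For a single image the two overlap scores are linked by Dice = 2J / (1 + J), so they order one image's candidate masks identically; they can diverge only once scores are averaged across images.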
Skin Cancer Screening Datasets
| Dataset | Images | Focus | Key Features | Source |
|---|---|---|---|---|
| BCN20000 | 19,424 | 8 diagnostic categories | Hospital Clinic Barcelona dataset; demographically rich metadata | arXiv (Paper) |
| MClass-D / MClass-ND | 100 / 100 | Melanoma vs. nevi | Dermoscopic (D) and non-dermoscopic clinical (ND) benchmark sets used in human-vs-AI reader studies | skinclass.de |
| SIIM-ISIC Melanoma Classification | 33,126 | Melanoma detection | Kaggle competition dataset with patient metadata; one of the largest melanoma-specific datasets | Kaggle |
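Melanoma-specific sets such as SIIM-ISIC are heavily imbalanced, so plain accuracy is misleading; the Kaggle competition scored submissions by area under the ROC curve. Below is a stdlib-only sketch of ROC AUC using the rank-sum (Mann-Whitney U) formulation, with tied scores given averaged ranks. `roc_auc` is an illustrative helper, not a library function; in practice scikit-learn's `roc_auc_score` does the same job.

```python
from __future__ import annotations

from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for melanoma (positive), 0 for benign. scores: model outputs,
    higher meaning more likely melanoma. Tied scores share an averaged rank,
    so each tied positive/negative pair counts as half a correct ordering.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

AUC measures how well a model ranks melanomas above benign lesions regardless of threshold, which is why it holds up under class imbalance; choosing an operating threshold for screening is a separate step.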
Specialized Dermatology Datasets
| Dataset | Images | Focus | Key Features | Source |
|---|---|---|---|---|
| SkinCon | 3,230 | 48 clinical concept annotations | Concept-based annotations for explainable AI in dermatology | skincon-dataset.github.io |
| Monkeypox Skin Lesion Dataset | 2,000+ | Monkeypox vs. similar conditions | Created during the 2022 mpox outbreak; includes measles, chickenpox, and cowpox images for comparison | GitHub |
| Wound Imaging | 1,335 | Chronic wound classification | Diabetic foot ulcers, venous ulcers, pressure injuries | GitHub |
| SCIN (Skin Condition Image Network) | 10,000+ | Crowdsourced skin conditions | Google-led initiative; diverse skin tones; contributor-reported demographics and symptoms, with dermatologist-assigned condition labels | GitHub |
Multimodal and Text-Image Datasets
The newest dermatology datasets pair images with rich textual descriptions, enabling vision-language models and other multimodal systems.
| Dataset | Size | Modalities | Key Features | Source |
|---|---|---|---|---|
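| SkinGPT-4 Training Data | 52,929 image-text pairs | Skin images + clinical concepts and doctors' notes | Used to train the SkinGPT-4 vision-language model; reported to combine public and proprietary images, so not all of it is openly downloadable | GitHub |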
| DermExpert | 50,000+ pairs | Clinical images + expert descriptions | Expert-written descriptions for training diagnostic chatbots | GitHub |
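The source does not specify the training objective behind these image-text pairs. One common way to learn from paired data is CLIP-style contrastive pretraining, which pulls each image embedding toward its own description and away from the other descriptions in the batch. The sketch below computes that symmetric cross-entropy (InfoNCE) loss from precomputed embeddings in pure Python. It illustrates the general technique, not how SkinGPT-4 or DermExpert were trained, and the 0.07 temperature is only a commonly used starting value.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _mean_diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_emb: Sequence[Vector], text_emb: Sequence[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch of matched (image, text) pairs.

    image_emb[i] and text_emb[i] are assumed to describe the same case; every
    other text in the batch acts as a negative for image i, and vice versa.
    """
    if len(image_emb) != len(text_emb) or not image_emb:
        raise ValueError("need equal, non-empty batches of image and text embeddings")
    images = [_l2_normalize(v) for v in image_emb]
    texts = [_l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine similarity(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
              for img in images]
    transposed = [list(col) for col in zip(*logits)]
    image_to_text = _mean_diagonal_cross_entropy(logits)
    text_to_image = _mean_diagonal_cross_entropy(transposed)
    return 0.5 * (image_to_text + text_to_image)


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(contrastive_loss(imgs, matched) < contrastive_loss(imgs, swapped))  # True
```

In a real pipeline the embeddings would come from trainable image and text encoders and the loss would be minimized with automatic differentiation; this version only shows what the objective computes.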
Addressing Bias: Skin Tone Diversity
One of the most important developments in dermatology AI has been the growing recognition that datasets must represent the full spectrum of human skin tones. Early datasets like HAM10000 were overwhelmingly composed of images from light-skinned individuals, leading to models that underperformed on darker skin. The Fitzpatrick17k and DDI datasets were explicitly created to address this gap, and the ISIC Archive has been actively expanding its diversity.
Researchers building dermatology AI systems should evaluate performance across Fitzpatrick skin types I through VI and report disaggregated metrics. This is not just a technical concern; it is an ethical imperative that directly affects clinical equity.
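A minimal sketch of disaggregated reporting: given true labels, binary predictions, and a Fitzpatrick type (1 to 6) per image, it reports sample count, accuracy, sensitivity, and specificity for each skin-type group. Bucketing into I-II, III-IV, and V-VI mirrors the grouping DDI uses. The helper names are hypothetical, and a metric is returned as `None` when a group has no positives or no negatives, rather than silently reported as zero.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Dict, Optional, Sequence


def fst_bucket(fitzpatrick_type: int) -> str:
    """Group Fitzpatrick types 1-6 into the I-II / III-IV / V-VI bands."""
    if fitzpatrick_type not in range(1, 7):
        raise ValueError(f"Fitzpatrick type must be 1-6, got {fitzpatrick_type}")
    return ("I-II", "III-IV", "V-VI")[(fitzpatrick_type - 1) // 2]


def _ratio(num: int, den: int) -> Optional[float]:
    return num / den if den else None  # None when the metric is undefined


def metrics_by_group(y_true: Sequence[int], y_pred: Sequence[int],
                     groups: Sequence[str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-group count, accuracy, sensitivity, specificity (labels 0/1, 1 = positive)."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have the same length")
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct or not; "p"/"n" = predicted positive or negative.
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    report = {}
    for g in sorted(counts):
        c = counts[g]
        n = sum(c.values())
        report[g] = {
            "n": n,
            "accuracy": _ratio(c["tp"] + c["tn"], n),
            "sensitivity": _ratio(c["tp"], c["tp"] + c["fn"]),
            "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),
        }
    return report


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 1, 1]
    y_pred = [1, 0, 0, 0, 1, 0]
    fst = [1, 2, 3, 4, 5, 6]
    for group, m in metrics_by_group(y_true, y_pred, [fst_bucket(t) for t in fst]).items():
        print(group, m)
```

Small groups give noisy estimates, so report the count next to each metric and consider confidence intervals before reading much into a gap between groups.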
Model Repositories and Pretrained Weights
Several research groups have released pretrained models alongside their datasets, enabling rapid experimentation and transfer learning:
- Google Derm Foundation — Foundation model for dermatology trained on diverse clinical images
- UCSD AI4H Dermatology Models — Open-source dermatology classification models
- CHARM (Clinical Histopathology And Real-world Melanoma) — Melanoma classification from histopathology
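A common way to use pretrained models like these is linear probing: freeze the pretrained network, extract one embedding vector per image, and train only a small classifier on top. The sketch below assumes you already have those embeddings as lists of floats (how to extract them depends on each model's own documentation, which is not reproduced here) and fits an L2-regularized logistic regression with plain stochastic gradient descent. The learning rate, epoch count, and regularization strength are illustrative defaults, not tuned values.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Branch on the sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(X: Sequence[Sequence[float]], y: Sequence[int],
                       lr: float = 0.1, epochs: int = 200, l2: float = 1e-3,
                       seed: int = 0) -> Tuple[List[float], float]:
    """Fit logistic regression on frozen embeddings with SGD; returns (weights, bias)."""
    if not X or len(X) != len(y):
        raise ValueError("need a non-empty X with one label per row")
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    rng = random.Random(seed)  # fixed seed for reproducible shuffling
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = X[i]
            z = sum(wj * xj for wj, xj in zip(w, x)) + b
            err = _sigmoid(z) - y[i]  # d(log loss)/dz for one example
            for j in range(dim):
                w[j] -= lr * (err * x[j] + l2 * w[j])
            b -= lr * err  # the bias is not regularized
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class for one embedding."""
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from a pretrained model.
    X = [[2.0, 0.0], [1.5, 0.2], [-2.0, 0.1], [-1.5, -0.3]]
    y = [1, 1, 0, 0]
    w, b = train_linear_probe(X, y)
    print(round(predict_proba(w, b, [1.8, 0.0]), 2))
```

Because only the small classifier is trained, a probe like this is cheap enough to refit per dataset or per task, which makes it a quick baseline before attempting full fine-tuning.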
Getting Started with Dermatology AI
For newcomers, we recommend beginning with HAM10000 for classification tasks or the ISIC 2018 dataset for segmentation. Both are well-documented, moderately sized, and have established baselines. The Fitzpatrick17k dataset is essential for anyone building systems intended for clinical deployment, as it enables fairness evaluation across skin tones.
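One pitfall when starting with HAM10000: some lesions appear in more than one image, and the metadata links each `image_id` to a `lesion_id`. A random image-level split can place photos of the same lesion in both training and validation sets and inflate scores. The sketch below assigns whole groups to a split by hashing the group ID, so the assignment is deterministic and leak-free. The function names and salt are hypothetical; confirm the column names against your copy of the metadata.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def split_of(group_id: str, val_fraction: float = 0.2, salt: str = "derm-split") -> str:
    """Map a group ID (e.g. a lesion or patient ID) to 'train' or 'val' deterministically."""
    if not 0.0 <= val_fraction <= 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{group_id}".encode("utf-8")).digest()
    # Top 53 bits of the hash as a float in [0, 1); stable across runs and machines.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "val" if bucket < val_fraction else "train"


def grouped_split(rows: Sequence[Row], group_key: str = "lesion_id",
                  val_fraction: float = 0.2) -> Tuple[List[Row], List[Row]]:
    """Split rows so that all rows sharing a group_key value land in the same split."""
    train: List[Row] = []
    val: List[Row] = []
    for row in rows:
        target = val if split_of(row[group_key], val_fraction) == "val" else train
        target.append(row)
    return train, val


if __name__ == "__main__":
    # Hypothetical rows mimicking image-level metadata: two images per lesion.
    rows = [{"image_id": f"img_{i}", "lesion_id": f"les_{i // 2}"} for i in range(20)]
    train, val = grouped_split(rows)
    shared = {r["lesion_id"] for r in train} & {r["lesion_id"] for r in val}
    print(len(train), len(val), shared)  # shared is always an empty set
```

With the metadata loaded through `csv.DictReader`, the rows can be passed straight to `grouped_split`. The validation share is only approximately `val_fraction`, and the split is not stratified by diagnosis. The same approach applies to SIIM-ISIC by grouping on its patient identifier instead of the lesion.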
For production-grade melanoma screening systems, the SIIM-ISIC competition dataset provides the scale and metadata richness needed for robust model development. And for researchers exploring multimodal approaches, the SkinGPT-4 training data offers a starting point for vision-language model development in dermatology.
As the field continues to evolve, we expect to see more datasets incorporating 3D skin imaging, total body photography, and longitudinal monitoring data. The foundation for equitable, effective dermatology AI starts with the data, and these open resources are making that foundation stronger every year.

