
Datasets

Training corpora, text-to-image datasets, image editing datasets, and interleaved image-text data for unified multimodal models.


Datasets for unified multimodal models, curated from Awesome-Unified-Multimodal-Models. The tables below also cover multimodal understanding datasets and one survey of unified models.

Multimodal Understanding Datasets (8 datasets)

| Dataset | Scale | Description | Venue | Year |
|---|---|---|---|---|
| Honey-Data-15M | 15M | High-quality corpus for advanced fully open MLLMs | arXiv | 2025 |
| Infinity-MM | 40M | Scaling multimodal performance with instruction data | arXiv | 2024 |
| LLaVA-OneVision | 4.8M | Easy visual task transfer | TMLR | 2024 |
| Cambrian-10M | 10M | Vision-centric exploration of multimodal LLMs | NeurIPS | 2024 |
| ShareGPT4V | 100K | Better captions for large multi-modal models | ECCV | 2023 |
| CapsFusion-120M | 120M | Rethinking image-text data at scale | CVPR | 2023 |
| DataComp | 1.4B | Next-generation multimodal dataset search | NeurIPS | 2023 |
| LAION-5B | 5.9B | Open large-scale multi-modal dataset | NeurIPS | 2022 |

Text-to-Image Datasets (10 datasets)

| Dataset | Scale | Description | Venue | Year |
|---|---|---|---|---|
| FLUX-Reason-6M | 6M | Million-scale text-to-image reasoning dataset | arXiv | 2025 |
| ShareGPT-4o-Image | 45K | Aligning multimodal models with GPT-4o-level generation | arXiv | 2025 |
| BLIP3o-60k | 60K | Unified multimodal models architecture training dataset | arXiv | 2025 |
| TextAtlas5M | 5M | Large-scale dataset for dense text image generation | arXiv | 2025 |
| PD12M | 12M | Highly aesthetic image-text dataset with novel governance | arXiv | 2024 |
| PixelProse | 16M | Large dataset of dense image captions | arXiv | 2024 |
| JourneyDB | 4M | Benchmark for generative image understanding | NeurIPS | 2023 |
| Mario-10M | 10M | TextDiffuser dataset for text rendering | NeurIPS | 2023 |
| SAM | 11M | Segment Anything dataset | ICCV | 2023 |
| LAION-Aesthetics | 120M | Aesthetic subset of LAION-5B | NeurIPS | 2022 |

Image Editing Datasets (8 datasets)

| Dataset | Scale | Description | Venue | Year |
|---|---|---|---|---|
| X2Edit | 3.7M | Arbitrary-instruction image editing dataset | arXiv | 2025 |
| ByteMorph-6M | 6M | Instruction-guided image editing with non-rigid motions | arXiv | 2025 |
| ImgEdit | 1.2M | Unified image editing dataset and benchmark | arXiv | 2025 |
| AnyEdit | 2.5M | Mastering unified high-quality image editing | CVPR | 2024 |
| OmniEdit | 1.2M | Building image editing generalist models | ICLR | 2024 |
| UltraEdit | 4M | Instruction-based fine-grained image editing at scale | NeurIPS | 2024 |
| HQ-Edit | 197K | High-quality dataset for instruction-based image editing | arXiv | 2024 |
| InstructP2P | 313K | Learning to follow image editing instructions | CVPR | 2022 |

Interleaved Image-Text Corpora (4 datasets)

| Dataset | Scale | Description | Venue | Year |
|---|---|---|---|---|
| OmniCorpus | 8B | Unified multimodal corpus of 10B-level images interleaved with text | ICLR | 2024 |
| CoMM | 227K | Coherent interleaved image-text dataset for multimodal understanding | CVPR | 2024 |
| OBELICS | 141M | Open web-scale filtered interleaved image-text documents | NeurIPS | 2023 |
| Multimodal C4 | 101.2M | Open, billion-scale corpus of images interleaved with text | NeurIPS | 2023 |

Unified Models Survey (1 survey)

| Title | Domain | Venue | Year |
|---|---|---|---|
| Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities | Unified | arXiv | 2025 |