Benchmarks

Live arena rankings from human preference votes, established benchmarks, quantitative metrics, training datasets, and surveys.

Image Generation Benchmarks (40 benchmarks)

Benchmark | Evaluation Focus | Venue | Year
SpatialGen | Spatial intelligence generation evaluation | arXiv | 2026
ColorConcept | Probabilistic color-concept T2I evaluation | arXiv | 2026
GEBench | GUI environment generation benchmark | arXiv | 2026
UReason | Reasoning probing for T2I models | arXiv | 2026
UEval | Unified multimodal generation evaluation | arXiv | 2026
GenExam | Multidisciplinary text-to-image examination | arXiv | 2025
WISE | World knowledge-informed semantic evaluation for T2I | arXiv | 2025
DreamBench++ | Human-aligned personalized image generation | ICLR | 2025
T2I-CompBench++ | Enhanced compositional text-to-image evaluation | TPAMI | 2025
GenEval 2 | T2I generation drift detection benchmark | arXiv | 2025
TIIF-Bench | T2I instruction following benchmark | arXiv | 2025
R2I-Bench | Commonsense reasoning T2I evaluation | arXiv | 2025
SciScore | Scientific illustration T2I evaluation | arXiv | 2025
PHYSBENCH | Physical domain T2I evaluation | arXiv | 2025
T2I-Reason | Idiom and entity reasoning T2I evaluation | arXiv | 2025
UniGenBench | Unified 20-subtheme generation evaluation | arXiv | 2025
T2I-ConBench | Continual learning retention T2I evaluation | arXiv | 2025
OneIG-Bench | Unified anime and portrait generation evaluation | arXiv | 2025
LongBench | Long instruction multi-type T2I evaluation | arXiv | 2025
PRISM-Bench | Million-scale T2I reasoning evaluation | arXiv | 2025
T2I-CoReBench | Core reasoning T2I evaluation | arXiv | 2025
LongT2IBench | Graph-structured long text T2I evaluation | arXiv | 2025
GIR-Bench | Generation-informed reasoning quality benchmark | arXiv | 2025
MagicMirror | Fine-grained artifact assessment benchmark | arXiv | 2025
Culture in AI | Social activity and cultural T2I evaluation | arXiv | 2025
Envision | Causal world insight T2I evaluation | arXiv | 2025
GenAI-Bench | Compositional text-to-visual generation | CVPR | 2024
DPG-Bench | Long-prompt dense generation evaluation | arXiv | 2024
PhyBench | Physical mechanics T2I evaluation | arXiv | 2024
Commonsense T2I | Visual commonsense T2I evaluation | arXiv | 2024
ConceptMix | Concept categorization T2I evaluation | arXiv | 2024
T2I-Factual | Factual knowledge T2I evaluation | arXiv | 2024
GenEval | Object-focused T2I alignment framework | NeurIPS | 2023
TIFA | T2I faithfulness via question answering | ICCV | 2023
HEIM | Holistic evaluation of text-to-image models | NeurIPS | 2023
HPS v2 | Human preference score correlation benchmark | arXiv | 2023
Winoground | Contrastive compositional T2I evaluation | CVPR | 2023
DrawBench | Photorealistic T2I quality assessment prompts | NeurIPS | 2022
PartiPrompts | Content-rich T2I evaluation prompts | TMLR | 2022
VISOR | Spatial relation T2I evaluation | arXiv | 2022

Image Editing Benchmarks (40 benchmarks)

Benchmark | Evaluation Focus | Venue | Year
PlanViz | Planning-oriented editing evaluation | arXiv | 2026
LocateEdit | Localization instruction editing benchmark | arXiv | 2026
VIBE | Visual instruction-based editing evaluation | arXiv | 2026
Interaction Edit | MLLM-based object interaction editing benchmark | arXiv | 2026
World-Shape | 360° panoramic editing consistency evaluation | arXiv | 2026
VDE Bench | Visual document editing evaluation | arXiv | 2026
HYPE-EDIT | Reliability and robustness editing evaluation | arXiv | 2026
EDIR | Fine-grained composed image editing evaluation | arXiv | 2026
UniPic-3.0 | Multi-image composition editing benchmark | arXiv | 2026
UM-Text | Visual text and OCR editing benchmark | arXiv | 2026
I2E | Interactive image-to-edit benchmark | arXiv | 2026
MotionEdit | Motion-centered editing evaluation | arXiv | 2026
KRIS-Bench | Next-level intelligent image editing assessment | NeurIPS | 2025
CompBench | Complex instruction editing evaluation | arXiv | 2025
ComplexBench | Multi-step chain robustness editing benchmark | arXiv | 2025
Complex-Edit | Complexity-aware editing evaluation | arXiv | 2025
GEdit-Bench | Realistic use-case editing evaluation | arXiv | 2025
GPT-ImgEdit | Closed-model editing quality evaluation | arXiv | 2025
IE-Bench | Human-aligned MOS editing evaluation | arXiv | 2025
ImgEdit-Bench | Unified instruction-based editing evaluation | arXiv | 2025
MCIE | MLLM-driven complex instruction editing benchmark | arXiv | 2025
MMKE-Bench | Knowledge entity editing evaluation | arXiv | 2025
PICABench | Physical realistic plausibility editing evaluation | arXiv | 2025
PPTArena | Agentic PowerPoint editing evaluation | arXiv | 2025
RefEdit | Reference-guided editing evaluation | arXiv | 2025
SpotEdit | Visually-guided editing benchmark | arXiv | 2025
UniREditBench | Reasoning-based editing evaluation | arXiv | 2025
WEAVE | Interleaved in-context editing evaluation | arXiv | 2025
WiseEdit | Cognition and creativity editing evaluation | arXiv | 2025
EditScore | Reward model fidelity editing metric | arXiv | 2025
EdiVal-Agent | Agentic multi-turn editing evaluation | arXiv | 2025
AnyEdit | Unified high-quality image editing evaluation | CVPR | 2024
I2EBench | 16-dimensional comprehensive editing evaluation | arXiv | 2024
GIE-Bench | Grounded image editing evaluation | arXiv | 2024
FSMI-Edit | Localized mask-guided editing evaluation | arXiv | 2024
EditVal | Automated edit success evaluation | arXiv | 2023
Emu Edit Bench | 7-task unified editing precision benchmark | arXiv | 2023
PIE-Bench | Edit fidelity inversion evaluation | arXiv | 2023
MagicBrush | Human-annotated editing evaluation | NeurIPS | 2023
EditBench | Object rendering and inpainting benchmark | arXiv | 2022

Established Quantitative Metrics

FID | Fréchet Inception Distance (distributional image quality)
IS | Inception Score (generation quality & diversity)
CLIP Score | Text-image semantic alignment via CLIP
VQAScore | VQA-based compositional faithfulness
LPIPS | Learned Perceptual Image Patch Similarity
HPSv2 | Human Preference Score v2
DreamSim | Human visual similarity via synthetic data
ImageReward | Learned human preference for T2I
TIFA | T2I faithfulness via question answering
DSG | Davidsonian Scene Graph evaluation
R-FID | Reconstruction FID for tokenizers
SSIM | Structural Similarity Index
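
As a rough illustration of how one of the listed metrics is computed, the sketch below implements the standard Inception Score formula, IS = exp(mean over images of KL(p(y|x) || p(y))), in plain Python. It assumes the per-image class probabilities have already been produced by a classifier (Inception-v3 in the usual setup); the function name `inception_score` and the toy inputs are illustrative only and are not taken from any benchmark's code.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of lists, one row per image, each row a distribution
    p(y|x) over the same classes (assumed to sum to 1).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): average of p(y|x) over all images.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # Mean KL(p(y|x) || p(y)); zero-probability terms contribute 0.
    mean_kl = sum(
        row[j] * math.log(row[j] / marginal[j])
        for row in probs
        for j in range(k)
        if row[j] > 0
    ) / n
    return math.exp(mean_kl)


# Toy inputs (hypothetical): confident and balanced vs. fully uncertain.
confident = [[1.0, 0.0], [0.0, 1.0]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
print(inception_score(confident), inception_score(uncertain))
```

With two classes the score ranges from 1 (every image gets the same prediction) up to 2 (each image is classified confidently and the classes are evenly covered), which is why the metric is read as a joint signal of quality and diversity. Published implementations typically also average the score over several splits of the image set, which this sketch omits.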

T2I Training & Editing Datasets (12 datasets)

Dataset | Scale | Type | Venue | Year
LAION-Aesthetics | 120M | T2I (Aesthetic) | NeurIPS | 2022
PixelProse | 16M | T2I (Dense Captions) | arXiv | 2024
PD12M | 12M | T2I (Public Domain) | arXiv | 2024
CC-12M | 12M | T2I (Conceptual) | CVPR | 2021
SAM | 11M | T2I (Segmentation) | ICCV | 2023
ByteMorph-6M | 6M | Editing (Non-rigid) | arXiv | 2025
TextAtlas5M | 5M | T2I (Dense Text) | arXiv | 2025
UltraEdit | 4M | Editing (Fine-grained) | NeurIPS | 2024
AnyEdit | 2.5M | Editing (Unified) | CVPR | 2024
ImgEdit | 1.2M | Editing (Unified) | arXiv | 2025
InstructPix2Pix | 313K | Editing (Instructional) | CVPR | 2022
MagicBrush | 10K | Editing (Annotated) | NeurIPS | 2023

T2I Surveys & Foundational References (2 surveys)

Title | Domain | Venue | Year
Vision + Language Applications: A Survey | T2I / V&L | CVPRW | 2023
Holistic Evaluation of Text-To-Image Models | Evaluation | NeurIPS | 2023