AuraFlow: Open-Source Flow-Based T2I Generation Model
Fal.ai
2024
| Meissonic | Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis | arXiv | 2024 |
| OmniGen | OmniGen: Unified Image Generation | arXiv | 2024 |
| Lumina-Next | Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT | arXiv | 2024 |
| HiDiffusion | HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models | arXiv | 2025 |
| CogView4 | CogView4: 16K-Resolution Image Generation with Relay Diffusion | Zhipu AI | 2025 |
| Seedream 3.0 | Seedream 3.0: Scaling Up Diffusion Transformers | ByteDance | 2025 |
| GenExam | A Multidisciplinary Text-to-Image Exam | arXiv | 2025 |
| RefVNLI | Towards Scalable Evaluation of Subject-driven Text-to-image Generation | arXiv | 2025 |
| GPT-4o Image Study | An Empirical Study of GPT-4o Image Generation Capabilities | arXiv | 2025 |
| Imagen 3 | Imagen 3 | Google DeepMind | 2024 |
| PixArt-α | Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | ICLR | 2024 |
| PixArt-Σ | Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | arXiv | 2024 |
| PixArt-δ | Fast and Controllable Image Generation with Latent Consistency Models | arXiv | 2024 |
| SDXL-Lightning | Progressive Adversarial Diffusion Distillation | arXiv | 2024 |
| Kolors | Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis | Kuaishou | 2024 |
| MARS | Mixture of Auto-Regressive Models for Fine-grained Text-to-Image Synthesis | arXiv | 2024 |
| Kandinsky 3 | Text-to-Image Synthesis for Multifunctional Generative Framework | EMNLP | 2024 |
| RealCompo | Dynamic Equilibrium between Realism and Compositionality Improves T2I Diffusion Models | arXiv | 2024 |
| ECLIPSE | A Resource-Efficient Text-to-Image Prior for Image Generations | CVPR | 2024 |
| Ranni | Taming Text-to-Image Diffusion for Accurate Instruction Following | CVPR | 2024 |
| DiffusionGPT | LLM-Driven Text-to-Image Generation System | arXiv | 2024 |
| Playground v2.5 | Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation | arXiv | 2024 |
| Dimba | Transformer-Mamba Diffusion Models | arXiv | 2024 |
| SELMA | Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data | arXiv | 2024 |
| RealCustom | Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization | CVPR | 2024 |
| CoMat | Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | arXiv | 2024 |
| TextCraftor | Your Text Encoder Can be Image Quality Controller | arXiv | 2024 |
| AutoStudio | Crafting Consistent Subjects in Multi-turn Interactive Image Generation | arXiv | 2024 |
| TheaterGen | Character Management with LLM for Consistent Multi-turn Image Generation | arXiv | 2024 |
| Flow Generator Matching | Flow Generator Matching | arXiv | 2024 |
| Lumina-T2X | Transforming Text into Any Modality via Flow-based Large DiT | arXiv | 2024 |
| 4M-21 | An Any-to-Any Vision Model for Tens of Tasks and Modalities | arXiv | 2024 |
Latent Diffusion & Transformer Era (2023) 30+ papers
| Model | Full Title | Venue | Year |
|---|---|---|---|
| ControlNet | Adding Conditional Control to Text-to-Image Diffusion Models | ICCV | 2023 |
| GLIGEN | Open-Set Grounded Text-to-Image Generation | CVPR | 2023 |
| Attend-and-Excite | Attention-Based Semantic Guidance for Text-to-Image Diffusion Models | arXiv | 2023 |
| GALIP | Generative Adversarial CLIPs for Text-to-Image Synthesis | CVPR | 2023 |
| Muse | Text-To-Image Generation via Masked Generative Transformers | arXiv | 2023 |
| StyleDrop | Text-to-Image Generation in Any Style | arXiv | 2023 |
| Prompt-Free Diffusion | Taking "Text" out of Text-to-Image Diffusion Models | arXiv | 2023 |
| Visual ChatGPT | Talking, Drawing and Editing with Visual Foundation Models | arXiv | 2023 |
| Kandinsky | An Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion | arXiv | 2023 |
| Pick-a-Pic | An Open Dataset of User Preferences for Text-to-Image Generation | arXiv | 2023 |
| eDiff-I | Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | arXiv | 2023 |
| Blended Latent Diffusion | Blended Latent Diffusion | SIGGRAPH | 2023 |
| The Chosen One | Consistent Characters in Text-to-Image Diffusion Models | arXiv | 2023 |
| UFOGen | You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs | arXiv | 2023 |
| BoxDiff | Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion | ICCV | 2023 |
| ITI-GEN | Inclusive Text-to-Image Generation | ICCV | 2023 |
| Mini-DALLE3 | Interactive Text to Image by Prompting Large Language Models | arXiv | 2023 |
| T2I-CompBench | A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation | arXiv | 2023 |
| DiffBlender | Scalable and Composable Multimodal Text-to-Image Diffusion Models | arXiv | 2023 |
| ElasticDiffusion | Training-free Arbitrary Size Image Generation | arXiv | 2023 |
| Multi-Concept Customization | Multi-Concept Customization of Text-to-Image Diffusion | CVPR | 2023 |
| BLIP-Diffusion | Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing | arXiv | 2023 |
| Universal Guidance | Universal Guidance for Diffusion Models | arXiv | 2023 |
| AltCLIP / AltDiffusion | Altering the Language Encoder in CLIP for Extended Language Capabilities | ACL Findings | 2023 |
| Expressive Rich Text | Expressive Text-to-Image Generation with Rich Text | ICCV | 2023 |
| Scaling up GANs | Scaling up GANs for Text-to-Image Synthesis | CVPR | 2023 |
| CoDi-2 | In-Context, Interleaved, and Interactive Any-to-Any Generation | arXiv | 2023 |
| Detector Guidance | Detector Guidance for Multi-Object Text-to-Image Generation | arXiv | 2023 |
| A-STAR | Test-time Attention Segregation and Retention for Text-to-image Synthesis | arXiv | 2023 |
| Training-Free Structured Diffusion | Training-Free Structured Diffusion Guidance for Compositional T2I Synthesis | ICLR | 2023 |
GAN & Early Diffusion Era (2018–2022) 15+ papers
| Model | Full Title | Venue | Year |
|---|---|---|---|
| Stable Diffusion (LDM) | High-Resolution Image Synthesis with Latent Diffusion Models | CVPR | 2022 |
| Imagen | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | NeurIPS | 2022 |
| DALL·E 2 | Hierarchical Text-Conditional Image Generation with CLIP Latents | arXiv | 2022 |
| Parti | Scaling Autoregressive Models for Content-Rich Text-to-Image Generation | TMLR | 2022 |
| OFA | Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | arXiv | 2022 |
| Versatile Diffusion | Text, Images and Variations All in One Diffusion Model | arXiv | 2022 |
| Frido | Feature Pyramid Diffusion for Complex Scene Image Synthesis | arXiv | 2022 |
| NUWA-Infinity | Autoregressive over Autoregressive Generation for Infinite Visual Synthesis | arXiv | 2022 |
| NÜWA | Visual Synthesis Pre-training for Neural visUal World creAtion | ECCV | 2022 |
| DALL·E | Zero-Shot Text-to-Image Generation | arXiv | 2021 |
| L-Verse | Bidirectional Generation Between Image and Text | arXiv | 2021 |
| ERNIE-ViLG | Unified Generative Pre-training for Bidirectional Vision-Language Generation (10B parameters) | arXiv | 2021 |
| M6-UFC | Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers | NeurIPS | 2021 |
| ManiGAN | Text-Guided Image Manipulation | CVPR | 2020 |
| TAGAN | Text-adaptive Generative Adversarial Networks: Manipulating Images with Natural Language | NeurIPS | 2018 |
Text-to-Face Synthesis 22+ papers
A specialized subfield dedicated to generating and manipulating human facial imagery from textual descriptions,
encompassing 2D face synthesis, 3D avatar generation, and attribute-level control.
| Model | Full Title | Venue | Year |
|---|---|---|---|
| PreciseControl | Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control | ECCV | 2024 |
| CosmicMan | A Text-to-Image Foundation Model for Humans | CVPR | 2024 |
| 15M Facial Dataset | 15M Multimodal Facial Image-Text Dataset | arXiv | 2024 |
| Portrait3D | Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior | arXiv | 2024 |
| Fast T2-3D Face | Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping | ICML | 2024 |
| Celeb Basis | Inserting Anybody in Diffusion Models via Celeb Basis | NeurIPS | 2023 |
| DreamFace | Progressive Generation of Animatable 3D Faces under Text Guidance | SIGGRAPH | 2023 |
| Collaborative Diffusion | Multi-Modal Face Generation and Editing | CVPR | 2023 |
| High-Fidelity 3D Face | High-Fidelity 3D Face Generation from Natural Language Descriptions | CVPR | 2023 |
| Mukh-Oboyob | Stable Diffusion and BanglaBERT enhanced Bangla Text-to-Face Synthesis | IJACSA | 2023 |
| clip2latent | Text driven sampling of a pre-trained StyleGAN using denoising diffusion and CLIP | BMVC | 2022 |
| AnyFace | Free-style Text-to-Face Synthesis and Manipulation | CVPR | 2022 |
| StyleT2I | Toward Compositional and High-Fidelity Text-to-Image Synthesis | CVPR | 2022 |
| CMAFGAN | A Cross-Modal Attention Fusion based GAN for Attribute Word-to-Face Synthesis | Knowledge-Based Systems | 2022 |
| DualG-GAN | A Dual-channel Generator based GAN for Text-to-Face Synthesis | Neural Networks | 2022 |
| ManiCLIP | Multi-Attribute Face Manipulation from Text | arXiv | 2022 |
| TextFace | Text-to-Style Mapping based Face Generation and Manipulation | IEEE TNSE | 2022 |
| TediGAN | Text-Guided Diverse Face Image Generation and Manipulation | CVPR | 2021 |
| FG-GAN | Generative Adversarial Network for Text-to-Face Synthesis with Pretrained BERT | FG | 2021 |
| Multi-caption T2F | Multi-caption Text-to-Face Synthesis: Dataset and Algorithm | ACMMM | 2021 |
| Faces à la Carte | Text-to-Face Generation via Attribute Disentanglement | WACV | 2021 |
FTGAN
A Fully-trained Generative Adversarial Networks for Text to Face Generation