
Controllable & Compositional Generation

Methods enabling fine-grained spatial, structural, and attribute-level control over the text-to-image generation process, including layout-guided, pose-guided, and grounded generation approaches.


30+ papers

| Model | Full Title | Venue | Year |
|---|---|---|---|
| FLUX.1 Tools | FLUX.1 Fill, Redux, Canny, Depth: Controllable Generation Toolkit | BFL | 2025 |
| ControlNeXt | ControlNeXt: Powerful and Efficient Control for Image and Video Generation | arXiv | 2025 |
| HiCo | HiCo: Hierarchical Controllable Diffusion Model for Layout-to-Image Generation | CVPR | 2025 |
| ACE | ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer | arXiv | 2025 |
| ControlAR | ControlAR: Controllable Image Generation with Autoregressive Models | arXiv | 2025 |
| OmniControl | OmniControl: Control Any Joint at Any Time for Human Motion Generation | ICLR | 2025 |
| ControlNet++ | ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback | ECCV | 2024 |
| OmniGen | OmniGen: Unified Image Generation | arXiv | 2024 |
| Ctrl-X | Controlling Structure and Appearance for T2I Generation Without Guidance | arXiv | 2024 |
| CreatiLayout | Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation | arXiv | 2024 |
| InstanceDiffusion | Instance-level Control for Image Generation | CVPR | 2024 |
| MIGC | Multi-Instance Generation Controller for Text-to-Image Synthesis | CVPR | 2024 |
| Zero-Painter | Training-Free Layout Control for Text-to-Image Synthesis | CVPR | 2024 |
| Grounded T2I | Grounded Text-to-Image Synthesis with Attention Refocusing | CVPR | 2024 |
| Adversarial Layout-to-Image | Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive | ICLR | 2024 |
| Cross-Modal Contextualized | Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing | ICLR | 2024 |
| Uni-ControlNet | All-in-One Control to Text-to-Image Diffusion Models | arXiv | 2023 |
| SpaText | Spatio-Textual Representation for Controllable Image Generation | CVPR | 2023 |
| LayoutDiffusion | Controllable Diffusion Model for Layout-to-Image Generation | CVPR | 2023 |
| SceneComposer | Any-Level Semantic Image Synthesis | CVPR | 2023 |
| Dense Text-to-Image | Dense Text-to-Image Generation with Attention Modulation | ICCV | 2023 |
| LayoutLLM-T2I | Eliciting Layout Guidance from LLM for Text-to-Image Generation | arXiv | 2023 |
| HumanSD | A Native Skeleton-Guided Diffusion Model for Human Image Generation | ICCV | 2023 |
| Late-Constraint Diffusion | Late-Constraint Diffusion Guidance for Controllable Image Synthesis | arXiv | 2023 |
| Divide & Bind | Divide & Bind Your Attention for Improved Generative Semantic Nursing | BMVC | 2023 |
| Attribute-Centric | Attribute-Centric Compositional Text-to-Image Generation | arXiv | 2023 |
| Freestyle L2I | Freestyle Layout-to-Image Synthesis | CVPR | 2023 |
| Modeling Image Composition | Modeling Image Composition for Complex Scene Generation | CVPR | 2022 |
| Interactive Panoptic | Interactive Image Synthesis with Panoptic Layout Generation | CVPR | 2022 |
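Many of the layout-guided entries above (e.g. LayoutDiffusion, InstanceDiffusion, MIGC, Zero-Painter) condition generation on a layout given as (phrase, bounding-box) pairs, which is typically rasterized into per-phrase spatial masks before being injected into the model. The sketch below illustrates only that rasterization step; the function name, grid size, and example layout are illustrative assumptions, not taken from any listed paper:

```python
import numpy as np

def rasterize_layout(boxes, size=64):
    """Rasterize (phrase, bbox) pairs into per-phrase binary masks.

    boxes: list of (phrase, (x0, y0, x1, y1)) with coordinates
    normalized to [0, 1]. Returns a dict mapping each phrase to a
    (size, size) float32 mask that is 1 inside its box, 0 elsewhere.
    """
    masks = {}
    for phrase, (x0, y0, x1, y1) in boxes:
        mask = np.zeros((size, size), dtype=np.float32)
        # Convert normalized coordinates to integer pixel indices.
        c0, c1 = int(x0 * size), int(x1 * size)
        r0, r1 = int(y0 * size), int(y1 * size)
        mask[r0:r1, c0:c1] = 1.0
        masks[phrase] = mask
    return masks

# Hypothetical two-object layout: one subject on the left, one on the right.
layout = [("a cat", (0.05, 0.3, 0.45, 0.9)),
          ("a dog", (0.55, 0.3, 0.95, 0.9))]
masks = rasterize_layout(layout)
```

In practice, methods differ mainly in where such masks enter the network: some modulate cross-attention with them (as in attention-refocusing approaches), while others feed them in as an extra conditioning input.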