
Controllable & Compositional Generation

Methods enabling fine-grained spatial, structural, and attribute-level control over the text-to-image generation process, including layout-guided, pose-guided, and grounded generation approaches.


30+ papers

| Model | Full Title | Venue | Year |
|---|---|---|---|
| FLUX.1 Tools | FLUX.1 Fill, Redux, Canny, Depth: Controllable Generation Toolkit | BFL | 2025 |
| ControlNeXt | ControlNeXt: Powerful and Efficient Control for Image and Video Generation | arXiv | 2025 |
| HiCo | HiCo: Hierarchical Controllable Diffusion Model for Layout-to-Image Generation | CVPR | 2025 |
| ACE | ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer | arXiv | 2025 |
| ControlAR | ControlAR: Controllable Image Generation with Autoregressive Models | arXiv | 2025 |
| OmniControl | OmniControl: Control Any Joint at Any Time for Human Motion Generation | ICLR | 2025 |
| ControlNet++ | ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback | ECCV | 2024 |
| OmniGen | OmniGen: Unified Image Generation | arXiv | 2024 |
| Ctrl-X | Controlling Structure and Appearance for T2I Generation Without Guidance | arXiv | 2024 |
| CreatiLayout | Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation | arXiv | 2024 |
| InstanceDiffusion | Instance-level Control for Image Generation | CVPR | 2024 |
| MIGC | Multi-Instance Generation Controller for Text-to-Image Synthesis | CVPR | 2024 |
| Zero-Painter | Training-Free Layout Control for Text-to-Image Synthesis | CVPR | 2024 |
| Grounded T2I | Grounded Text-to-Image Synthesis with Attention Refocusing | CVPR | 2024 |
| Adversarial Layout-to-Image | Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive | ICLR | 2024 |
| Cross-Modal Contextualized | Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing | ICLR | 2024 |
| Uni-ControlNet | All-in-One Control to Text-to-Image Diffusion Models | arXiv | 2023 |
| SpaText | Spatio-Textual Representation for Controllable Image Generation | CVPR | 2023 |
| LayoutDiffusion | Controllable Diffusion Model for Layout-to-Image Generation | CVPR | 2023 |
| SceneComposer | Any-Level Semantic Image Synthesis | CVPR | 2023 |
| Dense Text-to-Image | Dense Text-to-Image Generation with Attention Modulation | ICCV | 2023 |
| LayoutLLM-T2I | Eliciting Layout Guidance from LLM for Text-to-Image Generation | arXiv | 2023 |
| HumanSD | A Native Skeleton-Guided Diffusion Model for Human Image Generation | ICCV | 2023 |
| Late-Constraint Diffusion | Late-Constraint Diffusion Guidance for Controllable Image Synthesis | arXiv | 2023 |
| Divide & Bind | Divide & Bind Your Attention for Improved Generative Semantic Nursing | BMVC | 2023 |
| Attribute-Centric | Attribute-Centric Compositional Text-to-Image Generation | arXiv | 2023 |
| Freestyle L2I | Freestyle Layout-to-Image Synthesis | CVPR | 2023 |
| Modeling Image Composition | Modeling Image Composition for Complex Scene Generation | CVPR | 2022 |
| Interactive Panoptic | Interactive Image Synthesis with Panoptic Layout Generation | CVPR | 2022 |
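Many of the layout-guided entries above (e.g. LayoutDiffusion, InstanceDiffusion, MIGC, Zero-Painter) condition generation on a layout given as (phrase, bounding-box) pairs, which is typically rasterized into per-phrase spatial masks before being injected into the model. The sketch below illustrates only that rasterization step; the function name, grid size, and example layout are illustrative assumptions, not taken from any listed paper:

```python
import numpy as np

def rasterize_layout(boxes, size=64):
    """Rasterize (phrase, bbox) pairs into per-phrase binary masks.

    boxes: list of (phrase, (x0, y0, x1, y1)) with coordinates
    normalized to [0, 1]. Returns a dict mapping each phrase to a
    (size, size) float32 mask that is 1 inside its box, 0 elsewhere.
    """
    masks = {}
    for phrase, (x0, y0, x1, y1) in boxes:
        mask = np.zeros((size, size), dtype=np.float32)
        # Convert normalized coordinates to integer pixel indices.
        c0, c1 = int(x0 * size), int(x1 * size)
        r0, r1 = int(y0 * size), int(y1 * size)
        mask[r0:r1, c0:c1] = 1.0
        masks[phrase] = mask
    return masks

# Hypothetical two-object layout: one subject on the left, one on the right.
layout = [("a cat", (0.05, 0.3, 0.45, 0.9)),
          ("a dog", (0.55, 0.3, 0.95, 0.9))]
masks = rasterize_layout(layout)
```

In practice, methods differ mainly in where such masks enter the network: some modulate cross-attention with them (as in attention-refocusing approaches), while others feed them in as an extra conditioning input.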