Text-to-Image Generation

The evolution of text-conditioned image synthesis, from generative adversarial networks through denoising diffusion to autoregressive next-token prediction.

Text-to-image (T2I) generation has moved through three successive paradigms: GAN-based approaches (2016–2021), diffusion-based models (2021–present), and the emerging autoregressive transformer paradigm. Each generation has brought substantial improvements in image fidelity, controllability, and compositional reasoning. Explore the six sub-domains below.