Text-to-Image Generation
The evolution of text-conditioned image synthesis — from generative adversarial networks through denoising diffusion to autoregressive next-token prediction.
Foundational Models & Face Synthesis
Core T2I architectures across three paradigmatic eras — Diffusion/Transformer (2024–25), Latent Diffusion (2023), and GAN/Early Diffusion (2020–22) — plus the specialized subfield of text-to-face generation.
Controllable & Compositional Generation
Methods enabling fine-grained spatial, structural, or attribute-level control — layout-guided, pose-guided, grounded generation, and compositional attention mechanisms.
Editing, Personalization & Prompts
Text-guided image editing and manipulation, subject-driven personalized generation, and prompt engineering and optimization techniques.
Safety, Evaluation & Applications
Evaluation frameworks, safety and bias analysis, robustness research, and downstream applications including segmentation, restoration, and text rendering.
Cross-Modal: Video, 3D & Motion
Natural extensions of T2I into the temporal and spatial domains — text-to-video, text-to-3D, motion generation, and shape synthesis.
Arena Leaderboards & Benchmarks
Live arena rankings from human preference votes (LM Arena, Artificial Analysis), established benchmarks, quantitative metrics, training datasets, and surveys.