Controllable Deep Image Synthesis
Deep learning breakthroughs have significantly advanced computer vision in the past decade, particularly in image synthesis, which involves generating and manipulating images. Image synthesis has numerous practical applications, including art generation, editing, virtual reality, video games, and computer-aided design. While learning the unconditional distribution of natural images is interesting, gaining control over the image generation process by learning a conditional distribution is essential for practical applications. This thesis presents new methods for controllable generative models in high-quality deep image synthesis, building upon and extending the progress made in generative deep learning over the past decade. The central research question is how to advance controllable deep image synthesis, which is explored along five main dimensions: understanding the current state of the art, integrating existing approaches, enabling fine-grained user inputs, improving models by focusing on important image areas, and developing an efficient generation algorithm. Natural language, the primary medium through which we communicate thoughts, ideas, and feelings, is arguably the most flexible and intuitive interface for controllable image synthesis. Thus, the first part of this research reviews text-to-image synthesis models and highlights open challenges such as generating complex scenes. The second part develops hybrid models that enhance image quality and alignment by integrating text-to-image synthesis with visual question answering, and proposes a framework of robust generative networks. The third part focuses on precise control over image regions, covering attribute-controlled and dense text-to-image synthesis from free-form region descriptions.
The fourth part introduces methods that prioritize key image areas, such as dynamic attention-guided diffusion and a curriculum learning approach that progressively blurs object regions to stabilize training and improve quality. Finally, the last part proposes an efficient algorithm that leverages pre-trained models for high-resolution text-based image generation. In summary, this thesis contributes to the field of controllable deep image synthesis, providing new methods and insights for developing advanced generative models.
| Author: | Stanislav Frolov |
|---|---|
| URN: | urn:nbn:de:hbz:386-kluedo-92927 |
| DOI: | https://doi.org/10.26204/KLUEDO/9292 |
| Advisor: | Andreas Dengel |
| Document Type: | Doctoral Thesis |
| Cumulative document: | No |
| Language of publication: | English |
| Date of Publication (online): | 2025/11/04 |
| Year of first Publication: | 2025 |
| Publishing Institution: | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
| Granting Institution: | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
| Acceptance Date of the Thesis: | 2025/10/06 |
| Date of the Publication (Server): | 2025/11/07 |
| Tag: | artificial intelligence; computer vision; generative models; image synthesis; machine learning |
| Page Number: | XII, 184 |
| Faculties / Organisational entities: | Kaiserslautern - Fachbereich Informatik |
| CCS-Classification (computer science): | I. Computing Methodologies |
| DDC-Classification: | 0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik |
