Authors:

(1) Dustin Podell, Stability AI, Applied Research;
(2) Zion English, Stability AI, Applied Research;
(3) Kyle Lacey, Stability AI, Applied Research;
(4) Andreas Blattmann, Stability AI, Applied Research;
(5) Tim Dockhorn, Stability AI, Applied Research;
(6) Jonas Müller, Stability AI, Applied Research;
(7) Joe Penna, Stability AI, Applied Research;
(8) Robin Rombach, Stability AI, Applied Research.

Table of Links

Abstract and 1 Introduction
2 Improving Stable Diffusion
2.1 Architecture & Scale
2.2 Micro-Conditioning
2.3 Multi-Aspect Training
2.4 Improved Autoencoder and 2.5 Putting Everything Together
3 Future Work

Appendix

A Acknowledgements
B Limitations
C Diffusion Models
D Comparison to the State of the Art
E Comparison to Midjourney v5.1
F On FID Assessment of Generative Text-Image Foundation Models
G Additional Comparison between Single- and Two-Stage SDXL pipeline
H Comparison between SD 1.5 vs. SD 2.1 vs. SDXL
I Multi-Aspect Training Hyperparameters
J Pseudo-code for Conditioning Concatenation along the Channel Axis
References

I Multi-Aspect Training Hyperparameters

We use the following image resolutions for mixed-aspect ratio finetuning as described in Sec. 2.3.

J Pseudo-code for Conditioning Concatenation along the Channel Axis

This paper is available on arxiv under CC BY 4.0 DEED license.
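The body of Appendix J is not reproduced in this excerpt. As a rough illustration of the technique its title refers to, embedding scalar micro-conditioning values (for example original image size, crop coordinates, and target size, per Sec. 2.2) with Fourier features and concatenating them with the pooled text embedding along the channel axis, a minimal sketch might look like the following. All names and dimensions here are illustrative assumptions, not the authors' actual pseudo-code:

```python
import numpy as np

def fourier_embedding(x, dim=256, max_period=10000):
    """Sinusoidal (Fourier) embedding of scalar conditioning values,
    in the style of diffusion-model timestep embeddings."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = np.asarray(x, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)

def concat_micro_conditioning(pooled_text, micro_conds, emb_dim=256):
    """Embed each scalar micro-condition, flatten the embeddings per
    example, and concatenate them with the pooled text embedding along
    the channel axis (hypothetical helper, for illustration only)."""
    b, n = micro_conds.shape
    emb = fourier_embedding(micro_conds.reshape(-1), dim=emb_dim)  # (b*n, emb_dim)
    emb = emb.reshape(b, n * emb_dim)                              # flatten per example
    return np.concatenate([pooled_text, emb], axis=1)              # channel-axis concat

# Assumed shapes: pooled text embedding of dim 512, six scalar conditions
# (original h/w, crop top/left, target h/w) per example.
pooled = np.random.randn(4, 512)
conds = np.tile([1024.0, 1024.0, 0.0, 0.0, 1024.0, 1024.0], (4, 1))
out = concat_micro_conditioning(pooled, conds)
print(out.shape)  # (4, 2048): 512 text channels + 6 * 256 conditioning channels
```

The resulting vector would then be fed to the UNet in place of the plain pooled text embedding; the key point of the channel-axis concatenation is that every conditioning signal contributes its own block of channels.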