Authors:
(1) Dustin Podell, Stability AI, Applied Research;
(2) Zion English, Stability AI, Applied Research;
(3) Kyle Lacey, Stability AI, Applied Research;
(4) Andreas Blattmann, Stability AI, Applied Research;
(5) Tim Dockhorn, Stability AI, Applied Research;
(6) Jonas Müller, Stability AI, Applied Research;
(7) Joe Penna, Stability AI, Applied Research;
(8) Robin Rombach, Stability AI, Applied Research.
2.4 Improved Autoencoder and 2.5 Putting Everything Together
Appendix
D Comparison to the State of the Art
E Comparison to Midjourney v5.1
F On FID Assessment of Generative Text-Image Foundation Models
G Additional Comparison between Single- and Two-Stage SDXL pipeline
H Comparison between SD 1.5 vs. SD 2.1 vs. SDXL
I Multi-Aspect Training Hyperparameters
J Pseudo-code for Conditioning Concatenation along the Channel Axis
We use the following image resolutions for mixed-aspect-ratio finetuning, as described in Sec. 2.3.
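To illustrate how a fixed list of resolutions might be used during mixed-aspect-ratio finetuning, here is a minimal sketch of aspect-ratio bucketing. It does not reproduce the resolution list used in the paper. The pixel budget of 1024 × 1024, the step of 64, the side limits, and the helper names `make_buckets` and `nearest_bucket` are all assumptions chosen for illustration.

```python
import math

# Assumptions for illustration only; not taken from the paper's resolution list.
TARGET_PIXELS = 1024 * 1024  # assumed pixel budget per bucket
STEP = 64                    # assumed granularity for height and width


def make_buckets(min_side=512, max_side=2048):
    """Enumerate hypothetical (height, width) buckets near the pixel budget."""
    buckets = set()
    for height in range(min_side, max_side + 1, STEP):
        # Pick the width (a multiple of STEP) that best matches the budget.
        width = round(TARGET_PIXELS / height / STEP) * STEP
        if min_side <= width <= max_side:
            buckets.add((height, width))
    return sorted(buckets)


def nearest_bucket(height, width, buckets):
    """Return the bucket whose aspect ratio is closest in log space."""
    log_ratio = math.log(height / width)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ratio))


if __name__ == "__main__":
    buckets = make_buckets()
    # Assign a few example image sizes to their closest bucket.
    for size in [(1000, 1000), (500, 2000), (1080, 1920)]:
        print(size, "->", nearest_bucket(*size, buckets))
```

Comparing aspect ratios in log space treats portrait and landscape deviations symmetrically, so an image that is twice as tall as a bucket is penalized the same as one that is twice as wide.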
This paper is available on arXiv under the CC BY 4.0 DEED license.