
Additional Comparison between Single- and Two-Stage SDXL pipeline

by Synthesizing, October 4th, 2024

Authors:

(1) Dustin Podell, Stability AI, Applied Research;

(2) Zion English, Stability AI, Applied Research;

(3) Kyle Lacey, Stability AI, Applied Research;

(4) Andreas Blattmann, Stability AI, Applied Research;

(5) Tim Dockhorn, Stability AI, Applied Research;

(6) Jonas Müller, Stability AI, Applied Research;

(7) Joe Penna, Stability AI, Applied Research;

(8) Robin Rombach, Stability AI, Applied Research.

Abstract and 1 Introduction

2 Improving Stable Diffusion

2.1 Architecture & Scale

2.2 Micro-Conditioning

2.3 Multi-Aspect Training

2.4 Improved Autoencoder and 2.5 Putting Everything Together

3 Future Work


Appendix

A Acknowledgements

B Limitations

C Diffusion Models

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

F On FID Assessment of Generative Text-Image Foundation Models

G Additional Comparison between Single- and Two-Stage SDXL pipeline

References

G Additional Comparison between Single- and Two-Stage SDXL pipeline

Figure 13: SDXL samples (with zoom-ins) without (left) and with (right) the refinement model discussed in Sec. 2.5. Prompts: (top) “close up headshot, futuristic young woman, wild hair sly smile in front of gigantic UFO, dslr, sharp focus, dynamic composition”; (bottom) “Three people having dinner at a table at new years eve, cinematic shot, 8k”. Zoom in for details.


H Comparison between SD 1.5 vs. SD 2.1 vs. SDXL


Figure 14: Additional results for the comparison of the output of SDXL with previous versions of Stable Diffusion. For each prompt, we show 3 random samples of the respective model for 50 steps of the DDIM sampler [46] and cfg-scale 8.0 [13].




Figure 15: Additional results for the comparison of the output of SDXL with previous versions of Stable Diffusion. For each prompt, we show 3 random samples of the respective model for 50 steps of the DDIM sampler [46] and cfg-scale 8.0 [13].
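The cfg-scale quoted in the captions for Figures 14 and 15 is the classifier-free guidance scale [13]. The sketch below illustrates one common convention, in which the guided noise prediction is the unconditional prediction plus the scale times the difference between the conditional and unconditional predictions. The function name `apply_cfg` and the toy two-element lists are assumptions made for illustration; they are not taken from the paper, and real samplers operate on full noise tensors.

```python
def apply_cfg(eps_uncond, eps_cond, scale=8.0):
    """Combine unconditional and conditional noise predictions element-wise.

    Assumed convention: guided = uncond + scale * (cond - uncond).
    With scale = 1.0 this returns the conditional prediction unchanged.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy 2-element "predictions" standing in for full noise tensors.
guided = apply_cfg([0.0, 1.0], [1.0, 1.0], scale=8.0)
```

Larger scales push each sampling step further in the direction the prompt suggests, which typically trades sample diversity for prompt adherence.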


I Multi-Aspect Training Hyperparameters

We use the following image resolutions for mixed-aspect ratio finetuning as described in Sec. 2.3.
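As a rough illustration of how such a resolution list can be built and used, the sketch below follows the bucketing idea from Sec. 2.3: keep the pixel count close to 1024² while varying height and width in multiples of 64. It does not reproduce the paper's table. The helper names `make_buckets` and `nearest_bucket`, the 512 to 2048 side limits, and the one-width-per-height rule are all assumptions made for this example.

```python
import math


def make_buckets(target_pixels=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (height, width) pairs whose area stays near target_pixels.

    Illustrative only: for each height, pick the single width (a multiple
    of `step`) that brings the area closest to the target.
    """
    buckets = []
    for h in range(min_side, max_side + 1, step):
        w = round(target_pixels / h / step) * step
        if min_side <= w <= max_side:
            buckets.append((h, w))
    return buckets


def nearest_bucket(height, width, buckets):
    """Pick the bucket whose aspect ratio (height / width) is closest in log space."""
    target = math.log(height / width)
    return min(buckets, key=lambda hw: abs(math.log(hw[0] / hw[1]) - target))


buckets = make_buckets()
square = nearest_bucket(1000, 1000, buckets)  # a roughly square training image
```

Comparing ratios in log space treats a 1:2 and a 2:1 image symmetrically, which a plain difference of ratios would not.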



J Pseudo-code for Conditioning Concatenation along the Channel Axis


Figure 16: Python code for concatenating the additional conditionings introduced in Secs. 2.1 to 2.3 along the channel dimension.
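The idea named in the caption can be sketched as follows. This is not the authors' listing; it is a minimal, dependency-free approximation for a single sample, assuming each scalar conditioning value (original size, crop coordinates, target size) is mapped to a sinusoidal Fourier embedding and the results are appended to the pooled text embedding along the channel axis. The function names, the default `outdim=256`, and `max_period=10000` are assumptions for illustration. A practical implementation would operate on batched tensors.

```python
import math


def fourier_embedding(value, outdim=256, max_period=10000):
    """Sinusoidal embedding of one scalar as a list of length outdim (outdim assumed even)."""
    half = outdim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    angles = [value * f for f in freqs]
    return [math.cos(a) for a in angles] + [math.sin(a) for a in angles]


def embed_scalars(values, outdim=256):
    """Embed each scalar independently and join the results along the channel axis."""
    out = []
    for v in values:
        out.extend(fourier_embedding(v, outdim))
    return out


def concat_embeddings(c_size, c_crop, c_tgt_size, c_pooled_txt, outdim=256):
    """Concatenate pooled text embedding with size, crop, and target-size embeddings."""
    return (
        list(c_pooled_txt)
        + embed_scalars(c_size, outdim)      # (original height, original width)
        + embed_scalars(c_crop, outdim)      # (crop top, crop left)
        + embed_scalars(c_tgt_size, outdim)  # (target height, target width)
    )


# Hypothetical single-sample inputs; the pooled text vector is a zero placeholder.
pooled = [0.0] * 512
cond = concat_embeddings((768, 1024), (0, 0), (1024, 1024), pooled)
```

With these assumed sizes the resulting vector has 512 + 6 × 256 = 2048 channels: one pooled text embedding followed by six scalar embeddings.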




This paper is available on arXiv under the CC BY 4.0 DEED license.