Additional Comparison between Single- and Two-Stage SDXL pipeline

by Synthesizing, October 4th, 2024
Authors:

(1) Dustin Podell, Stability AI, Applied Research;

(2) Zion English, Stability AI, Applied Research;

(3) Kyle Lacey, Stability AI, Applied Research;

(4) Andreas Blattmann, Stability AI, Applied Research;

(5) Tim Dockhorn, Stability AI, Applied Research;

(6) Jonas Müller, Stability AI, Applied Research;

(7) Joe Penna, Stability AI, Applied Research;

(8) Robin Rombach, Stability AI, Applied Research.

Abstract and 1 Introduction

2 Improving Stable Diffusion

2.1 Architecture & Scale

2.2 Micro-Conditioning

2.3 Multi-Aspect Training

2.4 Improved Autoencoder and 2.5 Putting Everything Together

3 Future Work


Appendix

A Acknowledgements

B Limitations

C Diffusion Models

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

F On FID Assessment of Generative Text-Image Foundation Models

G Additional Comparison between Single- and Two-Stage SDXL pipeline

References

G Additional Comparison between Single- and Two-Stage SDXL pipeline

Figure 13: SDXL samples (with zoom-ins) without (left) and with (right) the refinement model discussed in Sec. 2.5. Prompt: (top) “close up headshot, futuristic young woman, wild hair sly smile in front of gigantic UFO, dslr, sharp focus, dynamic composition” (bottom) “Three people having dinner at a table at new years eve, cinematic shot, 8k”. Zoom in for details.
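
The two-stage pipeline compared above can be approximated with the publicly released SDXL checkpoints. The sketch below uses the Hugging Face diffusers library; the model identifiers are the public 1.0 releases, and the default sampler settings are assumptions rather than the exact configuration used for Figure 13.

# Hedged sketch: the base model generates latents, and the refiner
# denoises them in an SDEdit-style second stage, mirroring the
# two-stage pipeline described in the paper.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = ("close up headshot, futuristic young woman, wild hair sly smile "
          "in front of gigantic UFO, dslr, sharp focus, dynamic composition")

# Stage 1: keep the output in latent space instead of decoding it.
latents = base(prompt=prompt, output_type="latent").images

# Stage 2: the refiner re-noises and denoises the latents to add detail.
image = refiner(prompt=prompt, image=latents).images[0]
image.save("sdxl_two_stage.png")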


H Comparison between SD 1.5 vs. SD 2.1 vs. SDXL


Figure 14: Additional results for the comparison of the output of SDXL with previous versions of Stable Diffusion. For each prompt, we show 3 random samples of the respective model for 50 steps of the DDIM sampler [46] and cfg-scale 8.0 [13].




Figure 15: Additional results for the comparison of the output of SDXL with previous versions of Stable Diffusion. For each prompt, we show 3 random samples of the respective model for 50 steps of the DDIM sampler [46] and cfg-scale 8.0 [13].
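
The sampler settings quoted in Figures 14 and 15 (50 DDIM steps [46], cfg-scale 8.0 [13]) can be reproduced along the following lines. This is a minimal sketch assuming the public SDXL base checkpoint; the SD 1.5 and SD 2.1 models compared in the figures would be loaded the same way with their own identifiers.

# Hedged sketch: 50 DDIM steps with classifier-free guidance scale 8.0,
# matching the settings stated in the captions of Figures 14 and 15.
import torch
from diffusers import StableDiffusionXLPipeline, DDIMScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # DDIM [46]

images = pipe(
    prompt="placeholder prompt",   # the figures use their own prompts
    num_inference_steps=50,        # 50 DDIM steps
    guidance_scale=8.0,            # cfg-scale 8.0 [13]
    num_images_per_prompt=3,       # 3 random samples per prompt
).images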


I Multi-Aspect Training Hyperparameters

We use the following image resolutions for mixed-aspect ratio finetuning as described in Sec. 2.3.
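
The paper's resolution table is rendered as an image and is not reproduced here. Sec. 2.3 describes the rule behind it: side lengths are multiples of 64 and the pixel count of each bucket stays as close to 1024² as possible. The sketch below enumerates buckets under that rule; the side-length bounds and tolerance are assumptions, so the output may not match the paper's table exactly.

# Hedged sketch: enumerate multi-aspect buckets with sides in multiples
# of 64 and area near 1024*1024, per the rule described in Sec. 2.3.
# min_side, max_side and tolerance are assumptions, not paper values.
TARGET_PIXELS = 1024 * 1024

def aspect_buckets(min_side=512, max_side=2048, step=64, tolerance=0.1):
    buckets = []
    for h in range(min_side, max_side + 1, step):
        # Pick the width that best matches the target area.
        w = round(TARGET_PIXELS / h / step) * step
        if min_side <= w <= max_side and \
                abs(h * w - TARGET_PIXELS) / TARGET_PIXELS <= tolerance:
            buckets.append((h, w))
    return buckets

for h, w in aspect_buckets():
    print(f"{h} x {w}  (aspect ratio {h / w:.2f})")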



J Pseudo-code for Conditioning Concatenation along the Channel Axis


Figure 16: Python code for concatenating the additional conditionings introduced in Secs. 2.1 to 2.3 along the channel dimension.
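
The code in Figure 16 is rendered as an image in the source and did not survive extraction. The following is a reconstruction in the same spirit: each conditioning scalar from Secs. 2.1 to 2.3 is mapped to a Fourier (sinusoidal) feature embedding, and the embeddings are concatenated along the channel axis, as described in Sec. 2.2. The helper names and the 256-dimensional embedding size are assumptions.

# Hedged reconstruction: Fourier-embed each conditioning scalar and
# concatenate the embeddings along the channel dimension; the result
# is added to the timestep embedding inside the UNet.
import math
import torch

def fourier_embedding(inputs, outdim=256, max_period=10000):
    # Standard sinusoidal embedding: (batch,) scalars -> (batch, outdim).
    half = outdim // 2
    freqs = torch.exp(
        -math.log(max_period) * torch.arange(half, dtype=torch.float32) / half
    ).to(inputs.device)
    args = inputs.float()[:, None] * freqs[None, :]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

def concat_conditionings(batch, outdim=256):
    # Each value is a (batch, k) tensor of scalars, e.g. (height, width).
    embs = []
    for key in ("original_size", "crop_coords_top_left", "target_size"):
        vals = batch[key]
        emb = fourier_embedding(vals.flatten(), outdim)  # (batch*k, outdim)
        embs.append(emb.reshape(vals.shape[0], -1))      # (batch, k*outdim)
    return torch.cat(embs, dim=-1)                       # channel concat

# Example: one sample conditioned on original size, crop and target size.
batch = {
    "original_size":        torch.tensor([[1024, 1024]]),
    "crop_coords_top_left": torch.tensor([[0, 0]]),
    "target_size":          torch.tensor([[1024, 1024]]),
}
cond = concat_conditionings(batch)  # shape: (1, 6 * 256) = (1, 1536)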




This paper is available on arxiv under CC BY 4.0 DEED license.

