
Comparison to Midjourney v5.1


Authors:

(1) Dustin Podell, Stability AI, Applied Research;

(2) Zion English, Stability AI, Applied Research;

(3) Kyle Lacey, Stability AI, Applied Research;

(4) Andreas Blattmann, Stability AI, Applied Research;

(5) Tim Dockhorn, Stability AI, Applied Research;

(6) Jonas Müller, Stability AI, Applied Research;

(7) Joe Penna, Stability AI, Applied Research;

(8) Robin Rombach, Stability AI, Applied Research.

Table of Links

Abstract and 1 Introduction

2 Improving Stable Diffusion

2.1 Architecture & Scale

2.2 Micro-Conditioning

2.3 Multi-Aspect Training

2.4 Improved Autoencoder and 2.5 Putting Everything Together

3 Future Work


Appendix

A Acknowledgements

B Limitations

C Diffusion Models

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

F On FID Assessment of Generative Text-Image Foundation Models

G Additional Comparison between Single- and Two-Stage SDXL pipeline

H Comparison between SD 1.5 vs. SD 2.1 vs. SDXL

I Multi-Aspect Training Hyperparameters

J Pseudo-code for Conditioning Concatenation along the Channel Axis

References

E Comparison to Midjourney v5.1

E.1 Overall Votes

To assess the generation quality of SDXL, we perform a user study against the state-of-the-art text-to-image generation platform Midjourney[1]. As the source for image captions we use the PartiPrompts (P2) benchmark [53], which was introduced to compare large text-to-image models on various challenging prompts.


For our study, we choose five random prompts from each category and, for each prompt, generate four 1024 × 1024 images with both Midjourney (v5.1, with a set seed of 2) and SDXL. These images were then presented to the AWS GroundTruth taskforce, who voted based on adherence to the prompt. The results of these votes are illustrated in Fig. 9. Overall, there is a slight preference for SDXL over Midjourney in terms of prompt adherence.
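The selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' tooling: the function name `sample_prompts_per_category`, the sampling seed, and the toy prompt list with placeholder category names are all hypothetical, and the real P2 benchmark data is not reproduced here.

```python
import random
from collections import defaultdict


def sample_prompts_per_category(prompts, k=5, seed=0):
    """Pick up to k random prompts from every category.

    `prompts` is an iterable of (category, prompt_text) pairs.
    The seed only fixes this sketch's sampling; it is unrelated to
    the Midjourney generation seed mentioned in the text.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for category, text in prompts:
        by_category[category].append(text)
    # Iterate categories in sorted order so the draw is reproducible.
    return {
        category: rng.sample(texts, min(k, len(texts)))
        for category, texts in sorted(by_category.items())
    }


# Hypothetical toy data: two placeholder categories with eight prompts each.
toy_prompts = [
    (category, f"{category} prompt {i}")
    for category in ("category_a", "category_b")
    for i in range(8)
]

selection = sample_prompts_per_category(toy_prompts, k=5, seed=0)

# Four images per prompt and per model, as in the study design.
images_per_prompt = 4
images_per_model = sum(len(p) for p in selection.values()) * images_per_prompt
print(images_per_model)
```

With two toy categories this yields 2 × 5 × 4 = 40 images per model; the same arithmetic scales with the number of categories in the benchmark.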


Figure 9: Results from 17,153 user preference comparisons between SDXL v0.9 and Midjourney v5.1, which was the latest version available at the time. The comparisons span all “categories” and “challenges” in the PartiPrompts (P2) benchmark. Notably, SDXL was favored 54.9% of the time over Midjourney v5.1. Preliminary testing indicates that the recently released Midjourney v5.2 has lower prompt comprehension than its predecessor, but the laborious process of generating multiple prompts slows down broader testing.
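To get a feel for how much sampling uncertainty surrounds the 54.9% share reported in the Figure 9 caption, one can compute a Wilson score interval. The sketch below is not part of the original study. It assumes the 17,153 votes are independent, which is a simplification because the same prompts and images are rated many times, and it derives the SDXL vote count from the rounded percentage, so that count is an approximation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    )
    return center - half_width, center + half_width


total_votes = 17_153  # number of comparisons, from the Figure 9 caption
# Assumption: vote count reconstructed from the rounded 54.9% share.
sdxl_votes = round(0.549 * total_votes)

low, high = wilson_interval(sdxl_votes, total_votes)
print(f"SDXL preference share: {sdxl_votes / total_votes:.3f}")
print(f"approx. 95% interval:  [{low:.3f}, {high:.3f}]")
```

Under these assumptions the whole interval lies above 50%, which is consistent with the "slight preference" wording; correlated votes would widen the interval, so this should be read as an optimistic bound on the precision.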

E.2 Category & challenge comparisons on PartiPrompts (P2)

Each prompt from the P2 benchmark is assigned to a category and a challenge, each of which focuses on a different difficult aspect of the generation process. We show the comparisons for each category (Fig. 10) and challenge (Fig. 11) of P2 below. In four out of six categories SDXL outperforms Midjourney, and in seven out of ten challenges SDXL either outperforms Midjourney or there is no significant difference between the two models.


Figure 10: User preference comparison of SDXL (without refinement model) and Midjourney v5.1 across particular text categories. SDXL outperforms Midjourney v5.1 in all but two categories.


Figure 11: Preference comparisons of SDXL (with refinement model) to Midjourney v5.1 on complex prompts. SDXL either outperforms or is statistically equal to Midjourney v5.1 in 7 out of 10 challenges.


This paper is available on arXiv under the CC BY 4.0 DEED license.


[1] We compare against v5.1 since that was the best version available at that time.
