Authors:
(1) Dustin Podell, Stability AI, Applied Research;
(2) Zion English, Stability AI, Applied Research;
(3) Kyle Lacey, Stability AI, Applied Research;
(4) Andreas Blattmann, Stability AI, Applied Research;
(5) Tim Dockhorn, Stability AI, Applied Research;
(6) Jonas Müller, Stability AI, Applied Research;
(7) Joe Penna, Stability AI, Applied Research;
(8) Robin Rombach, Stability AI, Applied Research.

Table of Links

Abstract and 1 Introduction
2 Improving Stable Diffusion
2.1 Architecture & Scale
2.2 Micro-Conditioning
2.3 Multi-Aspect Training
2.4 Improved Autoencoder and 2.5 Putting Everything Together
3 Future Work

Appendix

A Acknowledgements
B Limitations
C Diffusion Models
D Comparison to the State of the Art
E Comparison to Midjourney v5.1
F On FID Assessment of Generative Text-Image Foundation Models
G Additional Comparison between Single- and Two-Stage SDXL pipeline
References

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

E.1 Overall Votes

To assess the generation quality of SDXL, we perform a user study against the state-of-the-art text-to-image generation platform Midjourney [1]. As the source for image captions we use the PartiPrompts (P2) benchmark [53], which was introduced to compare large text-to-image models on various challenging prompts. For our study, we choose five random prompts from each category and generate four 1024 × 1024 images with both Midjourney (v5.1, with a set seed of 2) and SDXL for each prompt. These images were then presented to the AWS GroundTruth taskforce, who voted based on adherence to the prompt. The results of these votes are illustrated in Fig. 9. Overall, there is a slight preference for SDXL over Midjourney in terms of prompt adherence.

E.2 Category & challenge comparisons on PartiPrompts (P2)

Each prompt from the P2 benchmark is organized into a category and a challenge, each focusing on a different difficult aspect of the generation process. We show the comparisons for each category (Fig. 10) and challenge (Fig. 11) of P2. In four out of six categories SDXL outperforms Midjourney, and in seven out of ten challenges there is either no significant difference between the two models or SDXL outperforms Midjourney.

This paper is available on arXiv under the CC BY 4.0 DEED license.
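The study protocol in E.1 (a fixed number of random prompts per category) and the per-category comparison in E.2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `sample_prompts` and `preference_shares`, the sampling seed, and all prompts and votes are hypothetical and invented for this example. The source does not describe how votes were aggregated, so a simple vote share per category is assumed here.

```python
import random
from collections import Counter, defaultdict


def sample_prompts(prompts_by_category, per_category=5, seed=0):
    """Pick `per_category` random prompts from each category.

    The seed is a made-up default so the toy selection is repeatable.
    """
    rng = random.Random(seed)
    return {
        category: rng.sample(prompts, per_category)
        for category, prompts in prompts_by_category.items()
    }


def preference_shares(votes):
    """Turn a list of (category, winner) votes into per-category vote shares."""
    counts = defaultdict(Counter)
    for category, winner in votes:
        counts[category][winner] += 1
    return {
        category: {model: n / sum(c.values()) for model, n in c.items()}
        for category, c in counts.items()
    }


# Toy data only: these prompts and votes are invented for illustration.
toy_prompts = {
    "Animals": [f"animal prompt {i}" for i in range(8)],
    "Vehicles": [f"vehicle prompt {i}" for i in range(8)],
}
toy_votes = [
    ("Animals", "SDXL"), ("Animals", "SDXL"),
    ("Animals", "Midjourney"), ("Animals", "SDXL"),
    ("Vehicles", "Midjourney"), ("Vehicles", "SDXL"),
]

chosen = sample_prompts(toy_prompts)
shares = preference_shares(toy_votes)
print(shares["Animals"]["SDXL"])  # 0.75 on the toy votes above
```

Grouping votes by challenge instead of category would only require passing (challenge, winner) pairs to the same function. Judging whether a difference between the two models is significant, as E.2 does, would need a statistical test that this sketch leaves out.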


Comparing SDXL and Midjourney v5.1 on PartiPrompts: Which AI Model Wins?
