Stable Diffusion XL (SDXL) Benchmark | 769 images / $ | Salad

A couple of months back, we showed you how to get almost 5000 images per dollar with Stable Diffusion 1.5. Now, with the release of Stable Diffusion XL, we're fielding a lot of questions about whether consumer GPUs can serve SDXL inference at scale. The answer from our Stable Diffusion XL (SDXL) Benchmark is a resounding yes.

In this benchmark, we generated 60.6k hi-res images with randomized prompts on 39 nodes equipped with RTX 3090 and RTX 4090 GPUs. We saw an average image generation time of 15.60s at a per-image cost of $0.0013.

At 769 SDXL images per dollar, consumer GPUs on Salad's distributed cloud are still the best bang for your buck for AI image generation, even when enabling no optimizations on Salad and all optimizations on AWS.

Architecture

We used an inference container based on SDNext, along with a custom worker written in TypeScript that implemented the job processing pipeline. The worker used HTTP to communicate with both the SDNext container and our batch framework.

Our simple batch-processing framework comprises:

Storage: Image files stored in AWS S3.
Queue System: Jobs are queued via AWS SQS, with unique identifiers and pre-signed URLs to upload the generated images.
Result Storage: After images are generated and uploaded, download URLs for each job are stored in DynamoDB.
Worker Coordination: We integrated HTTP handlers using AWS Lambda for easy access by workers to the queue and table.

Discover our open-source code for a deeper dive:

Job Queue Service
Recording Service
SDXL Benchmark Worker Docker Image
SDNext Preloaded with SDXL Docker Image

Deployment on Salad

We set up a container group targeting nodes with four vCPUs, 32GB of RAM, and GPUs with 24GB of VRAM, which includes the RTX 3090, 3090 Ti, and 4090.
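The job pipeline described under Architecture could be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual worker: the `Job` fields, the `WorkerDeps` interface, and every method name are assumptions made for the example. The only details taken from the article are that jobs carry a unique identifier and pre-signed upload URLs, that the worker talks to the inference container and the batch framework over HTTP, and that results are recorded after upload.

```typescript
// Hypothetical job shape: the article only says jobs carry a unique ID and
// pre-signed upload URLs. The field names here are assumptions.
interface Job {
  id: string;
  prompt: string;
  uploadUrls: string[]; // one pre-signed URL per image to generate
}

// Everything the worker would reach over HTTP, injected as functions so the
// loop can be exercised without a network. All names are illustrative.
interface WorkerDeps {
  fetchJob(): Promise<Job | null>; // queue handler
  generateImages(prompt: string, count: number): Promise<Uint8Array[]>; // inference container
  uploadImage(url: string, image: Uint8Array): Promise<void>; // pre-signed upload
  recordResult(jobId: string): Promise<void>; // results table handler
}

// Process a single job. Returns the job ID, or null if the queue was empty.
async function processNextJob(deps: WorkerDeps): Promise<string | null> {
  const job = await deps.fetchJob();
  if (job === null) return null;

  const images = await deps.generateImages(job.prompt, job.uploadUrls.length);
  if (images.length !== job.uploadUrls.length) {
    throw new Error(
      `job ${job.id}: expected ${job.uploadUrls.length} images, got ${images.length}`
    );
  }

  // Upload each image to its own pre-signed URL.
  await Promise.all(
    images.map((image, i) => deps.uploadImage(job.uploadUrls[i], image))
  );

  // Record the job only after every upload has succeeded.
  await deps.recordResult(job.id);
  return job.id;
}
```

Injecting the HTTP calls keeps the control flow visible and makes it easy to swap in fakes for testing. A real worker would wrap `processNextJob` in a loop and add retry and timeout handling, which matters on ephemeral nodes.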
We filled a queue with randomized prompts in the following format:

`a ${adjective} ${salad} salad on a ${servingDish} in the style of ${artist}`

We used ChatGPT to generate roughly 100 options for each variable in the prompt and queued up jobs with four images per prompt. SDXL is composed of two models: a base and a refiner. We generated each image at 1216 x 896 resolution, using the base model for 20 steps and the refiner model for 15 steps. You can see the exact settings we sent to the SDNext API here.

Results – 60,600 Images for $79

Over the benchmark period, we generated more than 60k images and uploaded more than 90GB of content to our S3 bucket, incurring only $79 in charges from Salad, which is far less expensive than using an A10g on AWS, and orders of magnitude cheaper than fully managed services like the Stability API. We did see slower image generation times on consumer GPUs than on datacenter GPUs, but the cost differences give Salad the edge. While an optimized model on an A100 did provide the best image generation time, it was by far the most expensive per image of all methods evaluated.

Grab a fork and see all the salads we made on our GitHub page.

Future Improvements

For comparison with AWS, we gave them several advantages that we did not implement in the container we ran on Salad. In particular, torch.compile isn't practical on Salad, because it adds 40+ minutes to the container's start time, and Salad's nodes are ephemeral. However, such a long start time might be an acceptable tradeoff in a data center context with dedicated nodes that can be expected to stay up for a very long time, so we did use torch.compile on AWS.

Additionally, we used the default fp32 variational autoencoder (vae) in our Salad worker and an fp16 vae in our AWS worker, giving another performance edge to the legacy cloud provider.
Unlike re-compiling the model at start time, including an alternate vae would be practical on Salad, and it is an optimization we would pursue in future projects.

Salad Cloud – Still The Best Value for AI/ML Inference at Scale

SaladCloud remains the most cost-effective platform for AI/ML inference at scale. The recent benchmarking of Stable Diffusion XL further highlights the competitive edge this distributed cloud platform offers, even as models get larger and more demanding.

Also published here.
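As a quick sanity check, the headline figures reported in this benchmark can be cross-checked with a few lines of arithmetic. The three inputs below come straight from the article. The derived values, in particular the implied hourly node cost, are not stated in the article and assume a node generates one image at a time with no idle time.

```typescript
// Figures reported in the article.
const secondsPerImage = 15.6;
const costPerImageUsd = 0.0013;
const totalImages = 60600;

// Derived values (computed here, not quoted from the article).
const imagesPerDollar = 1 / costPerImageUsd;
const totalCostUsd = totalImages * costPerImageUsd;

// Assumption: one image at a time per node, with the node always busy.
const imagesPerNodeHour = 3600 / secondsPerImage;
const impliedNodeCostPerHourUsd = imagesPerNodeHour * costPerImageUsd;

console.log(Math.round(imagesPerDollar)); // 769
console.log(totalCostUsd.toFixed(2)); // "78.78", consistent with the ~$79 total
console.log(imagesPerNodeHour.toFixed(1)); // "230.8"
console.log(impliedNodeCostPerHourUsd.toFixed(2)); // "0.30"
```

The per-image cost, the images-per-dollar headline, and the roughly $79 total agree with one another, which suggests the per-image figure is simply total spend divided by images generated.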

Stable Diffusion XL: Achieving High-Volume AI Image Generation on Consumer GPUs with Salad Cloud
