In this post, we’ll look at getting setup with running your own A.I. image generator. You need a Linux system with a CUDA enabled card to get this working through to the end. I did get the basic txt-2img script working in Windows. Unfortunately, for the July released retrieval-augmented diffusion models, you need scann
in order to index the openimages dataset, which is only available on Linux. WSL worked fine for me.
|
|
|
|
---|
Here’s roughly what I gather you’ll need:
= 32GB RAM
I randomly stumbled upon the latent-diffusion repo while I was browsing AI subreddits. A few posts mentioned it and I didn’t know what it was so I went googling.
I am not a computer scientist, mathematician, or any of the things that would be required to really understand what’s going on here. The paper about the new RAD models are understandable and interesting, but I haven’t done much research yet beyond that.
There are 2 image generation techniques possible with Latent Diffusion. We will install and take a look at both.
I believe the txt2-img model that we’ll setup first is what we are used to with other image generation tools online – it makes a super low res image clip thinks is a good prompt match and denoises and upscales it.
The RAD model uses a configurable database of images as a reference AND does the diffusion like we are used to. This seems to mean it’s less good as a general purpose generator, but could be used with a specific training set you could feasibly create. I will try this in the future.
RAD also generates 768X768 images which is pretty impressive.
|
|
---|
Make sure you have CUDA 11.3 installed and conda
works in your shell. You can get Conda here
(Re)install Pytorch now, lest ye end up in dependency hell later. I did this via pip
but conda may work too.
cd
into the repo and run:
conda env create -f environment.yaml
conda activate ldm
This will setup the initial environment. If you get errors with Pytorch later and need to reinstall it, come back to environment.yaml and reinstall using the correct package versions.
Download the model and you’re all ready to go so long as your torch dependencies are all setup correct.
mkdir -p models/ldm/text2img-large/
wget -O models/ldm/text2img-large/model.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt
You should be able to sample images now using:
python scripts/txt2img.py --prompt "my cool image"
Make sure to see the options you’re able to set in the README.
You can use this bash script to make running everything from a shell easier. You can include this in your ~/.bashrc
file and reopen your shell. Make sure to change the 3 paths to suit your needs. We’ll look at PLMS later.
function txt2img() {
if [ $# -eq 0 ]; then
echo "Usage: diffuse '[prompt]' [--plms]"
exit 1
fi
cd /mnt/c/diffusion/latent-diffusion;
conda activate ldm;
if [ $2 = '--plms' ]; then
python scripts/txt2img.py --prompt $1 --outdir '/mnt/g/My Drive/AI Image Tests/text2img' --plms --n_iter 4 --ddim_eta 0.0
else
python scripts/txt2img.py --prompt $1 --outdir '/mnt/g/My Drive/AI Image Tests/text2img' --n_iter 4 --ddim_eta 0.0
fi
}
The images are 512X512 by default. You can use This Jupiter Notebook to upscale the images with pretty good results.
htop
to troubleshoot/view memory usage.pip
and/or conda
(I did both and am not sure which made everything work)
|
|
---|
You can read more about RDM here. As said before, it essentially combines a database of images with Clip descriptions with the existing diffusion process, from what I understand. This part takes a lot longer to get running.
Note, this model, especially when using the openimages training is best at recreating real things and doesn’t seem very good (yet) and creating the weird images we’re used to from the diffusion models.
Get the Text-to-Image model working first
Install new packages and download training model.
pip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0
pip install git+https://github.com/arogozhnikov/einops.git
mkdir -p models/rdm/rdm768x768/
wget -O models/rdm/rdm768x768/model.ckpt https://ommer-lab.com/files/rdm/model.ckpt
Test everything is working so far with
python scripts/knn2img.py --prompt "I'm a computer"
If everything went well, you should see a success text in your shell. Now we need to download all the image indexes/models. The openimages zip is 11GB. The ArtBench data is pretty small.
mkdir -p data/rdm/retrieval_databases
wget -O data/rdm/retrieval_databases/artbench.zip https://ommer-lab.com/files/rdm/artbench_databases.zip
wget -O data/rdm/retrieval_databases/openimages.zip https://ommer-lab.com/files/rdm/openimages_database.zip
unzip data/rdm/retrieval_databases/artbench.zip -d data/rdm/retrieval_databases/
unzip data/rdm/retrieval_databases/openimages.zip -d data/rdm/retrieval_databases/
mkdir -p data/rdm/searchers
wget -O data/rdm/searchers/artbench.zip https://ommer-lab.com/files/rdm/artbench_searchers.zip
unzip data/rdm/searchers/artbench.zip -d data/rdm/searchers
We’re ready to use the Artbench models now (which work pretty well in my limited testing), but what we really want is to use the massive openimages model as our reference. We downloaded the data, but we need to create the index.
If you want to test the Artbench database, run
python scripts/knn2img.py --prompt "A blue pig" --use_neighbors --knn 20
|
|
---|
Unless you have a super computer, the terrible memory management in python multiprocessing (from what I can tell) will stop us from using the 4 files we unzipped with the script from the repo to index them. Everything else I tried, the processes would run out of memory.
Luckily, we just need to concatenate the files. You can delete the old part files after if you would like. We also need to move the part files out of the openimages folder.
cd data/rdm/retrieval_databases/openimages/
cat 2000000x768_part1.npz 2000000x768_part2.npz 2000000x768_part3.npz 2000000x768_part4.npz > 2000000x768.npz
mv 2000000x768_* ~/temp
cd ../../../../
Now when we run the training script, we shouldn’t run out of memory. I think this is some issue in their multi file function in the training script.
python scripts/train_searcher.py
We need to change the batch size in scripts/knn2img.py
so we don’t run out of GPU memory. You may need to lower to 1, but I was able to set to 2 on a 3090.
Open scripts/knn2img.py
in an editor
Go to line 243 or find “n_samples”
Change the default value to 1 or 2 and save
You can also do this using the command line parameter, but since it will never work for me, I wanted to change the default value. Note if you update the repo in the future this will likely get overwritten.
Now we can generate RDM images using the openimages database:
python scripts/knn2img.py --prompt "A blue pig" --database openimages --use_neighbors --knn 20
Here is a ~/.bashrc
script to run it conveniently. Again, make sure to change the 3 paths to suit your needs:
function diffuse() {
if [ $# -eq 0 ]; then
echo "Usage: diffuse '[prompt]' [--plms]"
exit 1
fi
cd /mnt/c/diffusion/latent-diffusion;
conda activate ldm;
if [ $2 = '--plms' ]; then
python scripts/knn2img.py --database openimages --prompt $1 --use_neighbors --outdir '/mnt/g/My Drive/AI Image Tests/RDM' --knn 20 --plms
else
python scripts/knn2img.py --database openimages --prompt $1 --use_neighbors --outdir '/mnt/g/My Drive/AI Image Tests/RDM' --knn 20
fi
}
You can also run prompts in batches by creating a file (in this case prompts.txt
) and pointing the script to it with the –from-file
parameter like this
python scripts/knn2img.py --database openimages --from-file ./prompts.txt --use_neighbors --outdir '/mnt/g/My Drive/AI Image Tests/RDM' --knn 20
sample prompts.txt
prompt 1
prompt 2
prompt 3
This should save a lot of time if you’re running a lot of queries since the model only gets loaded once. The text-to-image model does not have this option.
|
|
|
---|
Below are some inital results from both the models with PLMS on and off. PLMS should make things run faster. For now, we won’t measure the speed increase but will look subjectively at image quality/results.
Since the RDM model will only make 2 photos at a time for my system, I included both photos instead of a grid for that model.
I will almost certainly do more specific testing in the future and these models will be included in future comparison posts.
A wood house on a hill, landscape photography
Text-to-image
without plms |
with plms |
---|---|
|
|
RDM
without plms |
with plms |
---|---|
|
|
|
|
A purple stop sign
Text-to-image
without plms |
with plms |
---|---|
|
|
RDM
without plms |
with plms |
---|---|
|
|
|
|
fox den, digital art
Text-to-image
without plms |
with plms |
---|---|
|
|
RDM
without plms |
with plms |
---|---|
|
|
|
|
winning the big game, award winning photography
Text-to-image
without plms |
with plms |
---|---|
|
|
RDM
without plms |
with plms |
---|---|
|
|
|
|
a sports car driving in sand dunes
Text-to-image
without plms |
with plms |
---|---|
|
|
RDM
without plms |
with plms |
---|---|
|
|
|
|
And there you have it – no more credits or monthly limits, just pure AI image generation all your own. Once I figure out some tuning settings and create my own image training database, we’ll likely talk about this again.
Hopefully with some tuning and new releases/models this will get closer to the WOW factor DALL-E has.
Also published here.