Stable Diffusion XL
Here is how you get the most out of SDXL finetuning!
Training SDXL models
If you are new to SDXL training, here are the recommended settings to get you started!
🎭 Training on portraits
Use 5-15 high-resolution images. SDXL does not need many images to give good results. Make sure your images are at least 1024x1024; smaller images will be scaled up, which can introduce artifacts.
Train for 1500 steps. You don't need to adjust this number based on the number of images; 1500 steps is a good starting point for anything between 5 and 15 images.
Use the base Stable Diffusion XL model (stable-diffusion-xl-v1-0). Other models are harder to train and should only be used once you have more experience with SDXL training.
Leave the learning rate (learning_rate) at its default of 1e-5.
Set the Text Encoder 1 learning rate (learning_rate_te1) to 3e-6. Training of Text Encoder 2 is currently not supported, so its learning rate is set to zero.
Disable Offset Noise. Only use Offset Noise if you specifically want to generate very bright or very dark images.
Check out the Good SD1.5 default settings guide for general advice about image selection.
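The portrait recommendations above can be collected into a small sketch. Only learning_rate and learning_rate_te1 are parameter names taken from this guide; the other keys and the check_portrait_dataset helper are hypothetical names chosen for illustration, not a documented API.

```python
# Hypothetical settings dictionary. Only `learning_rate` and
# `learning_rate_te1` are names that appear in this guide.
PORTRAIT_SETTINGS = {
    "base_model": "stable-diffusion-xl-v1-0",
    "steps": 1500,               # fixed, regardless of image count
    "learning_rate": 1e-5,       # UNet learning rate
    "learning_rate_te1": 3e-6,   # Text Encoder 1 learning rate
    "offset_noise": False,       # assumed key name
}

MIN_SIDE = 1024  # smaller images would be scaled up


def check_portrait_dataset(image_sizes):
    """Return warnings for a dataset given as (width, height) tuples."""
    warnings = []
    if not 5 <= len(image_sizes) <= 15:
        warnings.append(f"expected 5-15 images, got {len(image_sizes)}")
    for index, (width, height) in enumerate(image_sizes):
        if min(width, height) < MIN_SIDE:
            warnings.append(
                f"image {index} is {width}x{height} and would be upscaled"
            )
    return warnings


if __name__ == "__main__":
    sizes = [(1024, 1024)] * 5 + [(800, 1200)]
    for message in check_portrait_dataset(sizes):
        print(message)
```

Running the check before uploading is a cheap way to catch images that would be upscaled during training.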
🎨 Training on style
Use 50 to 500 high-quality images. Again, make sure they're at least 1024x1024 to avoid image resizing.
Train for 100 steps per image. The optimal number will vary depending on your specific style, so this is the first thing you want to experiment with.
Use the base Stable Diffusion XL model (stable-diffusion-xl-v1-0).
Set the learning rate (learning_rate) to 5e-6. This lower learning rate helps the model train on such a large dataset.
Set the Text Encoder 1 learning rate (learning_rate_te1) to 3e-6. Training of Text Encoder 2 is currently not supported, so its learning rate is set to zero.
Disable Offset Noise.
You can use image captions if you have them available, but this is not strictly necessary. If the captions are good, it will help the training process. See how to format your captions.
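As a rough illustration of the "100 steps per image" rule, the step count for a style run can be computed from the dataset size. The function name and the dictionary keys other than learning_rate and learning_rate_te1 are assumptions made for this sketch.

```python
STEPS_PER_IMAGE = 100  # starting point from this guide; tune per style


def style_training_steps(num_images, steps_per_image=STEPS_PER_IMAGE):
    """Return the suggested number of training steps for a style dataset."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_images * steps_per_image


def style_settings(num_images):
    """Build a hypothetical settings dictionary for a style run."""
    return {
        "base_model": "stable-diffusion-xl-v1-0",
        "steps": style_training_steps(num_images),
        "learning_rate": 5e-6,       # lower than the portrait setting
        "learning_rate_te1": 3e-6,
        "offset_noise": False,       # assumed key name
    }


if __name__ == "__main__":
    print(style_settings(200)["steps"])
```

Since the optimal step count depends on the style, treat the result as the first value to try rather than a fixed rule.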
Generating images with SDXL
Unlike SD1.5, SDXL works very well with short prompts. Using your familiar Stable Diffusion 1.5 prompts with SDXL will generally result in poor images.
If you have trained a model with the instance prompt "photo of ukj person", prompts such as "portrait photo of ukj person as santa claus" or "professional photo of ukj person as cowboy" will give decent results out of the box.
We recommend using the dpmpp-2m-karras scheduler for best results. We generate 1024x1024 images with SDXL at around 3.5s/img (30 inference steps).
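The short-prompt pattern and the generation settings above might be sketched as follows. The helper names and dictionary keys are hypothetical; only the scheduler name, resolution, step count, and timing come from this guide.

```python
INSTANCE_TOKEN = "ukj person"  # from the example instance prompt


def build_prompt(photo_type, role):
    """Compose a short SDXL prompt around the trained instance token."""
    return f"{photo_type} of {INSTANCE_TOKEN} as {role}"


# Hypothetical key names; values are the ones recommended in this guide.
GENERATION_SETTINGS = {
    "scheduler": "dpmpp-2m-karras",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 30,
}


def estimated_seconds(num_images, seconds_per_image=3.5):
    """Rough generation time, using the ~3.5s/img figure quoted above."""
    return num_images * seconds_per_image


if __name__ == "__main__":
    print(build_prompt("portrait photo", "santa claus"))
    print(estimated_seconds(10))
```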
How are you training the model?
By default, we train the UNet and Text Encoder 1 at a resolution of 1024x1024. Training of Text Encoder 2 is not supported. To disable training of a text encoder, set its learning rate to zero. To increase training efficiency, we use SNR-γ with γ = 5.0.
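This guide does not spell out the weighting formula. Assuming "SNR-γ" refers to the Min-SNR-γ loss weighting strategy with a noise-prediction objective, each timestep's loss is scaled by min(SNR, γ) / SNR, which caps the influence of low-noise (high-SNR) timesteps. The function below is a minimal sketch under that assumption, not the service's actual implementation.

```python
def min_snr_weight(snr, gamma=5.0):
    """Loss weight for one timestep under Min-SNR-gamma weighting.

    Assumes a noise-prediction (epsilon) objective, where the weight
    is min(SNR, gamma) / SNR.
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    return min(snr, gamma) / snr


if __name__ == "__main__":
    # High-SNR (low-noise) timesteps are down-weighted;
    # timesteps with SNR <= gamma keep a weight of 1.
    for snr in (0.5, 5.0, 10.0, 100.0):
        print(snr, min_snr_weight(snr))
```

With γ = 5.0, a timestep with SNR 10 contributes half as much to the loss as it would unweighted, while timesteps with SNR at or below 5 are left unchanged.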
Is LoRA supported for SDXL?
Yes! A LoRA file is extracted at the end of your run by default. The rank for our SDXL LoRA is 64. The resulting file is roughly 440MB in size (14x smaller than the full checkpoint).
How do I improve on my results?
We have a very enthusiastic and diverse community of model builders on our Discord. Come say hi, we will try to help you as much as possible!