🐘 Stable Diffusion XL

Here is how you get the most out of SDXL finetuning!

Training SDXL models

If you are new to SDXL training, here are recommended settings to get you started!

🎭 Training on portraits

  • Use 5 to 15 high-resolution images: SDXL does not need many images to produce good results. Make sure your images are at least 1024x1024; smaller images will be scaled up, which can introduce artifacts.

  • Train for 1500 steps. You don't need to adjust this number based on the number of images; 1500 steps is a good starting point for anything between 5 and 15 images.

  • Use the base Stable Diffusion XL model (stable-diffusion-xl-v1-0). Other models are harder to train and should only be used when you have more experience with SDXL training.

  • Leave the learning rate (learning_rate) at its default of 1e-5.

  • Set the Text Encoder 1 learning rate (learning_rate_te1) to 3e-6, and don't train Text Encoder 2 (set learning_rate_te2 to 0.0). This is empirically what gives the best results.

  • Disable Offset Noise. Only enable it if you specifically want to generate very bright or very dark images.

Check out the Good SD1.5 default settings guide for general advice about image selection.
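The portrait settings above can be collected into a single configuration sketch. The parameter names learning_rate, learning_rate_te1 and learning_rate_te2 come from this page; the dictionary layout and the other keys are illustrative assumptions, not a documented API.

```python
# Hypothetical configuration for a portrait finetune, mirroring the
# recommended settings above. Only the learning-rate parameter names
# are taken from this guide; the rest is an illustrative sketch.
portrait_config = {
    "base_model": "stable-diffusion-xl-v1-0",  # base SDXL model
    "max_train_steps": 1500,       # fixed, regardless of 5-15 images
    "learning_rate": 1e-5,         # UNet learning rate (default)
    "learning_rate_te1": 3e-6,     # Text Encoder 1
    "learning_rate_te2": 0.0,      # Text Encoder 2 is not trained
    "offset_noise": False,         # disabled for typical portraits
    "resolution": 1024,            # minimum recommended image size
}
```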

🎨 Training on style

  • Use 50 to 500 high-quality images. Again, make sure they're at least 1024x1024 to avoid image resizing.

  • Train for 100 steps per image. The optimal number will vary depending on your specific style, so this is the first thing you want to experiment with.

  • Use the base Stable Diffusion XL model (stable-diffusion-xl-v1-0).

  • Set the learning rate (learning_rate) to 5e-6. This lower learning rate helps the model train on such a large dataset.

  • Set the Text Encoder 1 learning rate (learning_rate_te1) to 3e-6, and don't train Text Encoder 2 (set learning_rate_te2 to 0.0).

  • Disable Offset Noise.

  • You can use image captions if you have them available, but this is not strictly necessary. If the captions are good, they will help the training process. See how to format your captions.
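As with portraits, the style settings can be sketched as a configuration; the main difference is that the step count scales with dataset size (100 steps per image). The dataset size below is an example, and everything except the learning-rate parameter names is an illustrative assumption.

```python
# Hypothetical configuration for a style finetune. The 100-steps-per-image
# rule and the learning rates follow the recommendations above; the
# dictionary layout is a sketch, not a documented API.
num_images = 200  # example dataset size; 50 to 500 images recommended

style_config = {
    "base_model": "stable-diffusion-xl-v1-0",
    "max_train_steps": 100 * num_images,  # 100 steps per image
    "learning_rate": 5e-6,                # lower UNet rate for large datasets
    "learning_rate_te1": 3e-6,            # Text Encoder 1
    "learning_rate_te2": 0.0,             # Text Encoder 2 is not trained
    "offset_noise": False,
    "use_captions": True,                 # optional, helps if captions are good
}
```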

Generating images with SDXL

Unlike SD1.5, SDXL works very well with short prompts. Using your familiar Stable Diffusion 1.5 prompts with SDXL will generally result in poor images.

If you have trained a model with the instance prompt "photo of ukj person", prompts such as "portrait photo of ukj person as santa claus" or "professional photo of ukj person as cowboy" will give decent results out of the box.

We recommend using the dpmpp-2m-karras scheduler for best results. We generate 1024x1024 images with SDXL in around 3.5s/img (30 inference steps).
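Here is a minimal inference sketch assuming the Hugging Face diffusers library, where dpmpp-2m-karras corresponds to DPMSolverMultistepScheduler with Karras sigmas enabled. The model path is a placeholder for your finetuned checkpoint, and the helper names are illustrative.

```python
def make_prompt(token: str, scene: str) -> str:
    # Short prompts work best with SDXL; "token" is your instance token
    # (e.g. "ukj") and "scene" describes the subject.
    return f"portrait photo of {token} person as {scene}"


def generate(model_path: str, prompt: str):
    # Sketch assuming the Hugging Face `diffusers` library is installed
    # and a CUDA GPU is available; `model_path` is a placeholder.
    import torch
    from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_path, torch_dtype=torch.float16
    )
    # dpmpp-2m-karras = DPM-Solver++ 2M with Karras sigmas in diffusers terms
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )
    pipe.to("cuda")
    return pipe(
        prompt, num_inference_steps=30, width=1024, height=1024
    ).images[0]
```

For example, `generate("path/to/your-model", make_prompt("ukj", "santa claus"))` would render one 1024x1024 image at 30 inference steps.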

How are you training the model?

By default we train the UNet and the first Text Encoder at a resolution of 1024x1024. You can optionally train both text encoders by setting both learning_rate_te1 and learning_rate_te2 to non-zero values. To increase training efficiency, we use Min-SNR-γ weighting with γ = 5.0.
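The SNR-γ setting mentioned above refers to a loss-weighting scheme commonly called Min-SNR weighting, which down-weights easy (high-SNR) timesteps. A minimal sketch of the weight computation for ε-prediction losses, assuming that scheme:

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    # Min-SNR-γ weighting for an ε-prediction loss:
    # weight = min(SNR, γ) / SNR.
    # High-SNR (easy) timesteps get weight γ/SNR < 1,
    # low-SNR (hard) timesteps keep full weight 1.0.
    return min(snr, gamma) / snr
```

With γ = 5.0, a timestep with SNR = 10 contributes half its loss, while any timestep with SNR ≤ 5 keeps its full loss.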

Migrating from beta SDXL training

If you were using beta SDXL training (before February 16th, 2024), check out this page to adapt your code and settings to the new version: SDXL update guide

Is LoRA supported for SDXL?

Yes! Simply enable "Extract LoRA" when training a model. The default rank for our SDXL LoRA is 64. The resulting file is roughly 440MB in size (14x smaller than the full checkpoint).
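The size savings come from LoRA's low-rank factorization: each adapted weight matrix is replaced by two small factors whose size grows with the rank. A rough parameter count per adapted matrix, with illustrative (not SDXL-specific) dimensions:

```python
def lora_params(d_in: int, d_out: int, rank: int = 64) -> int:
    # A LoRA adapter approximates a d_in x d_out weight update with
    # two low-rank factors: (d_in x rank) and (rank x d_out),
    # so the parameter count is rank * (d_in + d_out) instead of
    # d_in * d_out for a full finetune of that matrix.
    return rank * (d_in + d_out)

# Example with hypothetical projection dimensions (2048 x 640):
# a rank-64 adapter needs 64 * (2048 + 640) = 172032 parameters,
# versus 2048 * 640 = 1310720 for the full matrix.
```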

How do I improve on my results?

We have a very enthusiastic and diverse community of model builders on our Discord. Come say hi, we will try to help you as much as possible!
