📑Image captions
Improve your results by providing captions
When training Stable Diffusion models, captions can help the model better understand the concepts and styles in your training images.
Using captions
Take the following steps to provide captions.
Open a text editor and create a new file, captions.json (you can name it anything, but it must be a JSON file).
For each image file in your training dataset, write a caption in the JSON file. Each individual caption can be up to 77 tokens long (around 300 characters). Use the filename including the file extension but without the folder or path.
You can also provide multiple captions per image by providing a list. In that case we will randomly sample from them during training.
Go to https://dreamlook.ai/dreambooth and enable "Expert mode" at the top of the page.
Upload the JSON file by clicking on "Image captions" under "Advanced settings".
Configure the other parameters for your job and start it. That's it!
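The caption file described in the steps above can also be generated with a short script. The following is a minimal sketch, not an official tool. It assumes the JSON file is a single object that maps each filename to either one caption string or a list of caption strings; that layout is inferred from the description above, and the example paths and captions are made up.

```python
import json
from pathlib import PurePath


def caption_key(image_path: str) -> str:
    """Return the bare filename: extension kept, folders dropped."""
    return PurePath(image_path).name


def build_captions(entries: dict) -> dict:
    """Map bare filenames to a caption (str) or several captions (list of str)."""
    return {caption_key(path): caption for path, caption in entries.items()}


# Hypothetical dataset: paths and captions are placeholders.
example = build_captions({
    "dataset/image001.jpeg": "a photo of a dog sitting on a beach",
    "dataset/image002.jpeg": [
        "a close-up portrait of a dog",
        "a dog looking at the camera, shallow depth of field",
    ],
})

# Assumed layout: one JSON object keyed by filename.
captions_json = json.dumps(example, indent=2, ensure_ascii=False)
print(captions_json)
```

Saving `captions_json` to a file such as captions.json then gives you something to upload under "Image captions".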
You do not have to provide captions for all images. For any training image without a caption, we will fall back to instance_prompt.
When using the API, the filename is the last part of the image URL path, excluding any query parameters.
For example:
URL:
https://myserver/images/image001.jpeg?token=x91dj1kjh41bxlj1
Filename to use in JSON file:
image001.jpeg
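If you build the JSON file programmatically, the same rule can be applied with Python's standard library. This is a small sketch; the helper name `filename_from_url` is our own, and it does not decode percent-encoded characters, since the text above does not say how those are handled.

```python
from posixpath import basename
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring query string and fragment."""
    return basename(urlsplit(url).path)


print(filename_from_url(
    "https://myserver/images/image001.jpeg?token=x91dj1kjh41bxlj1"
))
```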
Using individual .txt files
Some captioning tools use individual .txt files for captions.
We provide a Colab notebook to convert captions in this format into our JSON format:
🔗 https://colab.research.google.com/drive/13s9cMduESF4Wzv8tVcajQPLjrdoH5hH3
Simply follow the instructions in the notebook to create the JSON file, then upload it to dreamlook.ai as described above when configuring your job.
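If you prefer to convert locally, the conversion can be approximated in a few lines. This sketch is not the notebook's code. It assumes each caption sits next to its image in a .txt file with the same base name (for example image001.jpeg and image001.txt), and that the JSON file maps filenames to captions. The list of image extensions is also an assumption.

```python
import json
import tempfile
from pathlib import Path

# Assumption: which file extensions count as training images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def captions_from_txt_files(folder) -> dict:
    """Collect captions from same-named .txt files next to each image."""
    captions = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        txt = image.with_suffix(".txt")
        if txt.is_file():
            text = txt.read_text(encoding="utf-8").strip()
            if text:  # skip empty caption files
                captions[image.name] = text
    return captions


# Demo on a throwaway folder with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "image001.jpeg").write_bytes(b"")
    (folder / "image001.txt").write_text("a photo of a dog on a beach\n", encoding="utf-8")
    (folder / "image002.png").write_bytes(b"")  # no caption file for this image
    demo = captions_from_txt_files(folder)

print(json.dumps(demo, indent=2))
```

Images without a .txt file are left out of the result, so they rely on the instance_prompt fallback described earlier.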
Failure cases
The job may fail under the following conditions:
The file could not be parsed as JSON.
The file contains captions in an invalid format - see above.
Not a single provided caption could be matched to a filename.
If this happens, the tokens used for this job are immediately returned.
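You can catch these conditions before uploading with a quick local check. The sketch below is a rough approximation, not the validation the service actually runs. It assumes the file is a JSON object mapping filenames to a caption string or a non-empty list of caption strings; the function name `check_captions` is hypothetical.

```python
import json


def check_captions(json_text: str, image_filenames) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    # Assumed layout: a single object keyed by filename.
    if not isinstance(data, dict):
        return ["top level must be an object mapping filenames to captions"]

    problems = []
    for name, value in data.items():
        is_text = isinstance(value, str)
        is_text_list = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(item, str) for item in value)
        )
        if not (is_text or is_text_list):
            problems.append(f"{name}: caption must be a string or a list of strings")

    # At least one key must match a training image filename.
    if not set(data) & set(image_filenames):
        problems.append("no caption matches any training image filename")
    return problems


print(check_captions('{"image001.jpeg": "a photo of a dog"}', ["image001.jpeg"]))
```

The check does not measure the 77-token limit, because that depends on the tokenizer used during training.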
How to best write captions?
This is an active field of research, and best practices will likely evolve over time.
Since writing captions manually can be quite tedious, a common practice is to use AI models such as GPT-4V or BLIP-2 to write captions automatically.
Don't hesitate to ask on our Discord server if you are looking for more guidance!