When training Stable Diffusion models, captions can help the model better understand the concepts and styles in your training images.
Using captions
Take the following steps to provide captions.
Open a text editor and create a new file named captions.json (any name works, as long as it is a JSON file).
For each image file in your training dataset, write a caption in the JSON file. Each caption can be up to 77 tokens long (around 300 characters). Use the image's filename as the key, including the file extension but without the folder or path:
captions.json
{"myimage0001.jpeg":"ukj soda bottle on a wooden table in warm afternoon lighting, pine trees in the background","myimage0002.jpeg":"two ukj soda bottles next to a tropical waterfall",...}
You can also provide multiple captions for an image by supplying a list instead of a single string. In that case, we will randomly sample from them during training:
{..."myimage0002.jpeg":["two ukj soda bottles next to a tropical waterfall","two ukj soda bottles by a waterfall in the jungle"],...}