📑Image captions

Improve your results by providing captions

When training Stable Diffusion models, captions can help the model better understand the concepts and styles in your training images.

Using captions

Take the following steps to provide captions.

  1. Open a text editor and create a new file, for example captions.json. The name is up to you, but the file must contain valid JSON.

  2. For each image file in your training dataset, write a caption in the JSON file. Each individual caption can be up to 77 tokens long (around 300 characters). As the key, use the filename including its extension but without any folder or path:

  {
    "myimage0001.jpeg": "ukj soda bottle on a wooden table in warm afternoon lighting, pine trees in the background",
    "myimage0002.jpeg": "two ukj soda bottles next to a tropical waterfall"
  }

You can also provide multiple captions per image by providing a list. In that case, we randomly sample from them during training:

  {
    "myimage0002.jpeg": [
      "two ukj soda bottles next to a tropical waterfall",
      "two ukj soda bottles by a waterfall in the jungle"
    ]
  }

  3. Go to https://dreamlook.ai/dreambooth and enable "Expert mode" at the top of the page.

  4. Upload the JSON file by clicking on "Image captions" under "Advanced settings".

  5. Configure the other parameters for your job and start it. That's it!
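
If you would rather generate the file from step 2 with a script than type it by hand, the following is a minimal sketch. The helper name build_captions_json is hypothetical, and the checks only mirror the rules stated above (bare filenames as keys, a string or a list of strings as values); it is not an official tool.

```python
import json


def build_captions_json(captions):
    """Return JSON text for a mapping of filename -> caption or list of captions."""
    for name, value in captions.items():
        # Keys must be bare filenames: extension included, no folder or path.
        if "/" in name or "\\" in name:
            raise ValueError(f"use the bare filename, not a path: {name!r}")
        # Values are a single caption or a non-empty list of alternatives.
        is_text = isinstance(value, str)
        is_list = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(c, str) for c in value)
        )
        if not (is_text or is_list):
            raise ValueError(f"caption for {name!r} must be a string or a list of strings")
    return json.dumps(captions, indent=2, ensure_ascii=False)


example = {
    "myimage0001.jpeg": "ukj soda bottle on a wooden table in warm afternoon lighting",
    "myimage0002.jpeg": [
        "two ukj soda bottles next to a tropical waterfall",
        "two ukj soda bottles by a waterfall in the jungle",
    ],
}
print(build_captions_json(example))
```

Save the printed text as captions.json (or any other name) and upload it as described in the steps above.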

You do not have to provide captions for all images. For any training image without a caption, we fall back to instance_prompt.
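
To make the behaviour described above concrete, here is a small illustrative sketch of how a caption could be resolved for one image: a single caption is used as-is, a list is sampled at random, and a missing entry falls back to instance_prompt. The function pick_caption and the sample prompt are hypothetical; this is not dreamlook.ai's actual training code.

```python
import random


def pick_caption(filename, captions, instance_prompt, rng=random):
    """Resolve the caption for one image according to the rules above."""
    entry = captions.get(filename)
    if entry is None:
        # No caption provided for this image: fall back to instance_prompt.
        return instance_prompt
    if isinstance(entry, list):
        # Several captions provided: pick one at random.
        return rng.choice(entry)
    return entry


captions = {
    "myimage0001.jpeg": "ukj soda bottle on a wooden table",
    "myimage0002.jpeg": [
        "two ukj soda bottles next to a tropical waterfall",
        "two ukj soda bottles by a waterfall in the jungle",
    ],
}
# "photo of ukj soda bottle" is a made-up instance_prompt for illustration.
print(pick_caption("myimage0003.jpeg", captions, "photo of ukj soda bottle"))
```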

When using the API, the filename is the last part of the image URL, excluding any query parameters.

For example:

  • URL: https://myserver/images/image001.jpeg?token=x91dj1kjh41bxlj1

  • Filename to use in JSON file: image001.jpeg
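
If you build the JSON file programmatically, the key can be derived from each URL as in this sketch, which uses Python's standard urllib.parse. The function name caption_key_from_url is hypothetical, and the sketch assumes percent-encoded characters in the path should be left untouched, which the text above does not specify.

```python
import posixpath
from urllib.parse import urlsplit


def caption_key_from_url(url):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(url).path  # e.g. "/images/image001.jpeg"
    return posixpath.basename(path)


print(caption_key_from_url("https://myserver/images/image001.jpeg?token=x91dj1kjh41bxlj1"))
```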

Using individual .txt files

Some captioning tools write captions to individual .txt files instead of a single JSON file.

We provide a Colab notebook to convert captions in this format into our JSON format:

🔗 https://colab.research.google.com/drive/13s9cMduESF4Wzv8tVcajQPLjrdoH5hH3

Simply follow the instructions in the notebook to create the JSON file, then upload it to dreamlook.ai as described above when configuring your job.
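
If you prefer to run the conversion locally, the idea can be sketched as follows. This is not the notebook's code. It assumes the common convention that each caption file sits next to its image and shares its base name (for example myimage0001.jpeg and myimage0001.txt); the function name and the list of image extensions are also assumptions.

```python
import json
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def collect_txt_captions(image_dir):
    """Map each image filename to the text of its same-named .txt file, if any."""
    captions = {}
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        sidecar = image.with_suffix(".txt")
        if sidecar.is_file():
            text = sidecar.read_text(encoding="utf-8").strip()
            if text:
                # Key is the bare filename, extension included.
                captions[image.name] = text
    return captions


def captions_to_json(image_dir):
    """Return the JSON text for all captions found in image_dir."""
    return json.dumps(collect_txt_captions(image_dir), indent=2, ensure_ascii=False)
```

Calling captions_to_json("path/to/images") returns text you can save as the JSON file. Images without a .txt file are simply left out.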

Failure cases

The job may fail under the following conditions:

  • The file could not be parsed as JSON.

  • The file contains captions in an invalid format (see above).

  • Not a single provided caption could be matched to a filename.

If this happens, the tokens used for this job are immediately returned.
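
You can catch these conditions before uploading with a local check along the lines of this sketch. The function check_captions is hypothetical, and treating an empty list as invalid is an assumption; the service's own validation may differ.

```python
import json


def check_captions(json_text, image_filenames):
    """Return a list of problems found; an empty list means none were found."""
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError as exc:
        # Failure case 1: the file is not valid JSON.
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be an object mapping filenames to captions"]

    problems = []
    for name, value in data.items():
        # Failure case 2: each value must be a string or a list of strings.
        is_text = isinstance(value, str)
        is_list = (
            isinstance(value, list)
            and len(value) > 0  # assumption: an empty list is not a valid entry
            and all(isinstance(c, str) for c in value)
        )
        if not (is_text or is_list):
            problems.append(f"invalid caption format for {name!r}")

    # Failure case 3: no caption key matches any training image filename.
    if not set(data) & set(image_filenames):
        problems.append("no caption matches any training image filename")
    return problems
```

For example, check_captions(open("captions.json", encoding="utf-8").read(), ["myimage0001.jpeg", "myimage0002.jpeg"]) returns an empty list when the file parses, every entry is well formed, and at least one key matches.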

How to best write captions?

This is an active field of research, and best practices will likely evolve over time.

Since writing captions manually can be quite tedious, a common practice is to use AI models such as GPT-4V or BLIP-2 to write captions automatically.

Don't hesitate to ask on our Discord server if you are looking for more guidance!
