# Image captions

When training Stable Diffusion models, captions can help the model better understand the concepts and styles in your training images.

## Using captions

Take the following steps to provide captions.

1. Open a text editor and create a new file, for example `captions.json`. You can choose any name, but the file must be a JSON file.
2. For each image file in your training dataset, write a caption in the JSON file. Each individual caption can be up to 77 tokens long (around 300 characters). Use the filename **including the file extension** but without the folder or path:

{% code title="captions.json" overflow="wrap" %}

```json
{
  "myimage0001.jpeg": "ukj soda bottle on a wooden table in warm afternoon lighting, pine trees in the background",
  "myimage0002.jpeg": "two ukj soda bottles next to a tropical waterfall",
  ...
}
```

{% endcode %}

You can also provide multiple captions per image by using a list. In that case, we randomly sample from them during training:

```json
{
  ...
  "myimage0002.jpeg": [
    "two ukj soda bottles next to a tropical waterfall",
    "two ukj soda bottles by a waterfall in the jungle"
  ],
  ...
}
```

3. Go to <https://dreamlook.ai/dreambooth> and enable "Expert mode" at the top of the page.
4. Upload the JSON file by clicking on "Image captions" under "Advanced settings".
5. Configure the other parameters for your job and start it. That's it!

{% hint style="info" %}
You do not have to provide captions for *all* images. For any training image without a caption, we fall back to `instance_prompt`.
{% endhint %}
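The caption rules described above (a single string, a list that is sampled from, or a fallback to `instance_prompt`) can be pictured with the following minimal sketch. It only illustrates the documented behavior and is not our actual training code. The function name `resolve_caption` is hypothetical, and treating an empty list like a missing caption is an assumption.

```python
import random


def resolve_caption(filename, captions, instance_prompt, rng=random):
    """Illustrative only: pick the caption used for one training image."""
    entry = captions.get(filename)
    if entry is None:
        # No caption for this image: fall back to instance_prompt.
        return instance_prompt
    if isinstance(entry, list):
        # Several captions: sample one at random.
        # Assumption: an empty list is treated like a missing caption.
        return rng.choice(entry) if entry else instance_prompt
    # A single caption string is used as-is.
    return entry


captions = {
    "myimage0001.jpeg": "ukj soda bottle on a wooden table",
    "myimage0002.jpeg": [
        "two ukj soda bottles next to a tropical waterfall",
        "two ukj soda bottles by a waterfall in the jungle",
    ],
}

print(resolve_caption("myimage0002.jpeg", captions, "photo of ukj soda bottle"))
print(resolve_caption("myimage0003.jpeg", captions, "photo of ukj soda bottle"))
```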

{% hint style="info" %}
When using the API, the filename is the last part of the image URL, excluding any query parameters.

For example:

* URL: `https://myserver/images/image001.jpeg?token=x91dj1kjh41bxlj1`
* Filename to use in JSON file: `image001.jpeg`
  {% endhint %}
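If you build the JSON file programmatically from a list of image URLs, the filename rule above can be sketched in Python as follows. The helper name `caption_key_from_url` is hypothetical. The sketch leaves percent-encoded characters (such as `%20`) untouched, since this page does not specify how they are handled.

```python
import posixpath
from urllib.parse import urlsplit


def caption_key_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring query parameters."""
    # urlsplit separates the path from the query string and fragment.
    path = urlsplit(url).path
    return posixpath.basename(path)


print(caption_key_from_url(
    "https://myserver/images/image001.jpeg?token=x91dj1kjh41bxlj1"
))  # image001.jpeg
```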

## Using individual `.txt` files

Some captioning tools use individual `.txt` files for captions:

<figure><img src="https://3775798052-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FbjzMS9bi5lfz4h6oo4EA%2Fuploads%2FV5F5kMRL6Op1r9rSOxdP%2FScreenshot%202023-09-01%20at%2016.46.51.png?alt=media&#x26;token=267f0d03-1d1f-490f-8d0d-f198ca84a4a0" alt=""><figcaption></figcaption></figure>

We provide a Colab notebook to convert captions in this format into our JSON format:

**🔗** [**https://colab.research.google.com/drive/13s9cMduESF4Wzv8tVcajQPLjrdoH5hH3**](https://colab.research.google.com/drive/13s9cMduESF4Wzv8tVcajQPLjrdoH5hH3#scrollTo=jmHafIiwa9F6)

Simply follow the instructions in the notebook to create the JSON file, then upload it to [dreamlook.ai](https://dreamlook.ai/) as described above when configuring your job.
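If you prefer to convert the files locally, the conversion can be approximated with a few lines of Python. This is a minimal sketch and not the notebook's code. It assumes each caption file sits next to its image and shares its base name (for example `myimage0001.jpeg` and `myimage0001.txt`). The function names and the list of image extensions are our own choices for illustration.

```python
import json
from pathlib import Path

# Assumption: adjust this set to the image formats in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def collect_captions(folder):
    """Map each image filename to the text of its sibling .txt file."""
    captions = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        txt = image.with_suffix(".txt")
        if not txt.is_file():
            continue  # images without a caption file are skipped
        text = txt.read_text(encoding="utf-8").strip()
        if text:
            # Key is the filename with extension, without folder or path.
            captions[image.name] = text
    return captions


def write_captions_json(folder, out_path="captions.json"):
    """Write the collected captions to a JSON file and return them."""
    captions = collect_captions(folder)
    Path(out_path).write_text(
        json.dumps(captions, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return captions
```

Calling `write_captions_json("path/to/dataset")` would write `captions.json` to the current directory. Images without a `.txt` file are left out, so they fall back to `instance_prompt`.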

## Failure cases

The job may fail under the following conditions:

* The file could not be parsed as JSON.
* The file contains captions in an invalid format (see above).
* None of the provided captions could be matched to a filename.

If this happens, the tokens used for this job are immediately returned.
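To catch these problems before uploading, you can run a quick local check. The sketch below mirrors the three conditions listed above. It is not our server-side validation, the function name `check_captions` is hypothetical, and rejecting empty lists is an assumption.

```python
import json


def check_captions(json_text, image_filenames):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be an object mapping filenames to captions"]

    problems = []
    for name, value in data.items():
        is_string = isinstance(value, str)
        # Assumption: a list must be non-empty and contain only strings.
        is_string_list = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(item, str) for item in value)
        )
        if not (is_string or is_string_list):
            problems.append(f"{name}: caption must be a string or a list of strings")

    # At least one key has to match one of the training image filenames.
    if not set(data) & set(image_filenames):
        problems.append("no caption matches any image filename")
    return problems
```

For example, `check_captions(open("captions.json", encoding="utf-8").read(), ["myimage0001.jpeg", "myimage0002.jpeg"])` returns an empty list when the file parses, every caption is well-formed, and at least one filename matches.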

## How to best write captions?

This is an active field of research, and best practices will likely evolve over time.

Since writing captions manually can be quite tedious, a common practice is to use AI models such as GPT-4V or BLIP-2 to write captions automatically.

**Don't hesitate to ask on** [**our Discord server**](https://discord.gg/yX9D9KxHMS) **if you are looking for more guidance!**
