train_text_to_image_lora_sdxl.py throws error on `--resume_from_checkpoint`

### Describe the bug

When resuming training from a checkpoint with `train_text_to_image_lora_sdxl.py` and `--resume_from_checkpoint=latest`, `accelerator.load_state` fails with a `RuntimeError`: a size mismatch while loading the text encoder LoRA weights (full logs below).

### Reproduction

Command to run:

```bash
accelerate launch diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py \
  --output_dir="../sdxl-lora-lower-decks-aesthetic" \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0"  \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --hub_model_id="ra100/sdxl-lora-lower-decks-aesthetic" \
  --dataset_name="ra100/lower-decks" \
  --checkpointing_steps=500 \
  --checkpoints_total_limit=10 \
  --gradient_accumulation_steps=4 \
  --learning_rate=4e-5 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=3000 \
  --mixed_precision="fp16" \
  --num_validation_images=2 \
  --report_to="wandb" \
  --resolution=1024 \
  --seed="167813" \
  --train_batch_size=1 \
  --train_text_encoder \
  --validation_epochs=20 \
  --resume_from_checkpoint=latest \
  --logging_dir=./logs \
  --validation_prompt="a blue skin woman, commander, red uniform, stld aesthetic"
```

Hub model: https://huggingface.co/ra100/sdxl-lora-lower-decks-aesthetic
Hub dataset: https://huggingface.co/datasets/ra100/lower-decks

### Logs

```shell
08/14/2023 19:02:54 - INFO - __main__ - ***** Running training *****
08/14/2023 19:02:54 - INFO - __main__ -   Num examples = 35
08/14/2023 19:02:54 - INFO - __main__ -   Num Epochs = 334
08/14/2023 19:02:54 - INFO - __main__ -   Instantaneous batch size per device = 1
08/14/2023 19:02:54 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
08/14/2023 19:02:54 - INFO - __main__ -   Gradient Accumulation steps = 4
08/14/2023 19:02:54 - INFO - __main__ -   Total optimization steps = 3000
08/14/2023 19:02:54 - INFO - accelerate.accelerator - Loading states from ../sdxl-lora-lower-decks-aesthetic/checkpoint-2500
Loading unet.
Loading text_encoder.
Loading text_encoder.
Traceback (most recent call last):
  File "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 1294, in <module>
    main(args)
  File "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 986, in main
    accelerator.load_state(os.path.join(args.output_dir, path))
  File "/home/ra100/miniconda3/envs/dreambooth/lib/python3.11/site-packages/accelerate/accelerator.py", line 2695, in load_state
    hook(models, input_dir)
  File "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 736, in load_model_hook
    LoraLoaderMixin.load_lora_into_text_encoder(
  File "/media/quick/ai/dreambooth/diffusers/src/diffusers/loaders.py", line 1309, in load_lora_into_text_encoder
    load_state_dict_results = text_encoder.load_state_dict(text_encoder_lora_state_dict, strict=False)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ra100/miniconda3/envs/dreambooth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModelWithProjection:
	size mismatch for text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.0.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.0.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.0.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.0.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.0.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.0.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.1.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.1.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.1.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.1.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.1.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.1.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.1.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.1.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.2.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.2.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.2.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.2.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.2.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.2.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.2.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.2.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.3.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.3.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.3.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.3.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.3.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.3.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.3.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.3.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.4.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.4.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.4.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.4.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.4.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.4.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.4.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.4.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.5.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.5.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.5.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.5.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.5.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.5.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.5.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.5.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.6.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.6.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.6.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.6.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.6.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.6.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.6.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.6.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.7.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.7.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.7.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.7.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.7.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.7.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.7.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.7.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.8.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.8.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.8.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.8.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.8.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.8.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.8.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.8.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.9.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.9.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.9.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.9.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.9.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.9.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.9.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.9.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.10.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.10.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.10.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.10.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.10.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.10.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.10.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.10.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.11.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.11.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.11.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.11.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.11.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.11.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
	size mismatch for text_model.encoder.layers.11.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
	size mismatch for text_model.encoder.layers.11.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
Resuming from checkpoint checkpoint-2500
```
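A possible way to narrow this down (a sketch, not a confirmed diagnosis): the traceback shows LoRA weights with hidden size 768 from the checkpoint being copied into a `CLIPTextModelWithProjection` that expects 1280, and `Loading text_encoder.` is printed twice. That would be consistent with weights saved for the first text encoder being loaded into the second one. The helper below groups LoRA parameter shapes by their top-level key prefix so the hidden size stored under each prefix can be compared. The function name `hidden_sizes_by_prefix` and the `text_encoder.` / `text_encoder_2.` prefixes are assumptions for illustration; only the shapes come from the log.

```python
from collections import defaultdict


def hidden_sizes_by_prefix(shapes):
    """Map each top-level key prefix to the hidden sizes of its LoRA weights.

    `shapes` maps parameter names to shape tuples. For a LoRA `down.weight`
    the shape is (rank, hidden); for an `up.weight` it is (hidden, rank).
    """
    sizes = defaultdict(set)
    for name, shape in shapes.items():
        prefix = name.split(".", 1)[0]
        if name.endswith("down.weight"):
            sizes[prefix].add(shape[1])
        elif name.endswith("up.weight"):
            sizes[prefix].add(shape[0])
    return dict(sizes)


# Hypothetical key names (the prefixes are an assumption);
# the shapes are the ones reported in the traceback.
layer = "text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer"
example = {
    f"text_encoder.{layer}.down.weight": (4, 768),
    f"text_encoder.{layer}.up.weight": (768, 4),
    f"text_encoder_2.{layer}.down.weight": (4, 1280),
    f"text_encoder_2.{layer}.up.weight": (1280, 4),
}

print(hidden_sizes_by_prefix(example))
```

With a real checkpoint, `shapes` could be built from the loaded LoRA state dict, for example `{k: tuple(v.shape) for k, v in state_dict.items()}`. If a single prefix reports both 768 and 1280, or the 1280-sized weights are missing, the two text encoders are probably not being told apart when the checkpoint is saved or loaded.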

wandb logs:

[debug.log](https://github.com/huggingface/diffusers/files/12337695/debug.log)
[debug-internal.log](https://github.com/huggingface/diffusers/files/12337696/debug-internal.log)
[output.log](https://github.com/huggingface/diffusers/files/12337708/output.log)

`conda-environment.yaml`

```yaml
name: dreambooth
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2023.05.30=h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.9=h7f8727e_0
  - pip=23.2.1=py311h06a4308_0
  - python=3.11.4=h955ad1f_0
  - readline=8.2=h5eee18b_0
  - setuptools=68.0.0=py311h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.38.4=py311h06a4308_0
  - xz=5.4.2=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
      - absl-py==1.4.0
      - accelerate==0.21.0
      - aiohttp==3.8.5
      - aiosignal==1.3.1
      - appdirs==1.4.4
      - async-timeout==4.0.2
      - attrs==23.1.0
      - bitsandbytes==0.41.1
      - black==23.7.0
      - cachetools==5.3.1
      - certifi==2023.7.22
      - charset-normalizer==3.2.0
      - click==8.1.6
      - cmake==3.27.0
      - datasets==2.14.3
      - diffusers==0.20.0.dev0
      - dill==0.3.7
      - docker-pycreds==0.4.0
      - filelock==3.12.2
      - frozenlist==1.4.0
      - fsspec==2023.6.0
      - ftfy==6.1.1
      - gitdb==4.0.10
      - gitpython==3.1.32
      - google-auth==2.22.0
      - google-auth-oauthlib==1.0.0
      - grpcio==1.56.2
      - huggingface-hub==0.16.4
      - idna==3.4
      - importlib-metadata==6.8.0
      - jinja2==3.1.2
      - lit==16.0.6
      - markdown==3.4.4
      - markupsafe==2.1.3
      - mpmath==1.3.0
      - multidict==6.0.4
      - multiprocess==0.70.15
      - mypy-extensions==1.0.0
      - networkx==3.1
      - numpy==1.25.2
      - nvidia-cublas-cu11==11.10.3.66
      - nvidia-cuda-cupti-cu11==11.7.101
      - nvidia-cuda-nvrtc-cu11==11.7.99
      - nvidia-cuda-runtime-cu11==11.7.99
      - nvidia-cudnn-cu11==8.5.0.96
      - nvidia-cufft-cu11==10.9.0.58
      - nvidia-curand-cu11==10.2.10.91
      - nvidia-cusolver-cu11==11.4.0.1
      - nvidia-cusparse-cu11==11.7.4.91
      - nvidia-nccl-cu11==2.14.3
      - nvidia-nvtx-cu11==11.7.91
      - oauthlib==3.2.2
      - packaging==23.1
      - pandas==2.0.3
      - pathspec==0.11.2
      - pathtools==0.1.2
      - pillow==10.0.0
      - platformdirs==3.10.0
      - protobuf==4.23.4
      - psutil==5.9.5
      - pyarrow==12.0.1
      - pyasn1==0.5.0
      - pyasn1-modules==0.3.0
      - pyre-extensions==0.0.29
      - python-dateutil==2.8.2
      - pytz==2023.3
      - pyyaml==6.0.1
      - regex==2023.6.3
      - requests==2.31.0
      - requests-oauthlib==1.3.1
      - rsa==4.9
      - ruff==0.0.283
      - safetensors==0.3.1
      - scipy==1.11.1
      - sentry-sdk==1.29.2
      - setproctitle==1.3.2
      - six==1.16.0
      - smmap==5.0.0
      - sympy==1.12
      - tensorboard==2.13.0
      - tensorboard-data-server==0.7.1
      - tokenizers==0.13.3
      - torch==2.0.1+cu118
      - torchaudio==2.0.2+cu118
      - torchvision==0.15.2+cu118
      - tqdm==4.65.0
      - transformers==4.31.0
      - triton==2.0.0
      - typing-extensions==4.7.1
      - typing-inspect==0.9.0
      - tzdata==2023.3
      - urllib3==1.26.16
      - wandb==0.15.8
      - wcwidth==0.2.6
      - werkzeug==2.3.6
      - xformers==0.0.20
      - xxhash==3.3.0
      - yarl==1.9.2
      - zipp==3.16.2
prefix: /home/ra100/miniconda3/envs/dreambooth
```

`config.yaml`

```yaml
wandb_version: 1

_wandb:
  desc: null
  value:
    python_version: 3.11.4
    cli_version: 0.15.8
    framework: huggingface
    huggingface_version: 4.31.0
    is_jupyter_run: false
    is_kaggle_kernel: false
    start_time: 1692032573.004215
    t:
      1:
      - 1
      - 11
      - 41
      - 49
      - 51
      - 55
      - 71
      - 83
      2:
      - 1
      - 11
      - 41
      - 49
      - 51
      - 55
      - 71
      - 83
      3:
      - 23
      4: 3.11.4
      5: 0.15.8
      6: 4.31.0
      8:
      - 5
pretrained_model_name_or_path:
  desc: null
  value: stabilityai/stable-diffusion-xl-base-1.0
pretrained_vae_model_name_or_path:
  desc: null
  value: madebyollin/sdxl-vae-fp16-fix
revision:
  desc: null
  value: null
dataset_name:
  desc: null
  value: ra100/lower-decks
dataset_config_name:
  desc: null
  value: null
train_data_dir:
  desc: null
  value: null
image_column:
  desc: null
  value: image
caption_column:
  desc: null
  value: text
validation_prompt:
  desc: null
  value: a blue skin woman, commander, red uniform, stld aesthetic
validation_prompt_neg:
  desc: null
  value: null
num_validation_images:
  desc: null
  value: 2
validation_epochs:
  desc: null
  value: 20
max_train_samples:
  desc: null
  value: null
output_dir:
  desc: null
  value: ../sdxl-lora-lower-decks-aesthetic
cache_dir:
  desc: null
  value: null
seed:
  desc: null
  value: 167813
resolution:
  desc: null
  value: 1024
center_crop:
  desc: null
  value: false
random_flip:
  desc: null
  value: false
train_text_encoder:
  desc: null
  value: true
train_batch_size:
  desc: null
  value: 1
num_train_epochs:
  desc: null
  value: 334
max_train_steps:
  desc: null
  value: 3000
checkpointing_steps:
  desc: null
  value: 500
checkpoints_total_limit:
  desc: null
  value: 10
resume_from_checkpoint:
  desc: null
  value: latest
gradient_accumulation_steps:
  desc: null
  value: 4
gradient_checkpointing:
  desc: null
  value: false
learning_rate:
  desc: null
  value: 4.0e-05
scale_lr:
  desc: null
  value: false
lr_scheduler:
  desc: null
  value: constant
lr_warmup_steps:
  desc: null
  value: 0
snr_gamma:
  desc: null
  value: null
allow_tf32:
  desc: null
  value: false
dataloader_num_workers:
  desc: null
  value: 0
use_8bit_adam:
  desc: null
  value: false
adam_beta1:
  desc: null
  value: 0.9
adam_beta2:
  desc: null
  value: 0.999
adam_weight_decay:
  desc: null
  value: 0.01
adam_epsilon:
  desc: null
  value: 1.0e-08
max_grad_norm:
  desc: null
  value: 1.0
push_to_hub:
  desc: null
  value: false
hub_token:
  desc: null
  value: null
prediction_type:
  desc: null
  value: null
hub_model_id:
  desc: null
  value: ra100/sdxl-lora-lower-decks-aesthetic
logging_dir:
  desc: null
  value: ./logs
report_to:
  desc: null
  value: wandb
mixed_precision:
  desc: null
  value: fp16
prior_generation_precision:
  desc: null
  value: null
local_rank:
  desc: null
  value: -1
enable_xformers_memory_efficient_attention:
  desc: null
  value: false
noise_offset:
  desc: null
  value: 0
rank:
  desc: null
  value: 4
```

`requirements.txt`

```txt
absl-py==1.4.0
accelerate==0.21.0
aiohttp==3.8.5
aiosignal==1.3.1
appdirs==1.4.4
async-timeout==4.0.2
attrs==23.1.0
bitsandbytes==0.41.1
black==23.7.0
cachetools==5.3.1
certifi==2023.7.22
charset-normalizer==3.2.0
click==8.1.6
cmake==3.27.0
commentjson==0.9.0
datasets==2.14.3
diffusers==0.20.0.dev0
dill==0.3.7
docker-pycreds==0.4.0
filelock==3.12.2
frozenlist==1.4.0
fsspec==2023.6.0
ftfy==6.1.1
gitdb==4.0.10
gitpython==3.1.32
google-auth-oauthlib==1.0.0
google-auth==2.22.0
grpcio==1.56.2
huggingface-hub==0.16.4
idna==3.4
importlib-metadata==6.8.0
jinja2==3.1.2
lark-parser==0.7.8
lit==16.0.6
markdown==3.4.4
markupsafe==2.1.3
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.1
numpy==1.25.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oauthlib==3.2.2
packaging==23.1
pandas==2.0.3
pathspec==0.11.2
pathtools==0.1.2
pillow==10.0.0
pip==23.2.1
platformdirs==3.10.0
protobuf==4.23.4
psutil==5.9.5
pyarrow==12.0.1
pyasn1-modules==0.3.0
pyasn1==0.5.0
pyquaternion==0.9.9
pyre-extensions==0.0.29
python-dateutil==2.8.2
pytz==2023.3
pyyaml==6.0.1
regex==2023.6.3
requests-oauthlib==1.3.1
requests==2.31.0
rsa==4.9
ruff==0.0.283
safetensors==0.3.1
scipy==1.11.1
sentry-sdk==1.29.2
setproctitle==1.3.2
setuptools==68.0.0
six==1.16.0
smmap==5.0.0
sympy==1.12
tensorboard-data-server==0.7.1
tensorboard==2.13.0
tokenizers==0.13.3
torch==2.0.1+cu118
torchaudio==2.0.2+cu118
torchvision==0.15.2+cu118
tqdm==4.65.0
transformers==4.31.0
triton==2.0.0
typing-extensions==4.7.1
typing-inspect==0.9.0
tzdata==2023.3
urllib3==1.26.16
wandb==0.15.8
wcwidth==0.2.6
werkzeug==2.3.6
wheel==0.38.4
xformers==0.0.20
xxhash==3.3.0
yarl==1.9.2
zipp==3.16.2
```

`wandb-metadata.json`

```json
{
    "os": "Linux-6.4.2-060402-generic-x86_64-with-glibc2.37",
    "python": "3.11.4",
    "heartbeatAt": "2023-08-14T17:02:53.554067",
    "startedAt": "2023-08-14T17:02:53.002378",
    "docker": null,
    "cuda": null,
    "args": [
        "--checkpointing_steps=500",
        "--checkpoints_total_limit=10",
        "--gradient_accumulation_steps=4",
        "--learning_rate=4e-5",
        "--lr_scheduler=constant",
        "--lr_warmup_steps=0",
        "--max_train_steps=3000",
        "--mixed_precision=fp16",
        "--num_validation_images=2",
        "--output_dir=../sdxl-lora-lower-decks-aesthetic",
        "--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0",
        "--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix",
        "--hub_model_id=ra100/sdxl-lora-lower-decks-aesthetic",
        "--report_to=wandb",
        "--resolution=1024",
        "--seed=167813",
        "--train_batch_size=1",
        "--train_text_encoder",
        "--validation_epochs=20",
        "--resume_from_checkpoint=latest",
        "--logging_dir=./logs",
        "--dataset_name=ra100/lower-decks",
        "--validation_prompt=a blue skin woman, commander, red uniform, stld aesthetic"
    ],
    "state": "running",
    "program": "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py",
    "codePath": "diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py",
    "host": "hue",
    "username": "ra100",
    "executable": "/home/ra100/miniconda3/envs/dreambooth/bin/python",
    "cpu_count": 16,
    "cpu_count_logical": 32,
    "cpu_freq": {
        "current": 4901.787343749999,
        "min": 4500.0,
        "max": 4500.0
    },
    "cpu_freq_per_core": [
        {
            "current": 4500.0,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5451.316,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5474.703,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 3836.198,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 3125.771,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4500.0,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5470.599,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5412.791,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4835.952,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5193.109,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5030.967,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5135.753,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5288.625,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4828.394,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4288.455,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5293.719,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5483.718,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5477.5,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4500.0,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4500.0,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 3122.807,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5362.852,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5470.593,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5325.521,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5017.004,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5292.786,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4915.923,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4871.15,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5293.693,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4678.008,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 4587.239,
            "min": 4500.0,
            "max": 4500.0
        },
        {
            "current": 5292.049,
            "min": 4500.0,
            "max": 4500.0
        }
    ],
    "disk": {
        "total": 937.135196685791,
        "used": 123.2957878112793
    },
    "gpu": "NVIDIA GeForce RTX 4090",
    "gpu_count": 1,
    "gpu_devices": [
        {
            "name": "NVIDIA GeForce RTX 4090",
            "memory_total": 25757220864
        }
    ],
    "memory": {
        "total": 61.946109771728516
    }
}
```

### System Info

- `diffusers` version: 0.20.0.dev0
- Platform: Linux-6.4.2-060402-generic-x86_64-with-glibc2.37
- Python version: 3.11.4
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.31.0
- Accelerate version: 0.21.0
- xFormers version: 0.0.20
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no

### Who can help?

@saya

train_text_to_image_lora_sdxl.py throws error on `--resume_from_checkpoint` #4584

Describe the bug

Reproduction

Logs

System Info

Who can help?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

train_text_to_image_lora_sdxl.py throws error on --resume_from_checkpoint #4584

Description

Describe the bug

Reproduction

Logs

System Info

Who can help?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

train_text_to_image_lora_sdxl.py throws error on `--resume_from_checkpoint` #4584