Describe the bug
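When trying to resume training from a checkpoint using train_text_to_image_lora_sdxl.py with --train_text_encoder and --resume_from_checkpoint=latest, I get an error while the checkpoint is being loaded (logs below). Loading the saved text-encoder LoRA weights fails with a size mismatch. The tensors in the checkpoint have a hidden size of 768, but the model they are loaded into (CLIPTextModelWithProjection) expects 1280. It looks like the LoRA weights of the first text encoder are being loaded into the second one. The log also prints "Loading text_encoder." twice.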
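The traceback below reports size mismatches: 768 in the checkpoint versus 1280 in CLIPTextModelWithProjection. One way to see which encoder the saved weights belong to is to group the LoRA tensors in the checkpoint by the first component of their key, then look at the input size of each `down.weight`. A LoRA down projection has shape (rank, in_features), which matches the traceback's [4, 768]. This is a minimal sketch. It assumes the checkpoint keys are prefixed with the component name (for example `text_encoder.` and `text_encoder_2.`). The function name is hypothetical. The function accepts either tensors or plain shape tuples.

```python
from collections import defaultdict
from typing import Any, Dict, Mapping, Set


def lora_input_dims_by_prefix(state_dict: Mapping[str, Any]) -> Dict[str, Set[int]]:
    """Group the input sizes of LoRA `down.weight` tensors by top-level key prefix.

    Assumption: keys look like "<component>.<module path>...down.weight".
    Values may be tensors (anything with `.shape`) or plain shape tuples.
    """
    dims: Dict[str, Set[int]] = defaultdict(set)
    for key, value in state_dict.items():
        if not key.endswith("down.weight"):
            continue  # only down projections carry (rank, in_features)
        shape = getattr(value, "shape", value)  # tensor -> its shape, tuple -> itself
        prefix = key.split(".", 1)[0]  # e.g. "text_encoder" or "text_encoder_2"
        dims[prefix].add(int(shape[1]))
    return dict(dims)
```

For example, you could pass it the dict returned by `safetensors.torch.load_file(...)` for the LoRA weights file in `checkpoint-2500`. The file name varies between diffusers versions, so check the directory first. For SDXL, you would expect {768} for the first text encoder and {1280} for the second.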
Reproduction
Command to run:
accelerate launch diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py \
--output_dir="../sdxl-lora-lower-decks-aesthetic" \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
--pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
--hub_model_id="ra100/sdxl-lora-lower-decks-aesthetic" \
--dataset_name="ra100/lower-decks" \
--checkpointing_steps=500 \
--checkpoints_total_limit=10 \
--gradient_accumulation_steps=4 \
--learning_rate=4e-5 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=3000 \
--mixed_precision="fp16" \
--num_validation_images=2 \
--report_to="wandb" \
--resolution=1024 \
--seed="167813" \
--train_batch_size=1 \
--train_text_encoder \
--validation_epochs=20 \
--resume_from_checkpoint=latest \
--logging_dir=./logs \
--validation_prompt="a blue skin woman, commander, red uniform, stld aesthetic"
Hub model: https://huggingface.co/ra100/sdxl-lora-lower-decks-aesthetic
Hub dataset: https://huggingface.co/datasets/ra100/lower-decks
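For context, `--resume_from_checkpoint=latest` resolves to the newest `checkpoint-<step>` directory in `--output_dir`. The log below confirms that it picked `checkpoint-2500`. Below is a minimal sketch of that selection, assuming the directory naming seen in the log. The function names are hypothetical, and this is not the script's exact code.

```python
import os
import re
from typing import Dict, Iterable, Optional

# Matches names like "checkpoint-2500" and captures the step number.
_CKPT_RE = re.compile(r"checkpoint-(\d+)")


def pick_latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the `checkpoint-<step>` name with the highest step, or None."""
    steps: Dict[str, int] = {}
    for name in names:
        match = _CKPT_RE.fullmatch(name)
        if match:
            steps[name] = int(match.group(1))
    # Compare numerically: a plain string sort would rank checkpoint-500 above checkpoint-2500.
    return max(steps, key=steps.get) if steps else None


def resolve_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Look inside output_dir and return the newest checkpoint directory name, if any."""
    if not os.path.isdir(output_dir):
        return None
    return pick_latest_checkpoint(os.listdir(output_dir))
```

The sketch matches `checkpoint-<digits>` exactly rather than any name starting with `checkpoint`. This is a deliberate tightening, so that unrelated entries in the output directory cannot break the integer parse.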
Logs
08/14/2023 19:02:54 - INFO - __main__ - ***** Running training *****
08/14/2023 19:02:54 - INFO - __main__ - Num examples = 35
08/14/2023 19:02:54 - INFO - __main__ - Num Epochs = 334
08/14/2023 19:02:54 - INFO - __main__ - Instantaneous batch size per device = 1
08/14/2023 19:02:54 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
08/14/2023 19:02:54 - INFO - __main__ - Gradient Accumulation steps = 4
08/14/2023 19:02:54 - INFO - __main__ - Total optimization steps = 3000
08/14/2023 19:02:54 - INFO - accelerate.accelerator - Loading states from ../sdxl-lora-lower-decks-aesthetic/checkpoint-2500
Loading unet.
Loading text_encoder.
Loading text_encoder.
Traceback (most recent call last):
File "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 1294, in <module>
main(args)
File "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 986, in main
accelerator.load_state(os.path.join(args.output_dir, path))
File "/home/ra100/miniconda3/envs/dreambooth/lib/python3.11/site-packages/accelerate/accelerator.py", line 2695, in load_state
hook(models, input_dir)
File "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 736, in load_model_hook
LoraLoaderMixin.load_lora_into_text_encoder(
File "/media/quick/ai/dreambooth/diffusers/src/diffusers/loaders.py", line 1309, in load_lora_into_text_encoder
load_state_dict_results = text_encoder.load_state_dict(text_encoder_lora_state_dict, strict=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ra100/miniconda3/envs/dreambooth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModelWithProjection:
size mismatch for text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.0.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.0.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.0.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.0.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.0.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.0.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.1.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.1.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.1.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.1.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.1.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.1.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.1.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.1.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.2.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.2.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.2.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.2.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.2.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.2.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.2.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.2.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.3.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.3.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.3.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.3.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.3.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.3.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.3.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.3.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.4.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.4.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.4.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.4.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.4.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.4.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.4.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.4.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.5.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.5.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.5.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.5.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.5.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.5.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.5.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.5.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.6.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.6.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.6.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.6.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.6.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.6.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.6.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.6.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.7.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.7.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.7.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.7.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.7.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.7.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.7.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.7.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.8.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.8.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.8.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.8.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.8.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.8.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.8.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.8.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.9.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.9.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.9.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.9.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.9.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.9.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.9.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.9.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.10.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.10.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.10.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.10.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.10.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.10.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.10.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.10.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.11.self_attn.k_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.11.self_attn.k_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.11.self_attn.v_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.11.self_attn.v_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.11.self_attn.q_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.11.self_attn.q_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
size mismatch for text_model.encoder.layers.11.self_attn.out_proj.lora_linear_layer.down.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([4, 1280]).
size mismatch for text_model.encoder.layers.11.self_attn.out_proj.lora_linear_layer.up.weight: copying a param with shape torch.Size([768, 4]) from checkpoint, the shape in current model is torch.Size([1280, 4]).
Resuming from checkpoint checkpoint-2500
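The numbers in the log header are consistent with each other. The sketch below redoes the bookkeeping for a single process. It makes three assumptions about the script:

- updates per epoch = ceil(batches per epoch / gradient_accumulation_steps)
- epoch count = ceil(max_train_steps / updates per epoch)
- starting epoch on resume = global_step // updates per epoch

The function names are hypothetical.

```python
import math
from typing import Dict


def training_schedule(num_examples: int, batch_size: int, grad_accum: int,
                      max_train_steps: int) -> Dict[str, int]:
    """Single-process step/epoch bookkeeping as printed in the training header."""
    batches_per_epoch = math.ceil(num_examples / batch_size)  # dataloader length
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)  # optimizer steps per epoch
    num_epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "updates_per_epoch": updates_per_epoch,
        "num_epochs": num_epochs,
        "total_batch_size": batch_size * grad_accum,
    }


def resume_epoch(global_step: int, updates_per_epoch: int) -> int:
    """0-based epoch in which a run restored at `global_step` continues."""
    return global_step // updates_per_epoch


if __name__ == "__main__":
    sched = training_schedule(35, 1, 4, 3000)
    print(sched)
    print(resume_epoch(2500, sched["updates_per_epoch"]))
```

The log gives 35 examples, batch size 1, 4 accumulation steps and 3000 max steps. With these values, the sketch yields 9 updates per epoch, 334 epochs and a total batch size of 4, which matches the header. A resume from global step 2500 would continue in epoch 277.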
wandb logs:
debug.log
debug-internal.log
output.log
conda-environment.yaml
name: dreambooth
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2023.05.30=h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.9=h7f8727e_0
  - pip=23.2.1=py311h06a4308_0
  - python=3.11.4=h955ad1f_0
  - readline=8.2=h5eee18b_0
  - setuptools=68.0.0=py311h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.38.4=py311h06a4308_0
  - xz=5.4.2=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
      - absl-py==1.4.0
      - accelerate==0.21.0
      - aiohttp==3.8.5
      - aiosignal==1.3.1
      - appdirs==1.4.4
      - async-timeout==4.0.2
      - attrs==23.1.0
      - bitsandbytes==0.41.1
      - black==23.7.0
      - cachetools==5.3.1
      - certifi==2023.7.22
      - charset-normalizer==3.2.0
      - click==8.1.6
      - cmake==3.27.0
      - datasets==2.14.3
      - diffusers==0.20.0.dev0
      - dill==0.3.7
      - docker-pycreds==0.4.0
      - filelock==3.12.2
      - frozenlist==1.4.0
      - fsspec==2023.6.0
      - ftfy==6.1.1
      - gitdb==4.0.10
      - gitpython==3.1.32
      - google-auth==2.22.0
      - google-auth-oauthlib==1.0.0
      - grpcio==1.56.2
      - huggingface-hub==0.16.4
      - idna==3.4
      - importlib-metadata==6.8.0
      - jinja2==3.1.2
      - lit==16.0.6
      - markdown==3.4.4
      - markupsafe==2.1.3
      - mpmath==1.3.0
      - multidict==6.0.4
      - multiprocess==0.70.15
      - mypy-extensions==1.0.0
      - networkx==3.1
      - numpy==1.25.2
      - nvidia-cublas-cu11==11.10.3.66
      - nvidia-cuda-cupti-cu11==11.7.101
      - nvidia-cuda-nvrtc-cu11==11.7.99
      - nvidia-cuda-runtime-cu11==11.7.99
      - nvidia-cudnn-cu11==8.5.0.96
      - nvidia-cufft-cu11==10.9.0.58
      - nvidia-curand-cu11==10.2.10.91
      - nvidia-cusolver-cu11==11.4.0.1
      - nvidia-cusparse-cu11==11.7.4.91
      - nvidia-nccl-cu11==2.14.3
      - nvidia-nvtx-cu11==11.7.91
      - oauthlib==3.2.2
      - packaging==23.1
      - pandas==2.0.3
      - pathspec==0.11.2
      - pathtools==0.1.2
      - pillow==10.0.0
      - platformdirs==3.10.0
      - protobuf==4.23.4
      - psutil==5.9.5
      - pyarrow==12.0.1
      - pyasn1==0.5.0
      - pyasn1-modules==0.3.0
      - pyre-extensions==0.0.29
      - python-dateutil==2.8.2
      - pytz==2023.3
      - pyyaml==6.0.1
      - regex==2023.6.3
      - requests==2.31.0
      - requests-oauthlib==1.3.1
      - rsa==4.9
      - ruff==0.0.283
      - safetensors==0.3.1
      - scipy==1.11.1
      - sentry-sdk==1.29.2
      - setproctitle==1.3.2
      - six==1.16.0
      - smmap==5.0.0
      - sympy==1.12
      - tensorboard==2.13.0
      - tensorboard-data-server==0.7.1
      - tokenizers==0.13.3
      - torch==2.0.1+cu118
      - torchaudio==2.0.2+cu118
      - torchvision==0.15.2+cu118
      - tqdm==4.65.0
      - transformers==4.31.0
      - triton==2.0.0
      - typing-extensions==4.7.1
      - typing-inspect==0.9.0
      - tzdata==2023.3
      - urllib3==1.26.16
      - wandb==0.15.8
      - wcwidth==0.2.6
      - werkzeug==2.3.6
      - xformers==0.0.20
      - xxhash==3.3.0
      - yarl==1.9.2
      - zipp==3.16.2
prefix: /home/ra100/miniconda3/envs/dreambooth
config.yaml
wandb_version: 1
_wandb:
  desc: null
  value:
    python_version: 3.11.4
    cli_version: 0.15.8
    framework: huggingface
    huggingface_version: 4.31.0
    is_jupyter_run: false
    is_kaggle_kernel: false
    start_time: 1692032573.004215
    t:
      1:
      - 1
      - 11
      - 41
      - 49
      - 51
      - 55
      - 71
      - 83
      2:
      - 1
      - 11
      - 41
      - 49
      - 51
      - 55
      - 71
      - 83
      3:
      - 23
      4: 3.11.4
      5: 0.15.8
      6: 4.31.0
      8:
      - 5
pretrained_model_name_or_path:
  desc: null
  value: stabilityai/stable-diffusion-xl-base-1.0
pretrained_vae_model_name_or_path:
  desc: null
  value: madebyollin/sdxl-vae-fp16-fix
revision:
  desc: null
  value: null
dataset_name:
  desc: null
  value: ra100/lower-decks
dataset_config_name:
  desc: null
  value: null
train_data_dir:
  desc: null
  value: null
image_column:
  desc: null
  value: image
caption_column:
  desc: null
  value: text
validation_prompt:
  desc: null
  value: a blue skin woman, commander, red uniform, stld aesthetic
validation_prompt_neg:
  desc: null
  value: null
num_validation_images:
  desc: null
  value: 2
validation_epochs:
  desc: null
  value: 20
max_train_samples:
  desc: null
  value: null
output_dir:
  desc: null
  value: ../sdxl-lora-lower-decks-aesthetic
cache_dir:
  desc: null
  value: null
seed:
  desc: null
  value: 167813
resolution:
  desc: null
  value: 1024
center_crop:
  desc: null
  value: false
random_flip:
  desc: null
  value: false
train_text_encoder:
  desc: null
  value: true
train_batch_size:
  desc: null
  value: 1
num_train_epochs:
  desc: null
  value: 334
max_train_steps:
  desc: null
  value: 3000
checkpointing_steps:
  desc: null
  value: 500
checkpoints_total_limit:
  desc: null
  value: 10
resume_from_checkpoint:
  desc: null
  value: latest
gradient_accumulation_steps:
  desc: null
  value: 4
gradient_checkpointing:
  desc: null
  value: false
learning_rate:
  desc: null
  value: 4.0e-05
scale_lr:
  desc: null
  value: false
lr_scheduler:
  desc: null
  value: constant
lr_warmup_steps:
  desc: null
  value: 0
snr_gamma:
  desc: null
  value: null
allow_tf32:
  desc: null
  value: false
dataloader_num_workers:
  desc: null
  value: 0
use_8bit_adam:
  desc: null
  value: false
adam_beta1:
  desc: null
  value: 0.9
adam_beta2:
  desc: null
  value: 0.999
adam_weight_decay:
  desc: null
  value: 0.01
adam_epsilon:
  desc: null
  value: 1.0e-08
max_grad_norm:
  desc: null
  value: 1.0
push_to_hub:
  desc: null
  value: false
hub_token:
  desc: null
  value: null
prediction_type:
  desc: null
  value: null
hub_model_id:
  desc: null
  value: ra100/sdxl-lora-lower-decks-aesthetic
logging_dir:
  desc: null
  value: ./logs
report_to:
  desc: null
  value: wandb
mixed_precision:
  desc: null
  value: fp16
prior_generation_precision:
  desc: null
  value: null
local_rank:
  desc: null
  value: -1
enable_xformers_memory_efficient_attention:
  desc: null
  value: false
noise_offset:
  desc: null
  value: 0
rank:
  desc: null
  value: 4
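With `rank: 4` from this config, the shapes in the traceback follow the usual LoRA layout in PyTorch's (out, in) weight convention. `down.weight` is (rank, in_features) and `up.weight` is (out_features, rank). The sketch below computes those shapes and the resulting parameter count per text encoder. It assumes LoRA is applied only to the four self-attention projections that appear in the traceback (q_proj, k_proj, v_proj, out_proj), each of which is hidden x hidden. The layer counts and hidden sizes in the usage note are the standard SDXL text encoder configurations; they do not come from this log. The function names are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, int]


def lora_weight_shapes(in_features: int, out_features: int, rank: int) -> Tuple[Shape, Shape]:
    """Weight shapes of a LoRA pair in PyTorch's (out, in) layout.

    down maps in_features -> rank, so its weight is (rank, in_features);
    up maps rank -> out_features, so its weight is (out_features, rank).
    """
    return (rank, in_features), (out_features, rank)


def text_encoder_lora_params(num_layers: int, hidden_size: int, rank: int,
                             projections: int = 4) -> int:
    """LoRA parameter count when q/k/v/out (each hidden x hidden) of every layer are adapted."""
    down, up = lora_weight_shapes(hidden_size, hidden_size, rank)
    per_projection = down[0] * down[1] + up[0] * up[1]
    return num_layers * projections * per_projection
```

SDXL's first text encoder has 12 layers and a hidden size of 768, which gives 294,912 LoRA parameters at rank 4. The second has 32 layers and a hidden size of 1280, which gives 1,310,720. The traceback lists mismatches for layers 0 to 11 only, that is, 12 layers. This fits the first encoder's weights being loaded into the second.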
requirements.txt
absl-py==1.4.0
accelerate==0.21.0
aiohttp==3.8.5
aiosignal==1.3.1
appdirs==1.4.4
async-timeout==4.0.2
attrs==23.1.0
bitsandbytes==0.41.1
black==23.7.0
cachetools==5.3.1
certifi==2023.7.22
charset-normalizer==3.2.0
click==8.1.6
cmake==3.27.0
commentjson==0.9.0
datasets==2.14.3
diffusers==0.20.0.dev0
dill==0.3.7
docker-pycreds==0.4.0
filelock==3.12.2
frozenlist==1.4.0
fsspec==2023.6.0
ftfy==6.1.1
gitdb==4.0.10
gitpython==3.1.32
google-auth-oauthlib==1.0.0
google-auth==2.22.0
grpcio==1.56.2
huggingface-hub==0.16.4
idna==3.4
importlib-metadata==6.8.0
jinja2==3.1.2
lark-parser==0.7.8
lit==16.0.6
markdown==3.4.4
markupsafe==2.1.3
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.1
numpy==1.25.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oauthlib==3.2.2
packaging==23.1
pandas==2.0.3
pathspec==0.11.2
pathtools==0.1.2
pillow==10.0.0
pip==23.2.1
platformdirs==3.10.0
protobuf==4.23.4
psutil==5.9.5
pyarrow==12.0.1
pyasn1-modules==0.3.0
pyasn1==0.5.0
pyquaternion==0.9.9
pyre-extensions==0.0.29
python-dateutil==2.8.2
pytz==2023.3
pyyaml==6.0.1
regex==2023.6.3
requests-oauthlib==1.3.1
requests==2.31.0
rsa==4.9
ruff==0.0.283
safetensors==0.3.1
scipy==1.11.1
sentry-sdk==1.29.2
setproctitle==1.3.2
setuptools==68.0.0
six==1.16.0
smmap==5.0.0
sympy==1.12
tensorboard-data-server==0.7.1
tensorboard==2.13.0
tokenizers==0.13.3
torch==2.0.1+cu118
torchaudio==2.0.2+cu118
torchvision==0.15.2+cu118
tqdm==4.65.0
transformers==4.31.0
triton==2.0.0
typing-extensions==4.7.1
typing-inspect==0.9.0
tzdata==2023.3
urllib3==1.26.16
wandb==0.15.8
wcwidth==0.2.6
werkzeug==2.3.6
wheel==0.38.4
xformers==0.0.20
xxhash==3.3.0
yarl==1.9.2
zipp==3.16.2
wandb-metadata.json
{
"os": "Linux-6.4.2-060402-generic-x86_64-with-glibc2.37",
"python": "3.11.4",
"heartbeatAt": "2023-08-14T17:02:53.554067",
"startedAt": "2023-08-14T17:02:53.002378",
"docker": null,
"cuda": null,
"args": [
"--checkpointing_steps=500",
"--checkpoints_total_limit=10",
"--gradient_accumulation_steps=4",
"--learning_rate=4e-5",
"--lr_scheduler=constant",
"--lr_warmup_steps=0",
"--max_train_steps=3000",
"--mixed_precision=fp16",
"--num_validation_images=2",
"--output_dir=../sdxl-lora-lower-decks-aesthetic",
"--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0",
"--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix",
"--hub_model_id=ra100/sdxl-lora-lower-decks-aesthetic",
"--report_to=wandb",
"--resolution=1024",
"--seed=167813",
"--train_batch_size=1",
"--train_text_encoder",
"--validation_epochs=20",
"--resume_from_checkpoint=latest",
"--logging_dir=./logs",
"--dataset_name=ra100/lower-decks",
"--validation_prompt=a blue skin woman, commander, red uniform, stld aesthetic"
],
"state": "running",
"program": "/media/quick/ai/dreambooth/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py",
"codePath": "diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py",
"host": "hue",
"username": "ra100",
"executable": "/home/ra100/miniconda3/envs/dreambooth/bin/python",
"cpu_count": 16,
"cpu_count_logical": 32,
"cpu_freq": {
"current": 4901.787343749999,
"min": 4500.0,
"max": 4500.0
},
"cpu_freq_per_core": [
{
"current": 4500.0,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5451.316,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5474.703,
"min": 4500.0,
"max": 4500.0
},
{
"current": 3836.198,
"min": 4500.0,
"max": 4500.0
},
{
"current": 3125.771,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4500.0,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5470.599,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5412.791,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4835.952,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5193.109,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5030.967,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5135.753,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5288.625,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4828.394,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4288.455,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5293.719,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5483.718,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5477.5,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4500.0,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4500.0,
"min": 4500.0,
"max": 4500.0
},
{
"current": 3122.807,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5362.852,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5470.593,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5325.521,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5017.004,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5292.786,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4915.923,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4871.15,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5293.693,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4678.008,
"min": 4500.0,
"max": 4500.0
},
{
"current": 4587.239,
"min": 4500.0,
"max": 4500.0
},
{
"current": 5292.049,
"min": 4500.0,
"max": 4500.0
}
],
"disk": {
"total": 937.135196685791,
"used": 123.2957878112793
},
"gpu": "NVIDIA GeForce RTX 4090",
"gpu_count": 1,
"gpu_devices": [
{
"name": "NVIDIA GeForce RTX 4090",
"memory_total": 25757220864
}
],
"memory": {
"total": 61.946109771728516
}
}
System Info
- diffusers version: 0.20.0.dev0
- Platform: Linux-6.4.2-060402-generic-x86_64-with-glibc2.37
- Python version: 3.11.4
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.31.0
- Accelerate version: 0.21.0
- xFormers version: 0.0.20
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no