[Examples] fix checkpointing and casting bugs in train_text_to_image_lora_sdxl.py #4632

Merged
merged 5 commits into main from fix/sdxl-lora-training-script on Aug 23, 2023

Conversation

sayakpaul
Member

Fixes #4566

Fixes #4619

The checkpointing bug is also evident in #4566. Once this PR is merged, I will reflect these changes in the LoRA DreamBooth SDXL script too.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 16, 2023

The documentation is not available anymore as the PR was closed or merged.

@xiankgx

xiankgx commented Aug 16, 2023

@sayakpaul

  1. This command-line argument is unused.

    parser.add_argument(
        "--prior_generation_precision",
        type=str,
        default=None,
        choices=["no", "fp32", "fp16", "bf16"],
        help=(
            "Choose prior generation precision between fp32, fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
            " 1.10.and an Nvidia Ampere GPU. Default to fp16 if a GPU is available else fp32."
        ),
    )

  2. Text encoders one and two are prepared with Accelerate only conditionally, but are then unwrapped with accelerator.unwrap_model regardless of whether they were prepared. Not sure if there is any issue with this.

    # Prepare everything with our `accelerator`.
    if args.train_text_encoder:
        unet, text_encoder_one, text_encoder_two, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
            unet, text_encoder_one, text_encoder_two, optimizer, train_dataloader, lr_scheduler
        )
    else:
        unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
            unet, optimizer, train_dataloader, lr_scheduler
        )

    text_encoder=accelerator.unwrap_model(text_encoder_one),
    text_encoder_2=accelerator.unwrap_model(text_encoder_two),

  3. We accumulate the gradients with accelerator.accumulate(unet). If we enable training the text encoder, is this still valid?

    with accelerator.accumulate(unet):

  4. This recreation of the text encoders seems unnecessary.

    # create pipeline
    if not args.train_text_encoder:
        text_encoder_one = text_encoder_cls_one.from_pretrained(
            args.pretrained_model_name_or_path, subfolder="text_encoder", revision=args.revision
        )
        text_encoder_two = text_encoder_cls_two.from_pretrained(
            args.pretrained_model_name_or_path, subfolder="text_encoder_2", revision=args.revision
        )

  5. If we flip an image, do we take the coordinates of the flipped top-left (now the top-right), or do we take the original top-left (previously the top-right)?
    https://github.com/huggingface/diffusers/blob/dc898919e1c10ee1c8348261860dcd41d56e040b/examples/text_to_image/train_text_to_image_lora_sdxl.py#L872C2-L873

@sayakpaul
Member Author

sayakpaul commented Aug 17, 2023

Text encoders one and two are prepared with Accelerate only conditionally, but are then unwrapped with accelerator.unwrap_model regardless of whether they were prepared. Not sure if there is any issue with this.

Unwrapping can be a no-op in cases where it's not supposed to take effect, so it shouldn't be a problem.
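As a minimal illustration (a hedged sketch, not part of the script), unwrap_model simply hands back a module that was never prepared:

    import torch
    from accelerate import Accelerator

    accelerator = Accelerator()
    model = torch.nn.Linear(4, 4)
    # The text encoders are only prepared when they are trained; unwrapping an
    # un-prepared module returns the same object, so the call is harmless.
    assert accelerator.unwrap_model(model) is model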

This recreation of the text encoders seems unnecessary.

Yes, you're totally right, since we're precomputing the text embeddings here. Will reflect this in the PR.
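Roughly, the validation pipeline could reuse the encoders that are already in memory instead of reloading them from disk. A hedged sketch (assuming `vae` and `weight_dtype` are in scope as elsewhere in the script):

    from diffusers import StableDiffusionXLPipeline

    # Sketch only: build the validation pipeline from the models already loaded,
    # rather than calling from_pretrained on the text encoders a second time.
    pipeline = StableDiffusionXLPipeline.from_pretrained(
        args.pretrained_model_name_or_path,
        vae=vae,
        text_encoder=accelerator.unwrap_model(text_encoder_one),
        text_encoder_2=accelerator.unwrap_model(text_encoder_two),
        unet=accelerator.unwrap_model(unet),
        revision=args.revision,
        torch_dtype=weight_dtype,
    )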

We accumulate the gradients with accelerator.accumulate(unet). If we enable training the text encoder, is this still valid?

Ccing @muellerzr here.
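For reference, one possible way to handle this is to pass every trained model to accumulate, assuming a recent Accelerate version whose accumulate accepts several models (a sketch, not the script's current code; compute_loss is a hypothetical stand-in for the training step):

    # Collect every model that receives gradients so gradient accumulation and
    # gradient synchronization cover all of them, not just the UNet.
    models_to_accumulate = [unet]
    if args.train_text_encoder:
        models_to_accumulate += [text_encoder_one, text_encoder_two]

    for step, batch in enumerate(train_dataloader):
        with accelerator.accumulate(*models_to_accumulate):
            loss = compute_loss(batch)  # hypothetical helper for the forward pass + loss
            accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()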

This command-line argument is unused.

Will remove. Thanks for flagging!

If we flip an image, do we take the coordinates of the flipped top-left (now the top-right), or do we take the original top-left (previously the top-right)?

I think the current implementation accounts for both if I am reading correctly. If you think otherwise, maybe provide a few visual examples? Also ccing @okotaku here.

Thanks so much for providing feedback!

@xiankgx

xiankgx commented Aug 17, 2023


If we flip an image, do we take the coordinates of the flipped top-left (now the top-right), or do we take the original top-left (previously the top-right)?
[screenshot illustrating the flipped crop coordinates]
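To make the question concrete, here is a rough sketch of the two coordinate conventions (illustrative names only, not the script's code):

    import random
    from torchvision import transforms
    from torchvision.transforms import functional as TF

    def flip_then_crop(image, resolution):
        image = TF.hflip(image)  # assume the random flip fired
        y1, x1, h, w = transforms.RandomCrop.get_params(image, output_size=(resolution, resolution))
        cropped = TF.crop(image, y1, x1, h, w)
        x1_flipped_frame = x1                     # coordinates measured in the flipped image
        x1_original_frame = image.width - x1 - w  # the same region's left edge in the unflipped image
        return cropped, (y1, x1_flipped_frame), (y1, x1_original_frame)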

@sayakpaul
Member Author

I think we should consider the original state here; hence, x1 = image.width - x1.

@sayakpaul sayakpaul merged commit 4909b1e into main Aug 23, 2023
11 checks passed
@sayakpaul sayakpaul deleted the fix/sdxl-lora-training-script branch August 23, 2023 05:29
@stpic270

Hi, I still get this error using my own data and don't know how to fix it yet.
REPRODUCTION
! accelerate launch /content/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path='madebyollin/sdxl-vae-fp16-fix' \
  --train_data_dir="/content/EN_DATA" \
  --caption_column="text" \
  --resolution=512 --random_flip \
  --gradient_accumulation_steps=11 \
  --train_batch_size=3 \
  --num_train_epochs=1 --checkpointing_steps=10 --checkpoints_total_limit=3 \
  --learning_rate=3e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --validation_prompt="Beutiful comet in a cartoon style" \
  --seed=42 \
  --report_to="wandb" \
  --output_dir='/content/output_dir_5_epochs' \
  --resume_from_checkpoint 'latest' \
  --use_8bit_adam

ERROR
[screenshot of the error traceback]

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
…_lora_sdxl.py` (huggingface#4632)

* fix: casting issues.

* fix checkpointing.

* tests

* fix: bugs