I can't figure out the right settings for SD 3.5 Large: a ~60-image dataset of 1024x1024 squares, a 16 GB CUDA card, Fedora Linux. Here's the configuration I'm currently using:
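A trimmed-down sketch of the kind of `config.json` I mean, using SimpleTuner's documented `--argument`-style keys (values here are illustrative, not my exact file):

```json
{
  "--model_family": "sd3",
  "--model_type": "lora",
  "--pretrained_model_name_or_path": "stabilityai/stable-diffusion-3.5-large",
  "--output_dir": "output/sd35-lora",
  "--data_backend_config": "config/multidatabackend.json",
  "--lora_rank": 16,
  "--train_batch_size": 1,
  "--gradient_checkpointing": "true",
  "--optimizer": "adamw_bf16",
  "--learning_rate": "1e-4",
  "--mixed_precision": "bf16",
  "--base_model_precision": "int8-quanto",
  "--max_grad_norm": 0.1,
  "--max_train_steps": 3500,
  "--validation_steps": 250,
  "--validation_resolution": "1024x1024"
}
```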
I have to train at 512px because otherwise I get an OOM while SimpleTuner is initially processing the dataset (I'm not sure whether 1024px training is even possible on 16 GB, so I'd appreciate help with that too). Here's the dataset config:
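For reference, a minimal `multidatabackend.json` along the lines of what I'm using, with the resolution dropped to 512 (ids and paths are placeholders, not the verbatim file):

```json
[
  {
    "id": "subject-512",
    "type": "local",
    "instance_data_dir": "/home/user/datasets/subject",
    "caption_strategy": "textfile",
    "resolution": 512,
    "resolution_type": "pixel_area",
    "minimum_image_size": 512,
    "crop": true,
    "crop_style": "center",
    "cache_dir_vae": "cache/vae/sd35/512",
    "repeats": 5
  },
  {
    "id": "text-embeds",
    "dataset_type": "text_embeds",
    "type": "local",
    "default": true,
    "cache_dir": "cache/text/sd35"
  }
]
```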
Unless I use max_grad_norm 0.1, images quickly turn into noise and the resulting LoRA is non-functional. With max_grad_norm 0.1 the images don't turn into noise (one exception being an unprompted image), but the model doesn't seem to be learning the concept; I've let it run to 3500 steps and the images were not really improving. For SD 3.5 I've tried several optimizers and quantization settings to no avail. To check whether I'm doing something fundamentally wrong, I used the same dataset to train a Flux-dev LoRA and it trained just fine. Just in case, here's the JSON I was using for Flux:
I'm not sure how to proceed further and would appreciate help; I'd be happy to provide any additional information if needed. Thanks in advance for any suggestions.
Replies: 1 comment
It was a mix of me missing that mixed_precision should be set to 'no' and a possible VAE cache mix-up.
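For anyone else hitting this, the precision change is just one key in `config.json` (sketch below, all other keys omitted); on top of that it's worth deleting the directory pointed to by `cache_dir_vae` in the dataset config so SimpleTuner regenerates the VAE cache cleanly on the next run.

```json
{
  "--mixed_precision": "no"
}
```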