Flux LoRA multi-GPU training #997
Answered by bghira

PCanavelli asked this question in Q&A
Hey there, I'm aware that it's currently impossible to train LoRAs with DeepSpeed enabled (just realised: is that also true for LyCORIS and other adapters?). Is it also true without DeepSpeed? I have been trying to set up Accelerate to use all my GPUs. The training starts fine, but it looks like only GPU 0 is being used while all the others sit idle (0% load, and the same iteration speed as with a single GPU).

**Accelerate config**

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
enable_cpu_affinity: false
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

**config.json**

```json
{
"--resume_from_checkpoint": "latest",
"--data_backend_config": "config/multidatabackend.json",
"--aspect_bucket_rounding": 2,
"--seed": 42,
"--minimum_image_size": 0,
"--disable_benchmark": false,
"--output_dir": "output/models",
"--lora_type": "standard",
"--lora_rank": 64,
"--max_train_steps": 100000,
"--num_train_epochs": 0,
"--checkpointing_steps": 5000,
"--checkpoints_total_limit": 100,
"--model_type": "lora",
"--pretrained_model_name_or_path": "/workspace/checkpoints/my_checkpoint",
"--model_family": "flux",
"--train_batch_size": 1,
"--gradient_checkpointing": "true",
"--caption_dropout_probability": 0.0,
"--resolution_type": "pixel_area",
"--resolution": 1024,
"--validation_seed": 42,
"--validation_steps": 5000,
"--validation_resolution": "1024x1024",
"--validation_guidance": 3.0,
"--validation_guidance_rescale": "0.0",
"--validation_num_inference_steps": "20",
"--validation_prompt": "my test prompt",
"--mixed_precision": "bf16",
"--optimizer": "adamw_bf16",
"--learning_rate": "1e-4",
"--lr_scheduler": "polynomial",
"--lr_warmup_steps": 100,
"--validation_torch_compile": "false"
}
```

**multidatabackend.json**

```json
[
{
"id": "my_training",
"type": "local",
"crop": "true",
"crop_aspect": "square",
"crop_style": "center",
"resolution": 1.0,
"minimum_image_size": 0.25,
"maximum_image_size": 1.0,
"target_downsample_size": 1.0,
"resolution_type": "area",
"cache_dir_vae": "cache/vae/my_training",
"instance_data_dir": "/workspace/datasets/my_training",
"disabled": false,
"skip_file_discovery": "",
"caption_strategy": "textfile",
"metadata_backend": "json"
},
{
"id": "text-embeds",
"type": "local",
"dataset_type": "text_embeds",
"default": true,
"cache_dir": "cache/text/my_training",
"disabled": false,
"write_batch_size": 128
}
]
```

**nvidia-smi mid-training**

[screenshot: nvidia-smi output showing only GPU 0 under load]
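For anyone hitting the same symptom, a quick way to confirm it (not from the original thread, just a generic check) is to watch per-GPU utilization while training runs:

```bash
# Watch utilization across all GPUs once per second. If only GPU 0 shows
# load while the others stay at 0%, only one process is actually training.
watch -n 1 nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv
```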
Answered by bghira, Sep 27, 2024
check the first few lines of output.
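To see what the answer is pointing at: on startup, Accelerate reports the distributed setup it actually built. A minimal standalone sketch (my own, not part of SimpleTuner) using Accelerate's public API:

```python
# check_accel.py — minimal sketch: print what Accelerate actually launched.
# With a working 8-GPU setup, each of the 8 ranks prints its own line;
# if multi-GPU isn't active, you only see "process 0/1".
from accelerate import Accelerator

accelerator = Accelerator()
print(
    f"process {accelerator.process_index}/{accelerator.num_processes} "
    f"on device {accelerator.device}"
)
```

Run it with `accelerate launch check_accel.py` and compare against what the trainer prints in its first few lines.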
9 replies (one shown below)
bingo! The contents of Accelerate's config file are only used for DeepSpeed details, so that we don't have to expect users to constantly change that obscurely located file for things like the number of GPUs.
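Given that explanation, the GPU count has to be supplied at launch time rather than through the Accelerate config file. A hedged sketch (the `train.py` entry point is a placeholder; SimpleTuner's own launcher may instead read something like `TRAINING_NUM_PROCESSES` from its env file — check your version's docs):

```bash
# Hedged example: pass the process count directly to accelerate launch,
# bypassing the saved Accelerate config file entirely.
# "train.py" is a placeholder for the trainer's actual entry point.
accelerate launch --multi_gpu --num_processes=8 --mixed_precision=bf16 train.py
```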