
OUT OF MEMORY #7

Open
BlackPuuuudding opened this issue Apr 26, 2024 · 1 comment

Comments

@BlackPuuuudding

Why does my training run out of memory when the batch size is set to 4, and also run out of memory with batch size 2 during multi-GPU training, while the paper is able to set it to 8? I'm using the same device as the one mentioned in the paper, a 4090, and the checkpoints are SD 1.4 and interact-diffusion-v1-1.pth.
Thank you!

@jiuntian
Owner

We use this command for training:

CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 main.py --yaml_file configs/hoi_hico_text.yaml --ckpt <existing_gligen_checkpoint> --name test --batch_size=4 --gradient_accumulation_step 2 --total_iters 500000 --amp true --disable_inference_in_training true --official_ckpt_name <existing SD v1.4/v1.5 checkpoint>

We use AMP, and the batch size is set to 4 for each GPU.
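
For reference, this is just arithmetic from the flags above (not stated explicitly in the thread): --batch_size=4 per GPU across 2 GPUs gives a global batch of 4 × 2 = 8 per step, which lines up with the 8 reported in the paper, and --gradient_accumulation_step 2 presumably doubles the effective batch to 16 without raising peak memory.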

Training details are in the README.
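
If it helps, here is a minimal sketch of how AMP and gradient accumulation combine to cut per-step memory in a PyTorch training loop. It is illustrative only: the toy model, data, and hyperparameters are placeholders, not the actual main.py code.

# Minimal illustrative sketch of AMP + gradient accumulation in PyTorch
# (placeholder model/data, not InteractDiffusion's actual main.py).
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 128).to(device)                  # stand-in for the diffusion model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()                    # loss scaling for fp16 stability
accum_steps = 2                                         # mirrors --gradient_accumulation_step 2

optimizer.zero_grad(set_to_none=True)
for step in range(100):
    x = torch.randn(4, 128, device=device)              # micro-batch of 4, as in --batch_size=4
    with torch.cuda.amp.autocast():                      # forward pass in mixed precision
        loss = model(x).pow(2).mean() / accum_steps      # scale loss by number of accumulation steps
    scaler.scale(loss).backward()                        # scaled gradients accumulate across micro-batches

    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                           # unscales gradients, then applies the update
        scaler.update()
        optimizer.zero_grad(set_to_none=True)

Because each micro-batch runs its forward/backward in half precision and the optimizer only steps every accum_steps micro-batches, peak activation memory is set by the per-GPU micro-batch (4) rather than by the effective batch.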
