Could you give me some guidance on DreamBooth training? #22
Hi, thanks for your interest. I used an A6000 and an A40. The GPU memory used for DreamBooth is around 30 GB, as mentioned in #15 (comment). For distributed training in your case, you would need to split the model across GPUs. Sadly, I don't have a clear idea of how to do that with Accelerate, as I haven't tried model parallelism myself before.
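To make "split the model" concrete, here is a minimal sketch of naive model parallelism in PyTorch, using a toy two-layer network rather than the actual DreamBooth UNet (which is an assumption about how one might apply the idea, not code from this repo). The two halves of the model live on different devices and the activations are moved between them in `forward()`; the sketch falls back to CPU when two GPUs are not available so it stays runnable anywhere.

```python
import torch
import torch.nn as nn

# Pick two devices if available, otherwise run everything on CPU.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0") if two_gpus else torch.device("cpu")
dev1 = torch.device("cuda:1") if two_gpus else torch.device("cpu")

class SplitModel(nn.Module):
    """Toy stand-in for a large model split across two devices."""
    def __init__(self):
        super().__init__()
        # First half of the parameters lives on device 0 ...
        self.part0 = nn.Sequential(nn.Linear(64, 128), nn.ReLU()).to(dev0)
        # ... second half on device 1, so each GPU holds roughly half the weights.
        self.part1 = nn.Linear(128, 10).to(dev1)

    def forward(self, x):
        h = self.part0(x.to(dev0))
        # Move intermediate activations to the second device.
        return self.part1(h.to(dev1))

model = SplitModel()
out = model(torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 10])
```

With real diffusion models the split point matters (e.g. putting the text encoder and VAE on one GPU and the UNet on the other), and only one device is busy at a time unless you add pipelining.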
Thank you very much for your answer. Maybe I can try reducing the image resolution to lower the GPU memory usage.
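Lowering the resolution does help, because Stable-Diffusion-style models operate on latents of size `resolution / 8`, so activation memory scales roughly quadratically with resolution. The back-of-the-envelope sketch below illustrates the scaling only; the exact savings for this repo's `dreambooth.py` would need to be measured.

```python
# Rough latent-size arithmetic (an illustration, not measured memory numbers).
def latent_elements(resolution: int, channels: int = 4, downscale: int = 8) -> int:
    """Number of elements in a Stable-Diffusion-style latent at a given resolution."""
    side = resolution // downscale
    return channels * side * side

for res in (512, 384, 256):
    ratio = latent_elements(res) / latent_elements(512)
    print(f"{res}px -> {latent_elements(res)} latent elements ({ratio:.2f}x of 512px)")
# 512px -> 16384 latent elements (1.00x of 512px)
# 384px -> 9216 latent elements (0.56x of 512px)
# 256px -> 4096 latent elements (0.25x of 512px)
```

Note that model weights and optimizer state do not shrink with resolution, so this only reduces the activation part of the ~30 GB footprint.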
First of all, I would like to ask how much GPU memory is needed during the DreamBooth fine-tuning process, and, if possible, which machine you trained on.
Also, when training on a single 4090, I get an out-of-memory error. I therefore wanted to run distributed training on two 4090s at the same time, but when I ran the command, I found that only GPU 0 was actually being used:
CUDA_VISIBLE_DEVICES=1,0 python dreambooth.py dreambooth.text_inversion_path=./example_output/real_image_editing/text_inversion/learned_embeds_iteration_500.bin
I also tried setting up training with Accelerate. After running accelerate config, both GPU 0 and GPU 1 were used, but Accelerate only launched parallel processes, and GPU memory was still insufficient.
If possible, I would like to ask how to do distributed training correctly. Thank you.
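The behavior described above is expected: a plain `python dreambooth.py` launch is a single process on one GPU, and Accelerate's default multi-GPU mode is data parallelism, where each GPU holds a full copy of the model, so two 24 GB 4090s do not combine into 48 GB. One thing worth trying (an assumption, not something verified against this repo's script) is enabling DeepSpeed ZeRO through Accelerate, which shards optimizer state and parameters across GPUs instead of replicating them:

```shell
# Sketch only: assumes dreambooth.py works under `accelerate launch`,
# which this repo has not confirmed.
accelerate config   # choose multi-GPU and enable DeepSpeed (ZeRO stage 2 or 3) when prompted
accelerate launch --num_processes 2 dreambooth.py \
    dreambooth.text_inversion_path=./example_output/real_image_editing/text_inversion/learned_embeds_iteration_500.bin
```

If the script was not written with Accelerate's `Accelerator` API in mind, it may need modification before `accelerate launch` distributes it correctly.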