Training on a single GTX 1080 Ti (11 GB) gives a CUDA out of memory error #8
Comments
It depends on your batch size and image size. For our training, we use 512x512 images with a batch size of 10 on each 2080 Ti (11 GB) GPU.
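As a rough way to reason about the batch size and image size trade-off, the sketch below treats the number of input pixels per step as a proxy for activation memory. This is only a heuristic under the assumption that memory grows roughly linearly with batch size × height × width; it ignores model weights, optimizer state, and framework overhead. The helper names `pixels_per_batch` and `max_batch_size` are hypothetical and not part of this repository.

```python
def pixels_per_batch(batch_size: int, height: int, width: int) -> int:
    """Number of input pixels one GPU processes per training step."""
    return batch_size * height * width


def max_batch_size(budget_pixels: int, height: int, width: int) -> int:
    """Largest batch size whose pixel count stays within the given budget."""
    return budget_pixels // (height * width)


# Reference point quoted above: batch size 10 of 512x512 crops per 11 GB GPU.
budget = pixels_per_batch(10, 512, 512)

print(budget)                              # 2621440
print(max_batch_size(budget, 512, 512))    # 10
print(max_batch_size(budget, 640, 640))    # 6
```

Under this assumption, moving from 512x512 to 640x640 crops would call for a batch size of about 6 per GPU to stay within the same budget. Treat the result as a starting point and confirm it against actual memory usage.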
Using two 1080 Ti (11 GB) GPUs, I also get this out of memory error. Here is the error info:
What should I do now? I'm so confused :( @Yaoyi-Li
Can you train the model with a batch size of 9 or 8? Or could you provide some
With batch_size=8, I get the out of memory error below:
When the error above appears:
What's wrong here? Please help me out :( @Yaoyi-Li
I have no idea what's happening here, but it looks like you are using only one GPU.
I found the problem. During training, a test procedure runs to determine when to save the model. If the validation images are too large, this step raises an OOM error. Can you fix this bug? @Yaoyi-Li
Hi, it's me again. I have now finished training with your code, but the model I trained performs really poorly. I suspect my dataset is the problem, since I don't know how to prepare a dataset without a template. I asked the DIM authors for their dataset, but they refused. Could you please email me a few demo images from DIM, covering both training and testing, to help me debug the code? Just a few images would be enough :) @Yaoyi-Li
I'm sorry, but I can't. According to Adobe's license, I'm not at liberty to distribute images in this dataset to anyone else. If they refused to give you the dataset, I think it means you are trying to use them for commercial purposes. I am sorry about it.
Hi! I got the same problem as you. Can you share how you dealt with the OOM error? Did you train on 2 GPUs?
Hi, could you please provide your PyTorch and CUDA versions? I found that some other people are also facing this problem, but I have no idea what happened.
In your trainer code, at line 291, in the test part. This code leads to OOM:
Thanks for your reply!
Thanks for your advice. How many GPUs did you use, though? I didn't get the same problem when I used only one GPU.
Just comment out the test procedure in trainer.py and training will proceed. Alternatively, resizing your test images to a smaller size makes the test procedure work fine. By the way, which dataset are you using? DIM, or a dataset you made yourself?
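For anyone who wants to keep the test procedure and shrink the validation images instead, here is a minimal sketch of how a target size could be chosen. It only computes the new dimensions; the actual resizing would be done with whatever image library you already use. The function name `capped_size`, the 1024-pixel cap, and the rounding to a multiple of 32 are all assumptions for illustration, not values taken from this repository.

```python
def capped_size(height: int, width: int, max_side: int = 1024, multiple: int = 32):
    """Return (height, width) with the longest side capped at max_side.

    Aspect ratio is approximately preserved, and both sides are rounded
    down to a multiple of `multiple` (assumed here to suit the network stride).
    """
    longest = max(height, width)
    if longest > max_side:
        # Integer arithmetic avoids floating-point rounding surprises.
        height = height * max_side // longest
        width = width * max_side // longest

    def snap(value: int) -> int:
        # Round down to the nearest multiple, but never below one multiple.
        return max(multiple, value // multiple * multiple)

    return snap(height), snap(width)


print(capped_size(1080, 1920))   # (576, 1024)
print(capped_size(3000, 2000))   # (1024, 672)
print(capped_size(300, 400))     # (288, 384)
```

Images already below the cap are only snapped to the nearest lower multiple, so small validation images change very little. Lower `max_side` further if the test step still runs out of memory.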
I use the DIM dataset (I'm a student). Maybe we have different datasets.
Hi, I have tried training the model with CUDA 9.0 and PyTorch 1.1.0 on two 1080 Tis. I think the CUDA version doesn't matter. Multi-GPU training won't require much more memory than a single GPU. Did you try training the model with a smaller batch size, like 9 or 8?
How much GPU memory do we need to train the model? @Yaoyi-Li