CUDA out of memory for CenterPoint even with batch size 1. #1398
Comments
I was able to run a few iterations with batch size 1 on 1 GPU. However, the loss seems unreasonably high:
I am not sure what the cause of this is. Was CenterPoint affected by the v1 update?
With regard to the GPU memory issue, it was pointed out previously in #983 that the 4096 parameter in
I also tried running the above setup with spconv v2. With it, I can comfortably fit batch size 4 on 10 GB (8 GB used). With FP16, I can fit batch size 8 on 10 GB (9 GB used). Are there plans to integrate spconv v2 into mmdetection3d? |
@Divadi Can you share the steps you took to modify the mmdet3d ops and rebuild them with spconv v2? It would be a big help for me. |
Of course. It's a bit hacky and I have not confirmed it yet, but here it is. Installed spconv v2 via pip. The changed files also include an "overwrite_spconv" module: just put it anywhere and import it in your train/test script. It hackily removes the mmcv spconv modules and replaces them with the spconv versions so they can be used by the build-from-registry functions in mmdet3d. In effect, this doesn't use the spconv in mmcv at all and uses spconv v2 instead. |
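The registry-override idea described in the comment above can be sketched roughly as follows. This is a toy illustration only, not the contents of the shared "overwrite_spconv" file: `ToyRegistry`, `BuiltinSubMConv3d`, `ReplacementSubMConv3d`, and `overwrite_modules` are hypothetical names, and the real mmcv registry API may differ.

```python
class ToyRegistry:
    """Hypothetical stand-in for a build-from-config module registry."""

    def __init__(self):
        self._modules = {}

    def register(self, name, cls, force=False):
        # Refuse to clobber an existing entry unless explicitly forced.
        if name in self._modules and not force:
            raise KeyError(f"{name} is already registered")
        self._modules[name] = cls

    def build(self, cfg):
        # Look up the class named by cfg["type"]; pass the rest as kwargs.
        kwargs = dict(cfg)
        cls = self._modules[kwargs.pop("type")]
        return cls(**kwargs)


class BuiltinSubMConv3d:
    """Placeholder for the layer bundled with the framework."""

    def __init__(self, in_channels, out_channels):
        self.in_channels = in_channels
        self.out_channels = out_channels


class ReplacementSubMConv3d(BuiltinSubMConv3d):
    """Placeholder for the same layer provided by the external library."""


def overwrite_modules(registry, replacements):
    """Re-register each name so existing configs resolve to the new class."""
    for name, cls in replacements.items():
        registry.register(name, cls, force=True)


CONV_LAYERS = ToyRegistry()
CONV_LAYERS.register("SubMConv3d", BuiltinSubMConv3d)
overwrite_modules(CONV_LAYERS, {"SubMConv3d": ReplacementSubMConv3d})

# The config is unchanged, but it now builds the replacement class.
layer = CONV_LAYERS.build(
    {"type": "SubMConv3d", "in_channels": 4, "out_channels": 16}
)
print(type(layer).__name__)
```

Because the override happens at import time, the module holding it has to be imported before any model is built from a config, which is why the comment suggests importing it in the train/test script.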
Much appreciated @Divadi, I will give it a try. Many thanks for sharing the files 👍 |
Great solution! We won't directly replace v1 with v2 because we would still like to support CUDA 9, so as a temporary measure we will work out a relatively elegant way to use spconv v2.0 and add it to our documentation for reference. Many thanks for sharing. |
If anyone from the community has a better solution, you are welcome to share it in this issue. We will also try it ourselves and update the documentation accordingly ASAP. |
BTW, @Divadi have you ever successfully trained this CenterPoint config before the release of mmdet3d v1.0.0rc? We need to check whether our refactoring affected it. |
I have not tried; I can pull an older version later and try training it. |
My development environment is as below:
|
That's great - if possible, could you share if the performance is as you would expect? |
Well, to be honest, I couldn't see any significant decrease in memory, though I think the performance is almost the same. A TensorBoard profiler should be able to confirm this, I think. I tried to use the profiler logger, but there seems to be a bug: it is not logging the details. Also, I am not aware of the theoretical differences between spconv 1 and 2. |
@ammaryasirnaich That's strange; are you sure the spconv 2 version is the one being used? spconv 2, compared to the implementation in mmcv, should use substantially less memory & be noticeably faster for models with an spconv backbone. |
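One way to answer the "which spconv is actually in use" question is to look at where each layer's class was defined. The following is a minimal sketch under stated assumptions: `backend_packages` is a hypothetical helper, and the stand-in objects from the standard library only demonstrate the mechanism. With a PyTorch model, the iterable would be `model.modules()`.

```python
from argparse import Namespace
from collections import Counter
from fractions import Fraction


def backend_packages(layers):
    """Count objects by the top-level package that defines their class."""
    return Counter(type(layer).__module__.split(".")[0] for layer in layers)


# Stand-in objects; for a real model, pass model.modules() instead.
counts = backend_packages([Fraction(1, 2), Fraction(3, 4), Namespace()])
print(counts)
```

If the replacement were active, the sparse convolution layers would be expected to show up under the external package's name rather than under the framework's own ops.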
@Divadi, your suspicion about the wrong spconv version was right. It was my mistake: a typo in the environment variable caused the spconv 2 import to be skipped. After correcting it, I see a significant decrease in memory allocation, as you mentioned. Many thanks for pointing it out. |
Thanks for the discussion. We have decided to officially support the spconv 2.0 option with #1421. Please stay tuned. |
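A silent fallback like the one described above can be made loud by validating the flag when it is read. The sketch below is an assumption-laden illustration: the variable name `USE_SPCONV2` and the helper `read_flag` are hypothetical and are not taken from the thread or from mmdet3d.

```python
import difflib
import os


def read_flag(var, environ=None):
    """Read a boolean flag, failing loudly on likely typos."""
    environ = os.environ if environ is None else environ
    if var not in environ:
        # A near-miss name suggests the variable was misspelled.
        near = difflib.get_close_matches(var, list(environ), n=1, cutoff=0.8)
        if near:
            raise KeyError(f"{var} is unset, but {near[0]} is set; typo?")
        return False
    value = environ[var].strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off", ""}:
        return False
    raise ValueError(f"{var}={value!r} is not a recognized boolean value")


# Hypothetical variable name, passed via an explicit dict for the demo.
print(read_flag("USE_SPCONV2", {"USE_SPCONV2": "1"}))
```

Raising on a misspelled name or value trades convenience for safety: the run stops immediately instead of training with the slower, more memory-hungry backend unnoticed.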
I have tried to train CenterPoint with the default config, and reduced batch size when I ran out of CUDA memory. However, even with batch size 1 on 10GB memory, I am unable to fit into CUDA memory, which I think is strange.
Describe the bug
I encounter CUDA out of memory error when training CenterPoint on 3080 GPUs (10GB) even with batch size 1.
Reproduction
Did you make any modifications on the code or config? Did you understand what you have modified?
No changes.
What dataset did you use?
nuScenes, no changes.
Environment
Run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.
Related environment variables ($PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
Error traceback
If applicable, paste the error traceback here.