[Fix] Fix a bug that may cause compilation failure of dynamic voxelization when using GPUs with compute capability lower than 6.x #326
Conversation
…when using GPUs with compute capability lower than 6.x. Also fix imperfect kernel code that may unintentionally discard valid points when the input point count exceeds 50000 * 512 (nearly impossible in practice, though).
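For context, a common cause of this class of compilation failure (not necessarily this PR's exact change) is that `atomicAdd` on `double` is only provided natively on compute capability >= 6.0, so kernels calling it fail to build for older architectures unless the CAS-based fallback from the CUDA C Programming Guide is supplied:

```cuda
// Hedged sketch: the well-known software fallback for double-precision
// atomicAdd on devices with compute capability < 6.0 (sm_60), taken from
// the CUDA C Programming Guide. Guarded so it does not clash with the
// native intrinsic on newer architectures.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
__device__ double atomicAdd(double *address, double val) {
  unsigned long long int *address_as_ull = (unsigned long long int *)address;
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);  // retry if another thread changed the value
  return __longlong_as_double(old);
}
#endif
```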
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #326      +/-   ##
==========================================
- Coverage   49.70%   49.69%   -0.01%
==========================================
  Files         174      174
  Lines       11754    11758       +4
  Branches     1838     1838
==========================================
+ Hits         5842     5843       +1
- Misses       5552     5555       +3
  Partials      360      360
```
We just found that the CUDA kernel cannot be compiled successfully with CUDA 9.0. Does this PR fix that issue?
Hi, @ZwwWayne
We also tested the code in CUDA 10.1 environments and it seems to be OK. However, MMDetection3D still needs to be compatible with CUDA 9.0 for several reasons, so it would be nice if you could also fix that issue in this PR. Screenshots of the errors are listed below; @Tai-Wang may provide a more detailed log if necessary. Thanks in advance.
It is really weird that, from the log, all the errors seem to occur when instantiating templates in PyTorch headers.
Hi, @Tai-Wang |
@zhanggefan Yes, exactly. We use PyTorch 1.5 built from source. The compilation error does not exist in the previous released version, and it seems to point to the "scatter" CUDA file related to your PR.
LGTM. The failure might need further exploration. |
@Tai-Wang It is really interesting that when compiling with PyTorch 1.5 and CUDA 9.0, NVCC cannot successfully compile any code that includes "torch/extension.h". I am still trying to find the root cause, but the workaround code will be ready soon.
… PyTorch1.5 on CUDA9.0
OK. When your workaround is ready, I can help check it together. |
This version compiles successfully with PyTorch 1.5 on CUDA 9.0 and passes the unit test on CUDA 10.2. But I have not been able to test its functionality on CUDA 9.0. Installing all the dependencies is a nightmare for me...
How do you test the functionality with CUDA 10? Just by running some experiments using dynamic scatter? Could you please show an example or point me to a standard benchmark, or add some simple tests to validate it on CUDA 9 devices?
The pytest script is here: |
Please let me know if the test fails. I am not experienced with the torch extensions before CUDA 10, so I cannot guarantee that the code works as expected. For CUDA versions >= 10, torch/extension.h is the all-in-one header, but for CUDA 9 I have to fall back to the underlying ATen backend headers, which I am not familiar with.
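A fallback of the kind described above might look like the following minimal sketch. The version guard and the specific ATen headers are assumptions for illustration, not the PR's exact code:

```cpp
// Hypothetical sketch of the header fallback described above.
// CUDA_VERSION comes from <cuda.h>; 10000 corresponds to CUDA 10.0.
#include <cuda.h>

#if CUDA_VERSION >= 10000
// CUDA >= 10: the all-in-one extension header compiles fine.
#include <torch/extension.h>
#else
// CUDA 9: include only the narrower ATen backend headers, avoiding the
// template instantiation failures seen with NVCC 9.0.
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#endif
```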
The compilation and installation is fine, but there is a RuntimeError for the gradcheck in the unit test. Here is the error message for your reference:
Looks like you switched to Python 3.8 this time. Which PyTorch and CUDA versions are you using now?
I think it should be compatible with both Python 3.7 and 3.8; the Python version is not a strong environment constraint. Apart from the Python version, we use the same versions of CUDA and PyTorch built from source.
Just validated it with Python 3.7 and got the same error message.
…rk non-floating-point tensor as non-differentiable.
Error reproduced. The issue closely relates to the following discussion, as well as an issue and a PR to PyTorch: before that PR, PyTorch's autograd engine marked integer outputs as requiring grad by default (for example, marking indices tensors as requiring grad, which does not make sense in most cases). That PR deals with this and was merged into the master branch before PyTorch 1.6, so the issue cannot be reproduced with versions 1.6 and later. Explicitly marking voxel_coors as non-differentiable solves this issue.
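A minimal sketch of that fix (a toy stand-in for the real dynamic scatter op, not the PR's actual code; the class name and the pass-through reduction are made up for illustration):

```python
import torch
from torch.autograd import Function


class DynamicScatterSketch(Function):
    """Toy stand-in: returns a float feature tensor and an integer
    coordinate tensor, mimicking the dynamic scatter op's outputs."""

    @staticmethod
    def forward(ctx, feats, coors):
        voxel_feats = feats * 2.0    # placeholder for the real reduction
        voxel_coors = coors.clone()  # integer output
        # The fix: tell autograd the integer output carries no gradient,
        # so gradcheck on PyTorch < 1.6 does not try to differentiate it.
        ctx.mark_non_differentiable(voxel_coors)
        return voxel_feats, voxel_coors

    @staticmethod
    def backward(ctx, grad_feats, grad_coors):
        # grad_coors is ignored: voxel_coors was marked non-differentiable.
        return grad_feats * 2.0, None
```

With this, `torch.autograd.gradcheck` only checks the floating-point output and no longer raises a RuntimeError for the integer coordinates.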
@Tai-Wang
The last commit passed the gradcheck without error. |
@zhanggefan Haha, it is just that many of the GPUs available to us are 1080Ti, so we can only use CUDA 9 and PyTorch built from source. Thanks for your contribution. It also works on my side and looks good to me.