CUDA error in backward. #81

Open
HatsuneMiku888 opened this issue Aug 9, 2023 · 14 comments

Comments

@HatsuneMiku888

Hi,
I get RuntimeError: an illegal memory access was encountered when training 3D Gaussians on the T&T dataset. It seems to happen during backpropagation. Here is the input of the backward function.

The error disappeared when I commented out https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/backward.cu#L503. I have no idea why this line would cause an illegal memory access.

@Snosixtyboo
Collaborator

Hi,

Commenting out that line as you did will significantly change the math of the gradient computation and should give you very bad results. We are currently at SIGGRAPH, but when we get back we will see what we can find from the .dump you shared.

@HatsuneMiku888
Author

Thanks for your reply!
I know commenting out that line can't be a final solution. It was just to locate where things go wrong. What I mean is that backpropagation completes successfully on the same input without that line.

@ray8828

ray8828 commented Aug 23, 2023

I have the same problem. There are now 3 issues about the invalid memory access, and none of them has a working solution. Could someone help? Thanks!

@Snosixtyboo
Collaborator

Snosixtyboo commented Aug 23, 2023

Hi @ray8828, if you have that issue, can you post your hardware setup and the .dump from when the crash occurred? Creating the dump file requires running with --debug.

@Snosixtyboo
Collaborator

@HatsuneMiku888 I finally had the time to look at your output. It seems that you are using both Python-computed covariance matrices and colors (--convert_SHs_python and --convert_cov3D_python are active), any particular reason for this? We left those paths in for compatibility and experimenting, they are not heavily tested.

@Snosixtyboo
Collaborator

@HatsuneMiku888 I found the line that causes the crash. Unfortunately, I have no explanation:

[image]

For some reason, a point ID that is far too high ends up in the list of points to render. Unfortunately, I don't know how I could debug this without extensive access to the machine that it happens on. We could set this up, but it will take a while before I have time to do this. From the dump alone I have no idea how this could occur. Is it reproducible? Does it also happen when the two options I mentioned above are turned off?

Last but not least, also for @ray8828 , another user has set up a Colab that seems to successfully run the code base on T&T. This could hopefully reduce issues with local project setups, so maybe this will work out for you
https://github.com/camenduru/gaussian-splatting-colab
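
One way to confirm this kind of corruption from the Python side is to check every ID in the list of points against the number of Gaussians. The sketch below is a minimal, framework-free illustration, not the rasterizer's own code: the function name find_out_of_range_ids and the sample data are hypothetical, and in the real code base the point list lives in GPU buffers, so it would first have to be copied to the host.

```python
def find_out_of_range_ids(point_ids, num_points):
    """Return (position, id) pairs whose id is not a valid index in [0, num_points)."""
    return [
        (pos, pid)
        for pos, pid in enumerate(point_ids)
        if not 0 <= pid < num_points
    ]


# Hypothetical data: a scene with 5 Gaussians and one corrupted entry.
# 1073280485 is the out-of-range value mentioned later in this thread,
# used here purely as illustrative input.
sample_ids = [0, 3, 1073280485, 4]
bad_entries = find_out_of_range_ids(sample_ids, num_points=5)
print(bad_entries)
```

Running such a check on the frame that crashes would at least show whether the bad ID is already present before the backward pass starts.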

@rgxie

rgxie commented Aug 29, 2023

I ran into the same problem. After commenting out the line mentioned above, the code works well (https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/backward.cu#L503).

@Snosixtyboo
Collaborator

Hi,

please note that this is not a fix, it will completely break the math behind the approach. If you continue to have issues with running it, please consider using the Colab linked on the main page.

@HatsuneMiku888
Author

1073280485 is very close to 2^30. Could there be a numeric overflow somewhere?
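
The arithmetic behind this observation can be checked directly. The snippet below is only an exploratory sketch: it measures how far the value is from 2^30, checks whether it fits in a signed 32-bit integer, and, as one further guess that nothing in this thread confirms, shows what the same 32 bits look like when read as a float32.

```python
import struct

value = 1073280485  # the number quoted in this comment

gap_to_2_30 = 2**30 - value            # distance below 2^30
fits_in_int32 = value <= 2**31 - 1     # still a valid signed 32-bit integer?

# Reinterpret the same 32 bits as an IEEE-754 float32 (exploratory only).
as_float32 = struct.unpack("<f", struct.pack("<I", value))[0]

print("distance below 2^30:", gap_to_2_30)
print("fits in signed 32-bit:", fits_in_int32)
print("hex:", hex(value))
print("same bits read as float32:", as_float32)
```

The value sits 461339 below 2^30 and well inside the signed 32-bit range, so on its own it does not look like an integer wrap-around. Read as a float32, the same bits are a small number just under 2, which might hint at floating-point data being read as an index, but that is speculation and not a diagnosis.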

@Snosixtyboo
Collaborator

@HatsuneMiku888 how good is your Python? Could you force it to create the snapshot_fw.dump of the forward pass (even though it doesn't fail) for the frame where the backward fails and forward it to us?
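
The general pattern being asked for here is: keep a copy of the forward inputs, and write it to disk only if the backward pass then fails. The following is a minimal sketch of that pattern with toy stand-ins, not the project's actual wrapper. The helper run_with_forward_snapshot, toy_forward, and toy_backward are hypothetical names, and pickle is used as a stand-in for whatever serialization the real code uses to write its .dump files.

```python
import copy
import os
import pickle
import tempfile


def run_with_forward_snapshot(forward_fn, backward_fn, args, dump_path):
    """Run forward then backward; if backward raises, save the forward inputs."""
    # Copy the inputs before the forward call so the snapshot cannot be
    # affected by any in-place modification during the pass.
    saved_args = copy.deepcopy(args)
    outputs = forward_fn(*args)
    try:
        return backward_fn(outputs)
    except Exception:
        with open(dump_path, "wb") as f:
            pickle.dump(saved_args, f)
        raise


# Toy stand-ins for the real forward and backward passes.
def toy_forward(means, scale):
    return [m * scale for m in means]


def toy_backward(outputs):
    raise RuntimeError("simulated failure in backward")


dump_path = os.path.join(tempfile.mkdtemp(), "snapshot_fw.dump")
try:
    run_with_forward_snapshot(toy_forward, toy_backward, ([1.0, 2.0], 3.0), dump_path)
except RuntimeError:
    with open(dump_path, "rb") as f:
        restored_args = pickle.load(f)
    print("forward inputs saved:", restored_args)
```

With real GPU tensors the copy would need to be moved to the CPU before saving, which this toy version does not attempt.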

@rgxie

rgxie commented Aug 29, 2023

Thank you for your reply. I know that is not a fix. I am trying to locate the bug. The error occurs at different iterations when I use different data.

@HatsuneMiku888
Author

Sure, I will attempt to reproduce this error on the machine where it occurred.

Btw, I now have a new problem. I hit the same illegal memory access error during the forward pass of training on another dataset. But the error miraculously disappeared when I executed _C.rasterize_gaussians in a separate script, using snapshot_fw.dump as the parameters.

@wuchen133

Hello, I have the same error. I would like to know how to debug the CUDA code in gaussian-splatting. I only know how to debug the Python files.
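
A common first step for this kind of error, sketched here under the assumption that training is launched from a Python script, is to make CUDA kernel launches synchronous. CUDA normally reports errors asynchronously, so an illegal memory access can surface at a later, unrelated call. Setting the CUDA_LAUNCH_BLOCKING environment variable to 1 makes the Python stack trace point much closer to the call that actually failed.

```python
import os

# Assumption: this runs at the very top of the training script, before
# PyTorch has started any CUDA work. Setting it later may have no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

For stepping into the kernels themselves, NVIDIA's compute-sanitizer tool can wrap the whole Python process and report which kernel performed the invalid access. Training will run noticeably slower in either mode.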

@smart4654154

@wuchen133, do you know the result? Thank you.
