cudaCheckError() failed : invalid device function #19
Comments
Just adding to this since it was useful to me. I hit the same problem when testing on AWS EC2 GPU instances. I had to use sm_20 in the two places mentioned above, and then force a rebuild of the Python modules.
When I ran I tried to follow having the
I got the same error (i.e., cudaCheckError() failed : invalid device function) with my Tesla K40. When I changed the
@ahmedammar You could even use sm_30 on AWS.
@eakbas Thanks!
For other GPUs:
Credit to https://github.com/mldbai/mldb/blob/master/ext/tensorflow.mk
@VitaliKaiser Hello, I am running demo.py on an AWS EC2 GPU instance and getting 'cudaCheckError() failed : invalid device function'.
@fangyan93 It's been quite a while since I last looked into it, but I lost a lot of time before figuring out that things had not been rebuilt!
@VitaliKaiser Thanks for the reply. Yes, I removed the previous build files and rebuilt from the very beginning, and it works!
For those who are still lost, here are a few clean steps to resolve the issue (you need to recompile the CUDA code):
I spent so much time debugging this issue that I am posting the answer here:
When running demo.py as described in the README, I was getting the error
cudaCheckError() failed : invalid device function
with no traceback. It happened when this line was executed: https://github.com/smallcorgi/Faster-RCNN_TF/blob/master/lib/fast_rcnn/test.py#L169
I have never seen this error in any of my other TensorFlow projects.
This issue is similar to one in the Python version of Faster R-CNN: rbgirshick/py-faster-rcnn#2
I solved it by updating the arch code in https://github.com/smallcorgi/Faster-RCNN_TF/blob/master/lib/make.sh#L9 and https://github.com/smallcorgi/Faster-RCNN_TF/blob/master/lib/setup.py#L137
I don't know how to find the arch code for an arbitrary GPU, but for the Tesla K80, sm_37 seems to work.
Could we change something so that it works on any GPU, or at least add a note to the README?
I hope this helps people who hit the same issue.
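On finding the arch code for a given GPU: the `sm_XY` value is the card's compute capability X.Y with the dot removed, which matches the values reported in this thread (compute capability 3.7 gives sm_37 for the Tesla K80). Below is a minimal sketch of that mapping in Python. The helper names `arch_from_compute_cap` and `detect_arch_flags` are made up for illustration and are not part of this repository. The `compute_cap` query field of `nvidia-smi` is an assumption about your driver: older drivers may not support it, in which case the function returns an empty list and you would need to look up the compute capability in NVIDIA's documentation instead.

```python
import subprocess


def arch_from_compute_cap(compute_cap: str) -> str:
    """Turn a compute capability string such as '3.7' into 'sm_37'."""
    major, minor = compute_cap.strip().split(".")
    return f"sm_{int(major)}{int(minor)}"


def detect_arch_flags() -> list:
    """Return one sm_XY string per visible GPU, or [] if detection fails.

    Assumption: the installed nvidia-smi supports the 'compute_cap' query
    field. If nvidia-smi is missing or rejects the field, we return [].
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []

    flags = []
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        try:
            flags.append(arch_from_compute_cap(line))
        except ValueError:
            # Unexpected output (not of the form 'X.Y'); skip this entry.
            continue
    return flags


if __name__ == "__main__":
    print(arch_from_compute_cap("3.7"))  # Tesla K80, as reported above
    print(detect_arch_flags())
```

Whatever value this gives you would go in the two places linked above (lib/make.sh and lib/setup.py). As noted earlier in the thread, remove the previous build files and rebuild from scratch afterwards, otherwise the old binaries keep being used.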