Update GPU Compute Capacity support to match tensorflow #200
Comments
As per #149 (comment), this is not possible due to build times. I'm leaving the issue open as a reminder to do it when possible.
One part of the solution should be to provide a much better error message that accurately describes what's happening and how to fix the issue.
Also, would it be possible to change build.sh to allow the passthrough of the
I don't think we can do much about the error message, since it comes from the native code. But not overwriting the variable sounds easy enough. You can also set it in a bazelrc file, which takes precedence over this.
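Not overwriting the variable boils down to using the caller's value when one is set and falling back to a default otherwise; in bash this is the `${VAR:-default}` expansion. A minimal Python sketch of that logic follows. It assumes the variable is `TF_CUDA_COMPUTE_CAPABILITIES` (the name TensorFlow's configure step reads); `DEFAULT_CAPABILITIES` and `resolve_compute_capabilities` are hypothetical names, and the default list is illustrative, not taken from build.sh.

```python
import os
from typing import Mapping, Optional

# Hypothetical default list; the real one lives in build.sh and may differ.
DEFAULT_CAPABILITIES = "3.5,5.0,6.0,7.0,7.5,8.0"
# Assumed variable name (the one TensorFlow's configure step reads).
ENV_VAR = "TF_CUDA_COMPUTE_CAPABILITIES"


def resolve_compute_capabilities(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the caller's value if it is set and non-empty, else the default.

    Mirrors bash's ${TF_CUDA_COMPUTE_CAPABILITIES:-default} expansion:
    an unset or empty variable falls back to the default.
    """
    if env is None:
        env = os.environ
    value = env.get(ENV_VAR, "")
    return value if value else DEFAULT_CAPABILITIES
```

Because the fallback only applies when the variable is unset or empty, any value a user exports wins over the built-in default.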
When you get more build resources, please add compute capability 3.7 for the K80 to the default support list, as it's very widely used.
@rnett, I've replaced the CUDA capabilities with this line that I picked from the .bazelrc; is that right? It is still building but looks successful so far. If you agree, I'll merge this change to I think at some point we'll need to make better use of the configuration in this .bazelrc file, but right now, since I'm just about to release 0.3.0, I prefer to make minimal changes. I might do a 0.3.1 release right after, just to align our build options with the official ones.
Seems fine to me. We may have to change it eventually, once the next-gen NVIDIA GPUs launch, but we should be good for a while. It would be nice if we could get SIG Build, or whoever is responsible for it, to publicize their setup/arguments.
I guess they use the configs in the main repo, check at the
Huh, TIL. I would have expected that to come up in the issues I was looking at.
The list of capabilities has been updated to Closing this issue for now; please reopen if some capabilities are still not supported.
When I try to test things on a GPU (on Linux) with 0.3.0-SNAPSHOT, it takes a while to initialize before giving me:
This is with a 1070 (compute capability 6.1), which was successfully recognized earlier:
After some digging, I found tensorflow/tensorflow#41990, tensorflow/tensorflow#41132 (comment), and tensorflow/tensorflow#41892 (comment).
The last two in particular imply that the problem is that our binaries aren't built with support for compute capability 6.1, and sure enough, they aren't: https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/build.sh#L25-L32
As per the 2nd and 3rd links, and https://www.tensorflow.org/install/gpu#hardware_requirements, the other TensorFlow binaries (Python, C, C++, etc.) support compute capabilities 3.5, 5.0, 6.0, 7.0, 7.5, 8.0 and higher than 8.0. IMO, we should do the same, ideally in a way that we don't have to update whenever that list changes (would simply not exporting the variable work? The defaults are specified in https://github.com/tensorflow/tensorflow/blob/master/.bazelrc#L600). This will likely increase build times, though, which I think we already have issues with.
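Whether a given card can run a build follows from CUDA's compatibility rules rather than from an exact match against the list. The sketch below is a minimal Python model of those rules, not TensorFlow code. It assumes nvcc-style entries where `sm_XY` means precompiled SASS only and `compute_XY` means PTX only; `parse_arch`, `device_is_covered` and `EXAMPLE_CAPABILITIES` are hypothetical names, and the list is modeled on the capabilities quoted above.

```python
from typing import Iterable, Tuple


def parse_arch(entry: str) -> Tuple[str, int, int]:
    """Parse an nvcc-style entry such as 'sm_61' or 'compute_80'.

    Returns (kind, major, minor): kind 'sm' means precompiled SASS for a real
    architecture, 'compute' means PTX for a virtual architecture.
    Suffixed variants such as 'sm_90a' are not handled.
    """
    kind, _, digits = entry.strip().partition("_")
    if kind not in ("sm", "compute") or len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"unrecognized architecture entry: {entry!r}")
    # The last digit is the minor version: 'sm_61' -> (6, 1), 'sm_100' -> (10, 0).
    return kind, int(digits[:-1]), int(digits[-1])


def device_is_covered(device: Tuple[int, int], entries: Iterable[str]) -> bool:
    """Return True if a GPU with compute capability `device` = (major, minor)
    can run code built for `entries`, per CUDA's compatibility rules:
      * SASS for sm_XY runs on devices with major == X and minor >= Y;
      * PTX for compute_XY can be JIT-compiled for any device >= X.Y.
    """
    major, minor = device
    for entry in entries:
        kind, arch_major, arch_minor = parse_arch(entry)
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


# Hypothetical list: SASS for 3.5 through 7.5, PTX only for 8.0.
EXAMPLE_CAPABILITIES = ["sm_35", "sm_50", "sm_60", "sm_70", "sm_75", "compute_80"]

print(device_is_covered((6, 1), EXAMPLE_CAPABILITIES))  # True: sm_60 SASS runs on 6.1
print(device_is_covered((6, 1), ["sm_70", "sm_75"]))    # False: no 6.x SASS, no PTX
```

Under these assumptions, one SASS entry per major version (for example 6.0) also covers later minor versions of that major, while a trailing PTX entry lets newer GPUs run the build at the cost of JIT compilation when the kernels are loaded.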