CMake: default to -arch=native for CUDA build #10320
Conversation
It might be good to make […]
I personally think either way is fine. The target group for these changes/options is, I would argue, developers who are going to frequently recompile the code. Currently the logic is that CUDA architectures are only set automatically if the user does not set […]
Actually no, if […]
It should be possible to check the CUDA toolkit version in the […]
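The version check discussed above can be sketched from the command line. This is a hedged illustration, not the PR's actual code: the 11.6 cutoff for `-arch=native` support is an assumption, and the fallback invocation simply relies on the project's default architecture list.

```shell
# Sketch (assumption, not this PR's code): parse the installed nvcc version
# and only pass "native" if the toolkit is assumed to support -arch=native.
CUDA_VERSION=$(nvcc --version | sed -n 's/.*release \([0-9][0-9]*\.[0-9]*\).*/\1/p')
if awk -v v="$CUDA_VERSION" 'BEGIN { exit !(v >= 11.6) }'; then
    cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="native"
else
    # Older toolkit: fall back to the project's default architecture list.
    cmake -B build -DGGML_CUDA=ON
fi
```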
Force-pushed from 6131aea to 62751a8.
I misremembered both the CUDA version with which […]
# 60 == P100, FP16 CUDA intrinsics
# 61 == Pascal, __dp4a instruction (per-byte integer dot product)
# 70 == V100, FP16 tensor cores
# 75 == Turing, int6 tensor cores
I think int6 -> int8?
Quick question: on my RTX 2060, which has compute capability 7.5, the best configuration to build with (in terms of full feature support and least compile time) is: cmake -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75" .. Is that correct?
The "high-level" C-like CUDA code is first compiled to PTX, which is the CUDA equivalent of assembly. The PTX code is then compiled by ptxas to SASS, the binary machine code that the GPU can actually run. I think when you set […]

For llama.cpp/GGML the code should always work correctly if you compile for exactly the compute capability that you are going to use. The listed compute capabilities are the breakpoints where different features are used and the PTX code ends up being different. So all compute capabilities >= 7.5 should generate the same PTX code and only maybe different SASS code. But so far I have never observed any performance difference from compiling with a compute capability higher than the minimum for PTX.
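The PTX-versus-binary distinction described above can be made concrete with nvcc. A hedged sketch, where `kernel.cu` is a placeholder file name and not from this PR:

```shell
# Emit PTX (the "assembly" stage) for virtual architecture compute_75:
nvcc -arch=compute_75 -ptx kernel.cu -o kernel.ptx
# Compile all the way to binary machine code (via ptxas) for sm_75:
nvcc -arch=sm_75 -cubin kernel.cu -o kernel.cubin
# Inspect the machine code embedded in the cubin:
cuobjdump --dump-sass kernel.cubin
```

Compiling only to PTX defers the final step to JIT compilation at load time, whereas compiling to a cubin fixes the binary for that exact architecture.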
This PR extends the CUDA build documentation by explaining how to speed up local builds.
Also, I changed "documentations" to the singular in the README since I think it sounds more natural.
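As a hedged illustration of the speed-up the documentation change describes (exact flag spellings are assumptions; "native" requires a CUDA-capable GPU to be present at configure time):

```shell
# Build only for the GPU installed in this machine instead of the full
# architecture list, which shortens CUDA compile time considerably:
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="native"
cmake --build build --config Release -j
```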