-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NVPTX support for new asm! #72439
NVPTX support for new asm! #72439
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @davidtwco (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
You are missing an nvptx-modifiers.rs test which tests that register names are rendered properly in the output asm. |
As discussed on zulip, NVPTX does not support any modifiers, therefore no tests are needed. |
The code looks good! Can you update the specification in src/doc/unstable-book/src/library-features/asm.md to include the new register classes? |
They are, implicitly:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-ld Since clang gives the registers bit sizes, they are all zero extended in this implementation. Here's a small clang experiment as proof: #include <cstdint>
#include <cstdio>
__global__ void foo(uint32_t x, uint64_t * y) {
asm volatile (
"mov.u64 %0, %1;"
: "=l"(*y) : "l"(x)
);
}
__host__ int main() {
uint64_t y = UINT64_MAX;
uint64_t * yd;
cudaMalloc(&yd, sizeof(uint64_t));
cudaMemcpy(yd, &y, sizeof(uint64_t), cudaMemcpyHostToDevice);
foo<<<1, 1>>>(0xBEEF, yd);
cudaMemcpy(&y, yd, sizeof(uint64_t), cudaMemcpyDeviceToHost);
cudaFree(yd);
printf("%lx\n", y);
} If you wish, I can add the links I referenced to the documentation |
Again, look at the assembly output from godbolt: https://rust.godbolt.org/z/-8yrgo
The input value is not zero-extended anywhere in this code. |
Ah, I see now. Looking at this though, the problem seems to be more complicated though: https://rust.godbolt.org/z/ojcbJV |
This is the same as every other architecture: when you put a value into a register that is smaller than the register size, the upper bits are UNDEFINED. Since they are undefined, their exact value depends on what the optimizer decides to do. |
@bors r+ |
📌 Commit 8706f76020e84863e8afc4e25dc014decc0f5a2f has been approved by |
☔ The latest upstream changes (presumably #72516) made this pull request unmergeable. Please resolve the merge conflicts. |
Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>
@bors r+ |
📌 Commit e18054d has been approved by |
NVPTX support for new asm! This PR implements the new `asm!` syntax for the `nvptx64-nvidia-cuda` target. r? @Amanieu
Rollup of 9 pull requests Successful merges: - rust-lang#67460 (Tweak impl signature mismatch errors involving `RegionKind::ReVar` lifetimes) - rust-lang#71095 (impl From<[T; N]> for Box<[T]>) - rust-lang#71500 (Make pointer offset methods/intrinsics const) - rust-lang#71804 (linker: Support `-static-pie` and `-static -shared`) - rust-lang#71862 (Implement RFC 2585: unsafe blocks in unsafe fn) - rust-lang#72103 (borrowck `DefId` -> `LocalDefId`) - rust-lang#72407 (Various minor improvements to Ipv6Addr::Display) - rust-lang#72413 (impl Step for char (make Range*<char> iterable)) - rust-lang#72439 (NVPTX support for new asm!) Failed merges: r? @ghost
This PR implements the new
asm!
syntax for thenvptx64-nvidia-cuda
target.r? @Amanieu