Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using make_direct_deprecated() in extern "ptx-kernel" #133932

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bjorn3
Copy link
Member

@bjorn3 bjorn3 commented Dec 5, 2024

This method will be removed in the future as it produces a broken ABI that depends on cg_llvm implementation details. After this PR wasm32-unknown-unknown is the only remaining user of make_direct_deprecated().

Fixes #117271
Blocks #38788

@bjorn3 bjorn3 added O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html A-ABI Area: Concerning the application binary interface (ABI) labels Dec 5, 2024
@rustbot
Copy link
Collaborator

rustbot commented Dec 5, 2024

r? @wesleywiser

rustbot has assigned @wesleywiser.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 5, 2024
@@ -166,7 +166,7 @@ pub unsafe extern "ptx-kernel" fn f_f32_arg(_a: f32) {}
pub unsafe extern "ptx-kernel" fn f_f64_arg(_a: f64) {}

// CHECK: .visible .entry f_single_u8_arg(
// CHECK: .param .align 1 .b8 f_single_u8_arg_param_0[1]
// CHECK: .param .u8 f_single_u8_arg_param_0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this has an identical ABI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be correct. My worry here is that it goes against the "ptx-interoperability" docs.

https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#aggregates-and-unions

Getting these kind of struct correct (per interop docs) was a big reason I couldn't get it "just right" when attempting to remove the PassMode::Direct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit reverts the ABI for this function back to what it was.

@@ -242,22 +242,6 @@ pub unsafe extern "ptx-kernel" fn f_float_array_arg(_a: [f32; 5]) {}
//pub unsafe extern "ptx-kernel" fn f_u128_array_arg(_a: [u128; 5]) {}

// CHECK: .visible .entry f_u32_slice_arg(
// CHECK: .param .u64 f_u32_slice_arg_param_0
// CHECK: .param .u64 f_u32_slice_arg_param_1
// CHECK: .param .align 8 .b8 f_u32_slice_arg_param_0[16]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't guarantee any ABI for wide pointers anyway and if this is breaks the ABI all users can be updated to pass #[repr(C)] struct Slice<'a, T> { ptr: *const (), len: usize, _marker: PhantomData<&'a [T]> } on the host side anyway (and preferably also the gpu side to avoid depending on the unstable ABI).

// CHECK: .visible .entry f_tuple_u8_u8_u32_arg(
// CHECK: .param .align 4 .b8 f_tuple_u8_u8_u32_arg_param_0[8]
#[no_mangle]
pub unsafe extern "ptx-kernel" fn f_tuple_u8_u8_u32_arg(_a: (u8, u8, u32)) {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuples don't have a stable ABI for any calling convention. And can always be replaced with structs anyway.

@bjorn3
Copy link
Member Author

bjorn3 commented Dec 5, 2024

cc @kjetilkjeka @RDambrosio016 as target maintainers, but given that this is a tier 3 target without documentation about what the extern "ptx-kernel" ABI is actually supposed to be, I don't think it is worth blocking on their approval. Having make_direct_deprecated() at all blocks removal broken code in the ABI handling of codegen backends.

@bjorn3 bjorn3 force-pushed the fix_ptx_kernel_abi branch 2 times, most recently from f38e850 to 9aca298 Compare December 6, 2024 09:11
@kjetilkjeka
Copy link
Contributor

I will try to run this patch through my workplace internal rust CUDA projects and see if anything interesting turns up later today.

@vetleras
Copy link

vetleras commented Dec 6, 2024

I'm a colleague of Kjetil. This change does not seem to break our codebase.

This method will be removed in the future as it produces a broken ABI
that depends on cg_llvm implementation details. After this PR
wasm32-unknown-unknown is the only remaining user of
make_direct_deprecated().
@bjorn3 bjorn3 force-pushed the fix_ptx_kernel_abi branch from 0a44f67 to 7f69c46 Compare December 6, 2024 13:40
@LegNeato
Copy link
Contributor

LegNeato commented Jan 4, 2025

This method will be removed in the future as it produces a broken ABI that depends on cg_llvm implementation details. After this PR wasm32-unknown-unknown is the only remaining user of make_direct_deprecated().

FYI rust-gpu uses make_direct_deprecated(), as previously mentioned: #115666 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nvptx "ptx-kernel" ABI (feature: abi_ptx) uses PassMode::Direct for Aggregates
6 participants