-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid using make_direct_deprecated() in extern "ptx-kernel" #133932
base: master
Are you sure you want to change the base?
Conversation
r? @wesleywiser rustbot has assigned @wesleywiser. Use |
@@ -166,7 +166,7 @@ pub unsafe extern "ptx-kernel" fn f_f32_arg(_a: f32) {} | |||
pub unsafe extern "ptx-kernel" fn f_f64_arg(_a: f64) {} | |||
|
|||
// CHECK: .visible .entry f_single_u8_arg( | |||
// CHECK: .param .align 1 .b8 f_single_u8_arg_param_0[1] | |||
// CHECK: .param .u8 f_single_u8_arg_param_0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume this has an identical ABI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be correct. My worry here is that it goes against the "ptx-interoperability" docs.
https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#aggregates-and-unions
Getting these kind of struct correct (per interop docs) was a big reason I couldn't get it "just right" when attempting to remove the PassMode::Direct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit reverts the ABI for this function back to what it was.
@@ -242,22 +242,6 @@ pub unsafe extern "ptx-kernel" fn f_float_array_arg(_a: [f32; 5]) {} | |||
//pub unsafe extern "ptx-kernel" fn f_u128_array_arg(_a: [u128; 5]) {} | |||
|
|||
// CHECK: .visible .entry f_u32_slice_arg( | |||
// CHECK: .param .u64 f_u32_slice_arg_param_0 | |||
// CHECK: .param .u64 f_u32_slice_arg_param_1 | |||
// CHECK: .param .align 8 .b8 f_u32_slice_arg_param_0[16] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't guarantee any ABI for wide pointers anyway and if this is breaks the ABI all users can be updated to pass #[repr(C)] struct Slice<'a, T> { ptr: *const (), len: usize, _marker: PhantomData<&'a [T]> }
on the host side anyway (and preferably also the gpu side to avoid depending on the unstable ABI).
// CHECK: .visible .entry f_tuple_u8_u8_u32_arg( | ||
// CHECK: .param .align 4 .b8 f_tuple_u8_u8_u32_arg_param_0[8] | ||
#[no_mangle] | ||
pub unsafe extern "ptx-kernel" fn f_tuple_u8_u8_u32_arg(_a: (u8, u8, u32)) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuples don't have a stable ABI for any calling convention. And can always be replaced with structs anyway.
cc @kjetilkjeka @RDambrosio016 as target maintainers, but given that this is a tier 3 target without documentation about what the |
f38e850
to
9aca298
Compare
I will try to run this patch through my workplace internal rust CUDA projects and see if anything interesting turns up later today. |
I'm a colleague of Kjetil. This change does not seem to break our codebase. |
This method will be removed in the future as it produces a broken ABI that depends on cg_llvm implementation details. After this PR wasm32-unknown-unknown is the only remaining user of make_direct_deprecated().
0a44f67
to
7f69c46
Compare
FYI |
This method will be removed in the future as it produces a broken ABI that depends on cg_llvm implementation details. After this PR wasm32-unknown-unknown is the only remaining user of make_direct_deprecated().
Fixes #117271
Blocks #38788