Implement repr(packed) for repr(simd) #117116
r? @oli-obk (rustbot has picked a reviewer for you, use r? to override)
Is it possible it would make sense for this just to be what Since all the other uses of it are power-of-two anyway, and thus if you make
```rust
let align = dl.vector_align(size);

let (abi, align) = if def.repr().packed() && !e_len.is_power_of_two() {
    // Non-power-of-two vectors have padding up to the next power-of-two.
```
iirc LLVM doesn't guarantee vectors don't have more padding than just to the next power of two, e.g. padding `<2 x i8>` to 128 bits.
https://llvm.org/docs/LangRef.html#vector-type seems to say that LLVM guarantees that vectors have no padding in their in-memory representation? Padding `<2 x i8>` to 128 bits is something that LLVM does inside the vector registers (e.g. on aarch64), but never in memory.
that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.
I'm not sure if it's guaranteed, but rustc does make the assumption (somewhere, I forget where)
More specifically, this is the offset in bytes between successive elements in an array with that item type including alignment padding.
Whoops, I meant rustc assumes that npot vectors round up to the next power of two.
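To make the rounding assumption mentioned above concrete, here is a minimal sketch in plain Rust. It does not call into rustc or LLVM; the function names are hypothetical and exist only for illustration. The padded size follows the "round up to the next power of two" rule described in this thread. The packed alignment rule (largest power of two that divides the size) is an assumption on my part and is not stated by the participants at this point in the discussion.

```rust
/// Size in bytes of a padded vector, assuming the total byte size is
/// rounded up to the next power of two (the assumption discussed above).
pub fn padded_vector_size(elem_size: usize, len: usize) -> usize {
    (elem_size * len).next_power_of_two()
}

/// Size in bytes of the same elements stored back to back, with no padding.
pub fn packed_vector_size(elem_size: usize, len: usize) -> usize {
    elem_size * len
}

/// ASSUMPTION (not confirmed by this thread): a packed vector keeps the
/// largest power-of-two alignment that still divides its unpadded size.
pub fn assumed_packed_align(elem_size: usize, len: usize) -> usize {
    let size = packed_vector_size(elem_size, len);
    if size == 0 {
        1
    } else {
        // Isolate the lowest set bit, i.e. the largest power of two dividing `size`.
        1usize << size.trailing_zeros()
    }
}
```

For three `f32` lanes, the padded form takes 16 bytes while the packed form takes 12, so an array of such vectors would save 4 bytes per element under these assumptions.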
> that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.

I thought LLVM types didn't have any sort of intrinsic alignment? And even if they have a preferred alignment that's too big, can't you just always annotate load/store instructions with the smaller alignment?
> > that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.
>
> I thought LLVM types didn't have any sort of intrinsic alignment? And even if they have a preferred alignment that's too big, can't you just always annotate load/store instructions with the smaller alignment?

they do have an intrinsic ABI alignment, it's used for default alignment of load/store/alloca when not explicitly specified and for stack spilling and a few other things
> I'm not sure if it's guaranteed, but rustc does make the assumption (somewhere, I forget where)
just found this (yeah, I know that's not what you meant, but I couldn't resist):
https://lang-team.rust-lang.org/frequently-requested-changes.html#size--stride
Rust assumes that the size of an object is equivalent to the stride of an object
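The size-equals-stride rule quoted above can be observed on stable Rust with ordinary types. The sketch below uses a hypothetical `Pair` struct and helper functions invented for this illustration: the struct carries trailing padding, so its size is already a multiple of its alignment and matches the distance between consecutive array elements.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical example type: 5 bytes of data, padded to 8 so that the
/// `u32` field stays aligned in every array element.
#[allow(dead_code)]
#[repr(C)]
pub struct Pair {
    pub a: u32,
    pub b: u8,
}

/// Returns `(size, alignment)` of `T` in bytes.
pub fn size_and_align<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// Distance in bytes between element 0 and element 1 of a two-element array.
pub fn stride_of<T>(items: &[T; 2]) -> usize {
    let first = &items[0] as *const T as usize;
    let second = &items[1] as *const T as usize;
    second - first
}
```

This is why padding cannot simply be dropped from an array of padded vectors: the element type itself has to become smaller, which is what a packed representation does.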
Maybe eventually, but for now I didn't implement any way to call simd intrinsics on packed repr vectors--they still need to be loaded into full (padded) vectors. std::simd will need to convert the vectors before calling intrinsics.
Note that I also change the ABI, so it's not possible to pass these in vector registers (I think). So if target feature calling conventions ever get worked out, there may still be a benefit to the current implementation as well.
Given that we're working on defining target feature minima specifically so we can do that, I don't think that blocking passing in registers is acceptable.
Unfortunately I think we simply can't have it both ways. LLVM defines vectors as having padding and we would like them to not have padding. This change doesn't affect all vectors, just those with repr(packed), which specifically removes padding and reduces alignment. A user could always use a non repr(packed) vector to pass by register.
I don't expect this to block passing in registers, because this is defining the in-memory ABI. The passing-by-value ABI can be totally different, since semantically you have to then store that value somewhere in memory before you can look at the bytes.
I think we can have it both ways -- just pass-by-value as a LLVM vector and use the array type for in-memory address calculations and allocating memory. load/store with vector types only read-write the non-padding bytes, so we're fine.
True, this is definitely possible once we are able to.
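The split described in the comments above (an unpadded array layout in memory, a full vector for computation) can be sketched on stable Rust without `repr(simd)`. This is only an analogy under stated assumptions: `Packed3` and `Full4` are hypothetical stand-ins, `Full4` imitates a 16-byte vector register with an aligned four-lane array, and the compiler is free to lower the loops however it likes.

```rust
/// In-memory form: three lanes, 12 bytes, no padding (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct Packed3(pub [f32; 3]);

/// Stand-in for a full vector: four lanes, 16 bytes, 16-byte aligned.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct Full4(pub [f32; 4]);

/// Widen the packed value; the extra lane is filled with zero.
pub fn load(v: &Packed3) -> Full4 {
    let mut lanes = [0.0f32; 4];
    lanes[..3].copy_from_slice(&v.0);
    Full4(lanes)
}

/// Narrow the full value; only the three non-padding lanes are written.
pub fn store(v: Full4, out: &mut Packed3) {
    out.0.copy_from_slice(&v.0[..3]);
}

/// Square each lane by going through the full-width form.
pub fn square(v: &Packed3) -> Packed3 {
    let mut full = load(v);
    for lane in full.0.iter_mut() {
        *lane *= *lane;
    }
    let mut out = Packed3([0.0; 3]);
    store(full, &mut out);
    out
}
```

Whether a real backend turns the widening step into a single vector load is exactly the optimization question raised in the replies that follow.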
I don't think that will optimize well at all. LLVM historically reasons about in-memory and in-register data in conflationary ways.
I don't think there's any optimization involved. It's just a single vector load. Either way, this PR doesn't change how existing vectors work. I agree that the calling convention could become an issue in the future, but I'd like to deal with that in the future when it comes up, since there's still no consensus on a fix regardless of this PR. Also, fixing
"Legalizing to a vector load instead of a series of scalar loads" is an optimization.
...Okay, I didn't quite divine from the messages/code changes that
I would prefer this to come with codegen/assembly tests to demonstrate what it actually looks like when these types are interacted with, and to prove it like... legalizes correctly, where "correctly" in this case is probably "not an LLVM error, I guess?"
Since
I want to approve this but I still can't see what the emitted LLVM IR is, and I can't e.g. increase my confidence by reaching for friends who know way more about LLVM IR and legalization if there's no codegen diffs to show them.
Not quite as rigorous as a codegen test, but consider this reduced case for illustrative purposes:

```rust
#![feature(repr_simd, platform_intrinsics)]

#[repr(simd, packed)]
pub struct Simd<T, const N: usize>([T; N]);

#[repr(simd)]
#[derive(Copy, Clone)]
pub struct FullSimd<T, const N: usize>([T; N]);

extern "platform-intrinsic" {
    fn simd_mul<T>(a: T, b: T) -> T;
}

// non-powers-of-two have padding and need to be expanded to full vectors
fn load<T, const N: usize>(v: Simd<T, N>) -> FullSimd<T, N> {
    unsafe {
        let mut tmp = core::mem::MaybeUninit::<FullSimd<T, N>>::uninit();
        std::ptr::copy_nonoverlapping(&v as *const _, tmp.as_mut_ptr().cast(), 1);
        tmp.assume_init()
    }
}

pub fn square(x: Simd<f32, 3>) -> FullSimd<f32, 3> {
    let x = load(x);
    unsafe { simd_mul(x, x) }
}
```

With optimization, this simply generates the following:
This is nearly the same as using regular vectors, but note that the input
Whoops, accidentally closed while writing my comment :)
@workingjubilee I added my comment above as a codegen test
1a4f00e to 6ba6447
6ba6447 to 803623e
💔 Test failed - checks-actions
@bors r-
@workingjubilee I removed the codegen test since it's so dependent on optimization (just little things like attributes are changing, but making it impossible to write), and the tests run on so many optimization levels, I can't seem to make a useful test. Considering
Indeed, I mostly wanted an example! Sadness about the test, though. It really shouldn't be so hard... @bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (8b1ba11): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): This benchmark run did not return any relevant results for this metric.
Cycles: This benchmark run did not return any relevant results for this metric.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 669.249s -> 669.313s (0.01%)
Thanks :)
Rollup merge of rust-lang#125311 - calebzulawski:repr-packed-simd-intrinsics, r=workingjubilee Make repr(packed) vectors work with SIMD intrinsics In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319
…, r=calebzulawski Test codegen for `repr(packed,simd)` -> `repr(simd)` This adds the codegen test originally requested in rust-lang#117116 but exploiting the collection of features in FileCheck and compiletest to make it more resilient to expectations being broken by optimization levels. Mostly by presetting optimization levels for each revision of the tests. I do not think the dereferenceable attribute's presence or absence is that important. r? `@calebzulawski`
```rust
#![allow(non_camel_case_types)]

#[repr(simd, packed)]
struct Simd<T, const N: usize>([T; N]);
```
From how `packed` usually works, I would expect this to mean that the type has alignment 1. But that doesn't seem to be the case, instead the alignment is the largest possible for the size, or something like that?
What happens with `packed(N)`?
Would be good to have the interaction of `simd` and `packed` documented somewhere.
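For reference, this is how `packed` and `packed(N)` behave on ordinary (non-SIMD) structs in stable Rust, which is the expectation the question above starts from. The struct names and the `layout` helper are hypothetical. Nothing here says how `repr(simd, packed)` behaves; that interaction is what the reviewer is asking to have documented.

```rust
use std::mem::{align_of, size_of};

/// Default C layout: the `u32` forces 3 bytes of padding after `tag`.
#[allow(dead_code)]
#[repr(C)]
pub struct Plain {
    pub tag: u8,
    pub value: u32,
}

/// `packed` is `packed(1)`: no padding, and the whole type has alignment 1.
#[allow(dead_code)]
#[repr(C, packed)]
pub struct Packed1 {
    pub tag: u8,
    pub value: u32,
}

/// `packed(2)` caps field and type alignment at 2 bytes.
#[allow(dead_code)]
#[repr(C, packed(2))]
pub struct Packed2 {
    pub tag: u8,
    pub value: u32,
}

/// Returns `(size, alignment)` of `T` in bytes.
pub fn layout<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

On ordinary structs, then, `packed` lowers alignment as far as it can, whereas the comment above observes that the packed SIMD type appears to keep a larger alignment.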
FWIW codegen has support for using different LLVM types in by-val vs by-ref situations: specifically,

```rust
    if ty.is_simd() && !matches!(arg.val, OperandValue::Immediate(_)) {
        return_error!(InvalidMonomorphization::SimdArgument { span, name, ty: *ty });
    }
}
```
This comment seems no longer accurate... maybe that got changed by #125311? Doing simd_mul etc on a packed SIMD type works just fine now. Even in debug builds this generates the code one would hope for.
This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319