Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-power-of-2 Simd types have wrong size #319

Closed
programmerjake opened this issue Nov 30, 2022 · 42 comments · Fixed by #422
Closed

non-power-of-2 Simd types have wrong size #319

programmerjake opened this issue Nov 30, 2022 · 42 comments · Fixed by #422
Labels
C-bug Category: Bug

Comments

@programmerjake
Copy link
Member

I tried this code:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f4ee5c373508bcb37be6cda3bb627b87

#![feature(repr_simd)]
use core::alloc::Layout;
#[repr(simd)]
struct Simd<T, const N: usize>(pub [T; N]);

fn main() {
    assert_eq!(
        Layout::new::<Simd<i8, 3>>(),
        Layout::from_size_align(3, 1).unwrap()
    );
}

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

Instead, this happened:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Layout { size: 4, align: 4 (1 << 2) }`,
 right: `Layout { size: 3, align: 1 (1 << 0) }`', src/main.rs:7:5

Meta

rustc --version --verbose:

<current nightly as of issue creation -- probably 2022-11-29 nightly>

crate version in Cargo.toml:
N/A

@programmerjake programmerjake added the C-bug Category: Bug label Nov 30, 2022
@programmerjake
Copy link
Member Author

see also #63

@programmerjake
Copy link
Member Author

@programmerjake
Copy link
Member Author

programmerjake commented Nov 30, 2022

imho we should define layout such that sizeof(Simd<T, N>) == sizeof([T; N]) and alignof(Simd<T, N>) == gcd(alignof(T), llvm_alignof(Simd<T, N>)) >= alignof(T) (edit: wrong. see: #319 (comment))
or, if we don't want to depend on llvm, we could pick the largest power of two <= sizeof([T; N]) such that no padding is required.

@programmerjake
Copy link
Member Author

Note the changes to vector layout by @RalfJung in rust-lang/rust@891a4da are the exact opposite of what we probably want: that commit adds padding until size is divisible by alignment, what we probably want is to decrease alignment enough that we can enforce vectors are the exact same size as the corresponding array (usually means no padding, there could still be padding if the element type contains padding, portable-simd will presumably enforce that those element types aren't allowed but repr(simd) might allow them in the future)

@programmerjake
Copy link
Member Author

imho we should define layout such that sizeof(Simd<T, N>) == sizeof([T; N]) and alignof(Simd<T, N>) == gcd(alignof(T), llvm_alignof(Simd<T, N>)) >= alignof(T)

now that I think about some more, gcd of aligns is wrong, instead we probably want alignof(Simd<T, N>) == gcd(sizeof([T; N]), llvm_alignof(Simd<T, N>)) >= alignof(T)

@calebzulawski
Copy link
Member

I wonder, what's the motivation for that commit? If it's absolutely necessary for some other reason, we could pretty easily add a new repr for portable SIMD.

@programmerjake
Copy link
Member Author

I wonder, what's the motivation for that commit? If it's absolutely necessary for some other reason, we could pretty easily add a new repr for portable SIMD.

I would assume it's for enforcing rust's rule that every type must have its size be divisible by its alignment. llvm doesn't enforce that rule, so relying on llvm to calculate layout for vectors can violate that rule.

@RalfJung
Copy link
Member

RalfJung commented Dec 2, 2022 via email

@RalfJung
Copy link
Member

RalfJung commented Dec 2, 2022

The actual layout computation is here. And yes size.align_to(align.abi) is needed because size must be a multiple of align in Rust. So if you want 3-element SIMD vectors to not have trailing padding, then vector_align needs to be adjusted to return a lower alignment for these types.

This behavior of rustc (to round up vector sizes to the next power of two) has existed at least since 2016.

Cc @eddyb

@eddyb
Copy link
Member

eddyb commented Dec 2, 2022

To be clear, this is not rustc-specific behavior but LLVM behavior - the reason we have to reuse the datalayout string is that mismatches between rustc and LLVM could lead to miscompilations if we have to use APIs where we assume LLVM is working with the same algorithms and configuration we are.

IIRC the logic is roughly:

  • datalayout string can contain an alignment for some exact vector size
  • otherwise, the default is "natural alignment", i.e. the alignment is the same as the size (except rounded up to a power of two, if not already? I forget if it's meant to be stricter or not)

@eddyb
Copy link
Member

eddyb commented Dec 2, 2022

Reading more of the discussion above, you really cannot talk about llvm_alignof(Simd<T, N>) as if it's different than the Rust alignment.

At the point where you start diverging in alignments from LLVM, you can no longer use primitive vector types and have to wrap them in packed structs to decrease alignment... except the vector is already rounded up to a "plausible hardware size" by that point so the packed struct can no longer decrease the size!

You would probably need to "just" not use non-pow2 LLVM vector types, i.e. do not have Abi::Vector for such malformed #[repr(simd)] types. Or maybe they can still be used when by-value but great care must be taken around loading and storing - you effectively would need to generate e.g. 3 loads/stores to read/write to/from <3 x T> values (or weird FCA loads/stores of [3 x T] which LLVM would desugar).

Frankly, after what I've seen from GLSL (which has e.g. size=12 align=16 types that "eat" padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can't "take back" the magic, even if you manage to make it work initially.

@programmerjake
Copy link
Member Author

what i'm envisioning is using npot vectors by value (not in memory) just uses usual llvm semantics, whereas when loading/storing llvm allows specifying the alignment, which is where we would override it to use the rust abi alignment.
if llvm decides to store vectors in temporary memory variables, it's free to use whatever layout it pleases, since llvm is allowed to modify code using the as-if rule.

note that npot vectors are natively supported on some ISAs (e.g. SVE, RVV, and SimpleV)

@programmerjake
Copy link
Member Author

for llvm npot vector loads/stores, afaict from reading the llvm language reference llvm guarantees that they will not load/store padding if the elements are an integer byte size (so not i4 since that's 1/2 byte) and have no padding -- since it says here vector load/stores act the same as array load/stores when the element type is an integer byte size. so, no, we won't need multiple llvm load/store instructions to load/store npot vectors.

@programmerjake
Copy link
Member Author

Frankly, after what I've seen from GLSL (which has e.g. size=12 align=16 types that "eat" padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can't "take back" the magic, even if you manage to make it work initially.

we shouldn't run into that particular problem since the proposed layouts will always have size % align == 0 like any other rust type, they're basically arrays' layouts but with the align increased as close as possible to llvm's preferred align without introducing padding or violating size % align == 0

@workingjubilee
Copy link
Member

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

It is not actually my opinion that these values should be equal in all cases, actually? I believe f32x4 on x86 should be natively 16-byte aligned. Unless you are talking about f32x3 specifically.

Frankly, after what I've seen from GLSL (which has e.g. size=12 align=16 types that "eat" padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can't "take back" the magic, even if you manage to make it work initially.

We are strongly inclined to think we do not want byte-level padding in SIMD types, I think only a few murmurs have even discussed that possibility and largely in a negative light. We collectively average out to feeling a little dubious even about niches (or other forms of bit-level padding).

@programmerjake
Copy link
Member Author

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

It is not actually my opinion that these values should be equal in all cases, actually? I believe f32x4 on x86 should be natively 16-byte aligned. Unless you are talking about f32x3 specifically.

power-of-2 sized vectors always matched their corresponding array size and my proposal doesn't change that. if we use the gcd(size, llvm_align) proposal then power-of-2 vectors' alignments should be entirely unchanged too (assuming there isn't a weird arch that decides i8x4 has align 16 or something). my proposal only changes non-power-of-2 vectors.

I should point out that I am explicitly not changing std::arch simd types' layout, those are defined by their corresponding arch (that said, since iirc all current std::arch vector types are power-of-2 types without padding, the gcd(size, llvm_align) proposal would leave them unaffected even if those std::arch types were defined using std::simd with the gcd(size, llvm_align) proposal.)

We are strongly inclined to think we do not want byte-level padding in SIMD types,

so then you also agree we should change rustc's current vector layouts to never introduce padding bytes. this requires potentially reducing alignment for non-power-of-2 vectors in order to not violate the rust size % align == 0 rule.

@programmerjake
Copy link
Member Author

note that my proposal means that slice::as_simd can never panic, currently it has to verify that the provided vector type has no padding and panics if that fails, which is a major footgun.
https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_simd

@workingjubilee
Copy link
Member

...That's not really a footgun, honestly, unless you write code that never actually tests the function ever, since the result must be monomorphized.

@calebzulawski
Copy link
Member

In our docs we already repeatedly describe vectors as "basically arrays" so I do think it's a footgun if they have unusual layouts, in this function and elsewhere.

@programmerjake
Copy link
Member Author

...That's not really a footgun, honestly, unless you write code that never actually tests the function ever, since the result must be monomorphized.

maybe not in the sense of unintentionally causing UB, but it is in the sense of writing a bunch of code generic over vector lengths and expecting it to work for any length...which could require re-architecting a bunch of code to fix.

@RalfJung
Copy link
Member

RalfJung commented Dec 4, 2022

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

It is not actually my opinion that these values should be equal in all cases, actually? I believe f32x4 on x86 should be natively 16-byte aligned. Unless you are talking about f32x3 specifically.

You were quoting a statement about the size but your reply talks about alignment. Looks to me like you are talking past each other?

f32x4 on x86 can be 16-byte aligned and still have the same size as [f32; 4] without any problem.

@RalfJung
Copy link
Member

RalfJung commented Dec 4, 2022

power-of-2 sized vectors always matched their corresponding array size and my proposal doesn't change that. if we use the gcd(size, llvm_align) proposal then power-of-2 vectors' alignments should be entirely unchanged too (assuming there isn't a weird arch that decides i8x4 has align 16 or something). my proposal only changes non-power-of-2 vectors.

Usually alignment of such types is not even something we can decide. What is their alignment in C? We pretty much have to use the same to be ABI compatible.

@bjorn3
Copy link
Member

bjorn3 commented Dec 4, 2022

Only the types in core::arch need to be abi compatible right? There are From impls to convert between the core::arch types and core::simd::Simd, but nothing that converts references for one type to references for the other type.

@RalfJung
Copy link
Member

RalfJung commented Dec 4, 2022

That is assuming people won't declare their own SIMD types and expect them to be compatible with the equivalent def.n in C.

@programmerjake
Copy link
Member Author

Usually alignment of such types is not even something we can decide. What is their alignment in C? We pretty much have to use the same to be ABI compatible.

imho Simd<T, N> is not the same type as existing C simd types, it's something new, defined by Rust, so ABI compatibility isn't an issue, unless we want it to be, in which case we can provide same-layout guarantees for some types e.g. we might want to define Simd<u32, 4> on x86_64 to have the same layout as __m128, and to define Simd<i32, 5> to not match any corresponding std::arch::x86_64 type.

@RalfJung
Copy link
Member

RalfJung commented Dec 4, 2022

Fair. But make sure this is documented well and brace yourselves for bug reports caused by any difference in ABI. ;)

Does C even have non-pot SIMD types, and if yes then how do their layouts look like?

@programmerjake
Copy link
Member Author

Does C even have non-pot SIMD types, and if yes then how do their layouts look like?

for x86_64, yes, but only in clang, not gcc (idk about icc or msvc):

typedef unsigned short u16x3 __attribute__((vector_size(3 * sizeof(unsigned short))));

@workingjubilee
Copy link
Member

Fair. But make sure this is documented well and brace yourselves for bug reports caused by any difference in ABI. ;)

our 128-bit integers don't even match up with C!

Does C even have non-pot SIMD types, and if yes then how do their layouts look like?

"Ad hoc." As @eddyb mentioned/implied, f32x3 is weird and special in most C implementations of it, if there are any (and there are some, due to code attempting to cross the C<->GLSL boundary), such that it traumatizes a lot of C programmers enough they don't want to touch it. People won't be that surprised that yet another implementation has decided on yet another quirky variant, though they may be annoyed.

@RalfJung
Copy link
Member

RalfJung commented Dec 4, 2022

our 128-bit integers don't even match up with C!

And that is such a success that you want to repeat it? 😂

@RalfJung
Copy link
Member

RalfJung commented Dec 5, 2022 via email

@programmerjake
Copy link
Member Author

I'm out of my depth here, I don't understand the tradeoffs involved well enough to judge the pros and cons. I guess instead of having the same logic in many places we could have a shared helper on Abi that distinguishes "register type" and "memory type", but no idea how practical that is.

imho it would be useful to also have a load/store type (value actually being passed to the load/store instruction):
comparison:

  • bool
    • memory type: i8
    • register type: i1
    • store type: i8
    %bool_ptr = getelementptr i8, ptr %bool_array_ptr, i64 %idx ; compute pointer using memory type
    %v = zext i1 %bool_value to i8 ; convert register type to store type
    store i8 %v, ptr %bool_ptr ; store using store type
  • Simd<i32, 3>
    • memory type: [3 x i32]
    • register type: <3 x i32>
    • store type: <3 x i32> with load/store ops' align being explicitly specified
    %vec_ptr = getelementptr [3 x i32], ptr %vec_array_ptr, i64 %idx ; compute pointer using memory type
    ; register type is store type, no conversion needed
    store <3 x i32> %vec_value, ptr %vec_ptr, align 4 ; store using store type with alignment overridden to rust type alignment

@bjorn3
Copy link
Member

bjorn3 commented Dec 5, 2022

The existing handling for handling bools being i1 in registers touches a lot of code (including seemingly unrelated code like determining signs for tags in the discriminant calculation code) and requires separate "immediate" and non-immediate versions of several methods. I am pretty sure the wrong one is used at several places. Also for Cranelift it did require extra special cases to handle "i1" being 8 bits big as Cranelift only has i8 and up. I believe GCC also requires special code as it uses a true bool type rather than a 1bit integer. Honestly I did prefer to just convert i1 to i8 as soon as possible and never pass it around as immediate. I realize this may not be possible for performance reasons though.

@programmerjake
Copy link
Member Author

@programmerjake
Copy link
Member Author

programmerjake commented Feb 24, 2023

note that the alignment specification I'm proposing will allow arbitrary vector type punning without having alignment issues as long as the type pun is otherwise valid and llvm_align doesn't do crazy stuff (e.g. if llvm decides alignof(i8x4) is less than alignof(i32x1) for some strange reason then that prevents type punning i32x1 to i8x4): see riscv-non-isa/riscv-elf-psabi-doc#347 (comment) for full explanation

@calebzulawski
Copy link
Member

calebzulawski commented Sep 24, 2023

Considering this practically rather than idealistically for a moment, it's questionable to me whether it's really worth fixing the layout vs considering other options. bool is perhaps the most frequently used primitive, whereas std::simd is niche, and non-PoT vectors considerably more niche. Embedding a split memory/register repr throughout the compiler seems impractical and even if we decided that was the best option, I genuinely doubt anyone would ever implement it.

That said, not supporting non-PoT vectors could excessively complicate our API and perhaps delay stabilization while we try to sort out a simpler alternative to SupportedLaneCount (see #364). I think it might be worth exploring other options that would give us non-PoT vectors in a more practical way. I've thought of 3 alternatives:

  • Leave things as they are. The current implementation of non-PoT vectors works. We could leave them be and use either an assert or non-local error in functions like [T]::as_simd. This would leave us with the option of reducing the alignment in the future (maybe across an edition change).
  • Add a new CompactSimd or UnalignedSimd type. This type could implement all of the same functions and traits as Simd, but have a lesser alignment, either repr(Rust) or repr(compact_simd). To avoid a split memory/register repr, I would not make the type a vector type, but instead convert it to a repr(simd) type before passing it to the existing simd_* intrinsics. We could easily implement Into<Simd> for CompactSimd and perhaps Deref<Target = UnalignedSimd> for Simd. [T]::as_simd could still error, but have a matching [T]::as_compact_simd that works for any number of elements.
  • Make Simd repr(compact_simd) and convert before using. Like the previous example, but wrapped into one type. We could add new intrinsics simd_align and simd_unalign that convert between repr(simd) and repr(compact_simd), a no-op when alignments match. In this case, repr(compact_simd) would be a glorified repr(align(N)).

Personally, I like the first or second alternative best. LLVM seems to be able to generate pretty good code for non-PoT vectors, and I think messing with the alignment could cause problems. The second option would allow the user to choose which is more important--having a flexible memory layout or reliable codegen that doesn't juggle layouts.

cc @programmerjake @scottmcm

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 4, 2023
…rkingjubilee

Implement repr(packed) for repr(simd)

This allows creating vectors with non-power-of-2 lengths that do not have padding.  See rust-lang/portable-simd#319
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Dec 8, 2023
…workingjubilee

Implement repr(packed) for repr(simd)

This allows creating vectors with non-power-of-2 lengths that do not have padding.  See rust-lang/portable-simd#319
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 9, 2023
…rkingjubilee

Implement repr(packed) for repr(simd)

This allows creating vectors with non-power-of-2 lengths that do not have padding.  See rust-lang/portable-simd#319
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 11, 2023
…rkingjubilee

Implement repr(packed) for repr(simd)

This allows creating vectors with non-power-of-2 lengths that do not have padding.  See rust-lang/portable-simd#319
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Dec 12, 2023
Implement repr(packed) for repr(simd)

This allows creating vectors with non-power-of-2 lengths that do not have padding.  See rust-lang/portable-simd#319
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
Implement repr(packed) for repr(simd)

This allows creating vectors with non-power-of-2 lengths that do not have padding.  See rust-lang/portable-simd#319
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Implement repr(packed) for repr(simd)

This allows creating vectors with non-power-of-2 lengths that do not have padding.  See rust-lang/portable-simd#319
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 2, 2024
…nsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Jun 2, 2024
…rinsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jun 2, 2024
Rollup merge of rust-lang#125311 - calebzulawski:repr-packed-simd-intrinsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
bors pushed a commit to rust-lang/rust-analyzer that referenced this issue Jun 20, 2024
…r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In #117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants