non-power-of-2 Simd types have wrong size #319

programmerjake · 2022-11-30T11:18:41Z

I tried this code:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f4ee5c373508bcb37be6cda3bb627b87

#![feature(repr_simd)]
use core::alloc::Layout;
#[repr(simd)]
struct Simd<T, const N: usize>(pub [T; N]);

fn main() {
    assert_eq!(
        Layout::new::<Simd<i8, 3>>(),
        Layout::from_size_align(3, 1).unwrap()
    );
}

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

Instead, this happened:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Layout { size: 4, align: 4 (1 << 2) }`,
 right: `Layout { size: 3, align: 1 (1 << 0) }`', src/main.rs:7:5

Meta

rustc --version --verbose:

<current nightly as of issue creation -- probably 2022-11-29 nightly>

crate version in Cargo.toml:
N/A

The text was updated successfully, but these errors were encountered:

programmerjake · 2022-11-30T11:20:22Z

see also #63

programmerjake · 2022-11-30T11:28:55Z

see some previous discussion: https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/Bytemuckable.20Simd/near/311734698

programmerjake · 2022-11-30T11:36:18Z

imho we should define layout such that sizeof(Simd<T, N>) == sizeof([T; N]) and alignof(Simd<T, N>) == gcd(alignof(T), llvm_alignof(Simd<T, N>)) >= alignof(T) (edit: wrong. see: #319 (comment))
or, if we don't want to depend on llvm, we could pick the largest power of two <= sizeof([T; N]) such that no padding is required.

programmerjake · 2022-12-01T22:06:48Z

Note the changes to vector layout by @RalfJung in rust-lang/rust@891a4da are the exact opposite of what we probably want: that commit adds padding until size is divisible by alignment, what we probably want is to decrease alignment enough that we can enforce vectors are the exact same size as the corresponding array (usually means no padding, there could still be padding if the element type contains padding, portable-simd will presumably enforce that those element types aren't allowed but repr(simd) might allow them in the future)

programmerjake · 2022-12-01T22:13:26Z

imho we should define layout such that sizeof(Simd<T, N>) == sizeof([T; N]) and alignof(Simd<T, N>) == gcd(alignof(T), llvm_alignof(Simd<T, N>)) >= alignof(T)

now that I think about some more, gcd of aligns is wrong, instead we probably want alignof(Simd<T, N>) == gcd(sizeof([T; N]), llvm_alignof(Simd<T, N>)) >= alignof(T)

calebzulawski · 2022-12-01T22:32:38Z

I wonder, what's the motivation for that commit? If it's absolutely necessary for some other reason, we could pretty easily add a new repr for portable SIMD.

programmerjake · 2022-12-01T23:13:31Z

I wonder, what's the motivation for that commit? If it's absolutely necessary for some other reason, we could pretty easily add a new repr for portable SIMD.

I would assume it's for enforcing rust's rule that every type must have its size be divisible by its alignment. llvm doesn't enforce that rule, so relying on llvm to calculate layout for vectors can violate that rule.

RalfJung · 2022-12-02T07:29:38Z

My commit does not change layout at all, it just changes the layout consistency check, assuming that the existing layout code is right. If the existing layout code is buggy then yeah I also ported the bug to the layout checker now. Feel free to fix both at the same time. :) The important part is the the size and align of the type must be uniquely determined by the size and align of the vector elements and the number of elements.

RalfJung · 2022-12-02T07:52:34Z

The actual layout computation is here. And yes size.align_to(align.abi) is needed because size must be a multiple of align in Rust. So if you want 3-element SIMD vectors to not have trailing padding, then vector_align needs to be adjusted to return a lower alignment for these types.

This behavior of rustc (to round up vector sizes to the next power of two) has existed at least since 2016.

Cc @eddyb

eddyb · 2022-12-02T08:23:12Z

To be clear, this is not rustc-specific behavior but LLVM behavior - the reason we have to reuse the datalayout string is that mismatches between rustc and LLVM could lead to miscompilations if we have to use APIs where we assume LLVM is working with the same algorithms and configuration we are.

IIRC the logic is roughly:

datalayout string can contain an alignment for some exact vector size
otherwise, the default is "natural alignment", i.e. the alignment is the same as the size (except rounded up to a power of two, if not already? I forget if it's meant to be stricter or not)

eddyb · 2022-12-02T08:35:01Z

Reading more of the discussion above, you really cannot talk about llvm_alignof(Simd<T, N>) as if it's different than the Rust alignment.

At the point where you start diverging in alignments from LLVM, you can no longer use primitive vector types and ~~have to wrap them in packed structs to decrease alignment~~... except the vector is already rounded up to a "plausible hardware size" by that point so the packed struct can no longer decrease the size!

You would probably need to "just" not use non-pow2 LLVM vector types, i.e. do not have Abi::Vector for such malformed #[repr(simd)] types. Or maybe they can still be used when by-value but great care must be taken around loading and storing - you effectively would need to generate e.g. 3 loads/stores to read/write to/from <3 x T> values (or weird FCA loads/stores of [3 x T] which LLVM would desugar).

Frankly, after what I've seen from GLSL (which has e.g. size=12 align=16 types that "eat" padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can't "take back" the magic, even if you manage to make it work initially.

programmerjake · 2022-12-02T09:07:31Z

what i'm envisioning is using npot vectors by value (not in memory) just uses usual llvm semantics, whereas when loading/storing llvm allows specifying the alignment, which is where we would override it to use the rust abi alignment.
if llvm decides to store vectors in temporary memory variables, it's free to use whatever layout it pleases, since llvm is allowed to modify code using the as-if rule.

note that npot vectors are natively supported on some ISAs (e.g. SVE, RVV, and SimpleV)

programmerjake · 2022-12-02T09:39:47Z

for llvm npot vector loads/stores, afaict from reading the llvm language reference llvm guarantees that they will not load/store padding if the elements are an integer byte size (so not i4 since that's 1/2 byte) and have no padding -- since it says here vector load/stores act the same as array load/stores when the element type is an integer byte size. so, no, we won't need multiple llvm load/store instructions to load/store npot vectors.

programmerjake · 2022-12-02T09:50:36Z

Frankly, after what I've seen from GLSL (which has e.g. size=12 align=16 types that "eat" padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can't "take back" the magic, even if you manage to make it work initially.

we shouldn't run into that particular problem since the proposed layouts will always have size % align == 0 like any other rust type, they're basically arrays' layouts but with the align increased as close as possible to llvm's preferred align without introducing padding or violating size % align == 0

workingjubilee · 2022-12-04T02:00:06Z

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

It is not actually my opinion that these values should be equal in all cases, actually? I believe f32x4 on x86 should be natively 16-byte aligned. Unless you are talking about f32x3 specifically.

Frankly, after what I've seen from GLSL (which has e.g. size=12 align=16 types that "eat" padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can't "take back" the magic, even if you manage to make it work initially.

We are strongly inclined to think we do not want byte-level padding in SIMD types, I think only a few murmurs have even discussed that possibility and largely in a negative light. We collectively average out to feeling a little dubious even about niches (or other forms of bit-level padding).

programmerjake · 2022-12-04T02:27:10Z

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

It is not actually my opinion that these values should be equal in all cases, actually? I believe f32x4 on x86 should be natively 16-byte aligned. Unless you are talking about f32x3 specifically.

power-of-2 sized vectors always matched their corresponding array size and my proposal doesn't change that. if we use the gcd(size, llvm_align) proposal then power-of-2 vectors' alignments should be entirely unchanged too (assuming there isn't a weird arch that decides i8x4 has align 16 or something). my proposal only changes non-power-of-2 vectors.

I should point out that I am explicitly not changing std::arch simd types' layout, those are defined by their corresponding arch (that said, since iirc all current std::arch vector types are power-of-2 types without padding, the gcd(size, llvm_align) proposal would leave them unaffected even if those std::arch types were defined using std::simd with the gcd(size, llvm_align) proposal.)

We are strongly inclined to think we do not want byte-level padding in SIMD types,

so then you also agree we should change rustc's current vector layouts to never introduce padding bytes. this requires potentially reducing alignment for non-power-of-2 vectors in order to not violate the rust size % align == 0 rule.

programmerjake · 2022-12-04T02:37:20Z

note that my proposal means that slice::as_simd can never panic, currently it has to verify that the provided vector type has no padding and panics if that fails, which is a major footgun.
https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_simd

workingjubilee · 2022-12-04T02:51:40Z

...That's not really a footgun, honestly, unless you write code that never actually tests the function ever, since the result must be monomorphized.

calebzulawski · 2022-12-04T02:57:16Z

In our docs we already repeatedly describe vectors as "basically arrays" so I do think it's a footgun if they have unusual layouts, in this function and elsewhere.

programmerjake · 2022-12-04T02:57:22Z

...That's not really a footgun, honestly, unless you write code that never actually tests the function ever, since the result must be monomorphized.

maybe not in the sense of unintentionally causing UB, but it is in the sense of writing a bunch of code generic over vector lengths and expecting it to work for any length...which could require re-architecting a bunch of code to fix.

RalfJung · 2022-12-04T09:06:19Z

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

It is not actually my opinion that these values should be equal in all cases, actually? I believe f32x4 on x86 should be natively 16-byte aligned. Unless you are talking about f32x3 specifically.

You were quoting a statement about the size but your reply talks about alignment. Looks to me like you are talking past each other?

f32x4 on x86 can be 16-byte aligned and still have the same size as [f32; 4] without any problem.

RalfJung · 2022-12-04T09:09:02Z

power-of-2 sized vectors always matched their corresponding array size and my proposal doesn't change that. if we use the gcd(size, llvm_align) proposal then power-of-2 vectors' alignments should be entirely unchanged too (assuming there isn't a weird arch that decides i8x4 has align 16 or something). my proposal only changes non-power-of-2 vectors.

Usually alignment of such types is not even something we can decide. What is their alignment in C? We pretty much have to use the same to be ABI compatible.

bjorn3 · 2022-12-04T09:37:58Z

Only the types in core::arch need to be abi compatible right? There are From impls to convert between the core::arch types and core::simd::Simd, but nothing that converts references for one type to references for the other type.

RalfJung · 2022-12-04T10:35:10Z

That is assuming people won't declare their own SIMD types and expect them to be compatible with the equivalent def.n in C.

programmerjake · 2022-12-04T10:42:10Z

Usually alignment of such types is not even something we can decide. What is their alignment in C? We pretty much have to use the same to be ABI compatible.

imho Simd<T, N> is not the same type as existing C simd types, it's something new, defined by Rust, so ABI compatibility isn't an issue, unless we want it to be, in which case we can provide same-layout guarantees for some types e.g. we might want to define Simd<u32, 4> on x86_64 to have the same layout as __m128, and to define Simd<i32, 5> to not match any corresponding std::arch::x86_64 type.

RalfJung · 2022-12-04T10:44:02Z

Fair. But make sure this is documented well and brace yourselves for bug reports caused by any difference in ABI. ;)

Does C even have non-pot SIMD types, and if yes then how do their layouts look like?

programmerjake · 2022-12-04T10:47:34Z

Does C even have non-pot SIMD types, and if yes then how do their layouts look like?

for x86_64, yes, but only in clang, not gcc (idk about icc or msvc):

typedef unsigned short u16x3 __attribute__((vector_size(3 * sizeof(unsigned short))));

workingjubilee · 2022-12-04T10:50:41Z

Fair. But make sure this is documented well and brace yourselves for bug reports caused by any difference in ABI. ;)

our 128-bit integers don't even match up with C!

Does C even have non-pot SIMD types, and if yes then how do their layouts look like?

"Ad hoc." As @eddyb mentioned/implied, f32x3 is weird and special in most C implementations of it, if there are any (and there are some, due to code attempting to cross the C<->GLSL boundary), such that it traumatizes a lot of C programmers enough they don't want to touch it. People won't be that surprised that yet another implementation has decided on yet another quirky variant, though they may be annoyed.

RalfJung · 2022-12-04T10:51:34Z

our 128-bit integers don't even match up with C!

And that is such a success that you want to repeat it? 😂

RalfJung · 2022-12-05T09:48:36Z

I'm out of my depth here, I don't understand the tradeoffs involved well enough to judge the pros and cons. I guess instead of having the same logic in many places we could have a shared helper on Abi that distinguishes "register type" and "memory type", but no idea how practical that is.

programmerjake · 2022-12-05T11:18:25Z

I'm out of my depth here, I don't understand the tradeoffs involved well enough to judge the pros and cons. I guess instead of having the same logic in many places we could have a shared helper on Abi that distinguishes "register type" and "memory type", but no idea how practical that is.

imho it would be useful to also have a load/store type (value actually being passed to the load/store instruction):
comparison:

bool

memory type: i8
register type: i1
store type: i8

%bool_ptr = getelementptr i8, ptr %bool_array_ptr, i64 %idx ; compute pointer using memory type
%v = zext i1 %bool_value to i8 ; convert register type to store type
store i8 %v, ptr %bool_ptr ; store using store type

Simd<i32, 3>

memory type: [3 x i32]
register type: <3 x i32>
store type: <3 x i32> with load/store ops' align being explicitly specified

%vec_ptr = getelementptr [3 x i32], ptr %vec_array_ptr, i64 %idx ; compute pointer using memory type
; register type is store type, no conversion needed
store <3 x i32> %vec_value, ptr %vec_ptr, align 4 ; store using store type with alignment overridden to rust type alignment

bjorn3 · 2022-12-05T11:48:55Z

The existing handling for handling bools being i1 in registers touches a lot of code (including seemingly unrelated code like determining signs for tags in the discriminant calculation code) and requires separate "immediate" and non-immediate versions of several methods. I am pretty sure the wrong one is used at several places. Also for Cranelift it did require extra special cases to handle "i1" being 8 bits big as Cranelift only has i8 and up. I believe GCC also requires special code as it uses a true bool type rather than a 1bit integer. Honestly I did prefer to just convert i1 to i8 as soon as possible and never pass it around as immediate. I realize this may not be possible for performance reasons though.

programmerjake · 2023-01-05T11:52:16Z

somewhat related: https://discourse.llvm.org/t/status-of-geps-into-vectors-of-overaligned-elements/67497/4?u=programmerjake

programmerjake · 2023-02-24T00:01:11Z

note that the alignment specification I'm proposing will allow arbitrary vector type punning without having alignment issues as long as the type pun is otherwise valid and llvm_align doesn't do crazy stuff (e.g. if llvm decides alignof(i8x4) is less than alignof(i32x1) for some strange reason then that prevents type punning i32x1 to i8x4): see riscv-non-isa/riscv-elf-psabi-doc#347 (comment) for full explanation

calebzulawski · 2023-09-24T23:27:29Z

Considering this practically rather than idealistically for a moment, it's questionable to me whether it's really worth fixing the layout vs considering other options. bool is perhaps the most frequently used primitive, whereas std::simd is niche, and non-PoT vectors considerably more niche. Embedding a split memory/register repr throughout the compiler seems impractical and even if we decided that was the best option, I genuinely doubt anyone would ever implement it.

That said, not supporting non-PoT vectors could excessively complicate our API and perhaps delay stabilization while we try to sort out a simpler alternative to SupportedLaneCount (see #364). I think it might be worth exploring other options that would give us non-PoT vectors in a more practical way. I've thought of 3 alternatives:

Leave things as they are. The current implementation of non-PoT vectors works. We could leave them be and use either an assert or non-local error in functions like [T]::as_simd. This would leave us with the option of reducing the alignment in the future (maybe across an edition change).
Add a new CompactSimd or UnalignedSimd type. This type could implement all of the same functions and traits as Simd, but have a lesser alignment, either repr(Rust) or repr(compact_simd). To avoid a split memory/register repr, I would not make the type a vector type, but instead convert it to a repr(simd) type before passing it to the existing simd_* intrinsics. We could easily implement Into<Simd> for CompactSimd and perhaps Deref<Target = UnalignedSimd> for Simd. [T]::as_simd could still error, but have a matching [T]::as_compact_simd that works for any number of elements.
Make Simd repr(compact_simd) and convert before using. Like the previous example, but wrapped into one type. We could add new intrinsics simd_align and simd_unalign that convert between repr(simd) and repr(compact_simd), a no-op when alignments match. In this case, repr(compact_simd) would be a glorified repr(align(N)).

Personally, I like the first or second alternative best. LLVM seems to be able to generate pretty good code for non-PoT vectors, and I think messing with the alignment could cause problems. The second option would allow the user to choose which is more important--having a flexible memory layout or reliable codegen that doesn't juggle layouts.

cc @programmerjake @scottmcm

…rkingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319

…workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319

…rkingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319

Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319

…nsics, r=workingjubilee Make repr(packed) vectors work with SIMD intrinsics In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319

…rinsics, r=workingjubilee Make repr(packed) vectors work with SIMD intrinsics In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319

Rollup merge of rust-lang#125311 - calebzulawski:repr-packed-simd-intrinsics, r=workingjubilee Make repr(packed) vectors work with SIMD intrinsics In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319

…r=workingjubilee Make repr(packed) vectors work with SIMD intrinsics In #117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319

programmerjake added the C-bug Category: Bug label Nov 30, 2022

eddyb mentioned this issue Dec 5, 2022

Codegen backends should validate their assumed encodings of e.g. rustc layouts. rust-lang/rust#105282

Open

This was referenced Feb 19, 2023

Add copy_to_slice #331

Merged

ABI alignment of vector types riscv-non-isa/riscv-elf-psabi-doc#347

Closed

programmerjake mentioned this issue Mar 13, 2023

Document alternatives to Mask<T, LANES> that guarantee layout. Fixes #332 #333

Open

programmerjake mentioned this issue Apr 23, 2023

don't use read_unaligned to read from an array casted to *const Simd #341

Open

programmerjake mentioned this issue Sep 24, 2023

Critical issues before stabilization #364

Open

calebzulawski mentioned this issue Oct 24, 2023

Implement repr(packed) for repr(simd) rust-lang/rust#117116

Merged

RalfJung referenced this issue Apr 6, 2024

add all_lane_counts feature to enable non-power-of-2 lane counts <= 64

8742a86

RalfJung mentioned this issue Apr 6, 2024

Miri ICEs on non-power-of-2 non-packed SIMD vectors rust-lang/miri#3458

Closed

RalfJung mentioned this issue Apr 9, 2024

do not run the 4-lane tests in Miri #412

Merged

calebzulawski mentioned this issue May 20, 2024

Make repr(packed) vectors work with SIMD intrinsics rust-lang/rust#125311

Merged

calebzulawski mentioned this issue Jun 3, 2024

Fix layout of non-power-of-two length vectors #422

Merged

calebzulawski closed this as completed in #422 Aug 13, 2024

non-power-of-2 Simd types have wrong size #319

non-power-of-2 Simd types have wrong size #319

Comments

programmerjake commented Nov 30, 2022

Meta

programmerjake commented Nov 30, 2022

programmerjake commented Nov 30, 2022

programmerjake commented Nov 30, 2022 • edited Loading

programmerjake commented Dec 1, 2022

programmerjake commented Dec 1, 2022

calebzulawski commented Dec 1, 2022

programmerjake commented Dec 1, 2022

RalfJung commented Dec 2, 2022 via email

RalfJung commented Dec 2, 2022 • edited Loading

eddyb commented Dec 2, 2022

eddyb commented Dec 2, 2022

programmerjake commented Dec 2, 2022

programmerjake commented Dec 2, 2022

programmerjake commented Dec 2, 2022

workingjubilee commented Dec 4, 2022

programmerjake commented Dec 4, 2022

programmerjake commented Dec 4, 2022

workingjubilee commented Dec 4, 2022

calebzulawski commented Dec 4, 2022

programmerjake commented Dec 4, 2022

RalfJung commented Dec 4, 2022 • edited Loading

RalfJung commented Dec 4, 2022

bjorn3 commented Dec 4, 2022

RalfJung commented Dec 4, 2022

programmerjake commented Dec 4, 2022

RalfJung commented Dec 4, 2022

programmerjake commented Dec 4, 2022

workingjubilee commented Dec 4, 2022

RalfJung commented Dec 4, 2022

RalfJung commented Dec 5, 2022 via email

programmerjake commented Dec 5, 2022

bjorn3 commented Dec 5, 2022

programmerjake commented Jan 5, 2023

programmerjake commented Feb 24, 2023 • edited Loading

calebzulawski commented Sep 24, 2023 • edited Loading

programmerjake commented Nov 30, 2022 •

edited

Loading

RalfJung commented Dec 2, 2022 •

edited

Loading

RalfJung commented Dec 4, 2022 •

edited

Loading

programmerjake commented Feb 24, 2023 •

edited

Loading

calebzulawski commented Sep 24, 2023 •

edited

Loading