Skip to content

Add support for unaligned simd masked load/store #126919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sayantn opened this issue Jun 24, 2024 · 33 comments
Closed

Add support for unaligned simd masked load/store #126919

sayantn opened this issue Jun 24, 2024 · 33 comments
Labels
A-intrinsics Area: Intrinsics A-SIMD Area: SIMD (Single Instruction Multiple Data) PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd)

Comments

@sayantn
Copy link
Contributor

sayantn commented Jun 24, 2024

We currently have only an aligned version of simd_masked_load, but the underlying llvm intrinsic doesn't have any alignment requirement. So I propose to add simd_masked_load_unaligned and simd_masked_store_unaligned. These will link with the llvm intrinsic with an alignment of 8 bits (the lowest permissible in Rust)

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-intrinsics Area: Intrinsics A-SIMD Area: SIMD (Single Instruction Multiple Data) and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 24, 2024
@workingjubilee
Copy link
Member

workingjubilee commented Jun 24, 2024

The alignment required is only the alignment of the T for the Simd<T, N>.

@workingjubilee workingjubilee added the PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) label Jun 24, 2024
@workingjubilee
Copy link
Member

Do you have a use-case in mind?

@workingjubilee
Copy link
Member

workingjubilee commented Jun 25, 2024

Oh, I see. It seems this is about rust-lang/stdarch#1594 yes? Except you have underaligned loads and stores, also!

I think I'd rather see us pass the alignment as a const generic parameter if we want to go down this road, then we can pass 1 or whatever.

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2024

Yes, this is for stdarch. There are a lot of unaligned masked load/store intrinsics, and they all resort to using inline asm.

@workingjubilee
Copy link
Member

workingjubilee commented Jun 25, 2024

the aligned ones also, thus my remark about a generic alignment. (at least, I hope emitting the correct alignment would encourage LLVM to lower things correctly...)

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2024

Even if we add a generic alignment parameter, most if not all use cases will be fully-aligned or fully-unaligned. Also, having a random ::<1> in the middle of simd code will be somewhat confusing

@workingjubilee
Copy link
Member

workingjubilee commented Jun 25, 2024

For who? Standard library developers?

I agree it doesn't have many possible values though.

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2024

For any contributor, it may be confusing. I think stdarch code should be as clear as possible to avoid possible bugs

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2024

The only practical values are 1, 2, 4 and 8

@workingjubilee
Copy link
Member

workingjubilee commented Jun 25, 2024

The only practical values are 1, 2, 4 and 8

Not if you want to emit vmovaps. That puts 16, 32, and 64 on the table.

And I understand that desire, but I also think that the compiler code implementing these intrinsics should be as clear as possible to avoid possible bugs.

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2024

Oh, yeah. I misread the doc. Then it should be a const generic parameter, as it has quite a few possible values (1 for movu, 16/32/64 for mova, 1/2/4/8/16 for portable simd)

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2024

Should we introduce a new function with an alignment const generic parameter, or should we modify the existing masked load/store functions to take an extra const generic parameter (breaking change)?

@workingjubilee
Copy link
Member

I would prefer the latter if possible.

@sayantn
Copy link
Contributor Author

sayantn commented Jun 26, 2024

It is possible to do, we just need to grep stdlib for its uses (I don't think there are any uses except for portable simd). In portable simd, we need something like simd_masked_load::<{ mem:: align_of::<T> }>(mask, ptr, val), this just needs #[feature(generic_const_exprs)]. Even outside of portable simd, there should not be many uses of these, since they were added only last December.

@workingjubilee
Copy link
Member

oh, darn.
we can't rely on generic_const_exprs, sadly.

@sayantn
Copy link
Contributor Author

sayantn commented Jun 26, 2024

We can also use inline consts and a normal alignment parameter, like used in simd_shuffle as a shuffle parameter

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2024

We can also use inline consts and a normal alignment parameter, like used in simd_shuffle as a shuffle parameter

That's an ad-hoc hack and something we'd rather not see more of.

@sayantn
Copy link
Contributor Author

sayantn commented Jul 4, 2024

The alternative, using const generics, will need generic_const_exprs in portable-simd.

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2024

Yeah it may just currently not be possible to do this, then.

Why does alignment need to be a const parameter anyway? Usually in Rust the alignment is given by the type.

@sayantn
Copy link
Contributor Author

sayantn commented Jul 4, 2024

llvm needs it to be an immediate value. the alignment is for the pointer. if the pointer is 16-byte aligned and we are loading a 128-bit value, llvm can generate an aligned load, but not if it is not 16-byte aligned. in this case the element type may be u8 or whatever

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2024

Elsewhere in Rust people encode that with types just fine, e.g. via #[repr(align(N))]. I don't think I understand why it does not work here.

@sayantn
Copy link
Contributor Author

sayantn commented Jul 4, 2024

by default, these intrinsics use the alignment of the element type. and that may not be the desired alignment of the pointer. we may have a u8x16, then the intrinsic will assume that the pointer is byte-aligned. but for aligned simd loads, the pointer is 128-bit aligned. even bad scenario, we need to load from a byte-aligned pointer into a u32x4. it is not even possible using this intrinsic

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2024

So should the definition of u8x16 be changed to require 16-byte alignment?

even bad scenario, we need to load from a byte-aligned pointer into a u32x4. it is not even possible using this intrinsic

We have read_unaligned for that. You'll need a raw pointer here anyway, unaligned references are UB (even if you never load from them).

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2024

Ah wait this is about masked access, so I guess read_unaligned won't work?

In general Rust doesn't support having over/underaligned references. It's either "use standard alignment" or "use raw pointer". There's clearly a gap here, supporting explicit alignment annotations would be nice, but I don't think it makes sense for the SIMD types to be the part that tries to work out a solution -- this needs a language-side solution.

Yes, this is for stdarch. There are a lot of unaligned masked load/store intrinsics, and they all resort to using inline asm.

Until const generics get better, using inline asm seems like not the worst solution to me...

@sayantn
Copy link
Contributor Author

sayantn commented Jul 4, 2024

We don't need to do much, the LLVM intrinsic already has an alignment parameter. For masked access, a reference doesn't really make sense, as the point of masked loads, instead of load-then-select is that if you don't have permission to load from the masked-out positions, it should work just fine. And a reference is always safe to dereference, in which case we can just use load-then-select

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2024

Yeah sorry I had forgotten that this was about masked accesses.

We don't need to do much,

Not much, just finish const generics? ;)

You are asking for quite a lot here. Compile-time constants passed as parameters to functions are a non-trivial topic. So "don't need to do much" is not accurate I am afraid.

@workingjubilee
Copy link
Member

Hmm, I somewhat wonder if the proper direction would be supporting more ergonomic ways to specify the alignment of types...

@sayantn
Copy link
Contributor Author

sayantn commented Jul 5, 2024

Why does portable_simd even use the alignment of the element type? It is an awkward position between traditionally "aligned" (to the vector type) and "unaligned" (byte-aligned) simd load/stores. We can avoid the problem of const generics simply by modifying the existing intrinsic to use the alignment of the vector type, and adding a new intrinsic that generates byte-aligned load/stores. Or, we can add a const generic boolean ALIGNED parameter.

@sayantn sayantn closed this as completed Sep 4, 2024
@RalfJung
Copy link
Member

RalfJung commented Sep 4, 2024 via email

@sayantn
Copy link
Contributor Author

sayantn commented Sep 4, 2024

This is no longer needed in stdarch because we used some llvm intrinsics directly, and as this issue was opened because of stdarch I thought we don't need it anymore

@RalfJung
Copy link
Member

RalfJung commented Sep 4, 2024

Would be good to cross-reference that change with this issue, so that in the future someone can figure out what happened.

@sayantn
Copy link
Contributor Author

sayantn commented Sep 4, 2024

rust-lang/stdarch#1613 eliminated the need for this feature in strarch

@RalfJung
Copy link
Member

RalfJung commented Sep 4, 2024

Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-intrinsics Area: Intrinsics A-SIMD Area: SIMD (Single Instruction Multiple Data) PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd)
Projects
None yet
Development

No branches or pull requests

4 participants