Add alignment parameter to `simd_masked_{load,store}` #147355

sayantn · 2025-10-04T23:41:43Z

This PR adds an alignment parameter to `simd_masked_load` and `simd_masked_store`, in the form of a const-generic enum `core::intrinsics::simd::SimdAlign`. This represents the alignment of the `ptr` argument in these intrinsics as follows:

- `SimdAlign::Unaligned` - `ptr` is unaligned/1-byte aligned
- `SimdAlign::Element` - `ptr` is aligned to the element type of the SIMD vector (default behavior in the old signature)
- `SimdAlign::Vector` - `ptr` is aligned to the SIMD vector type

The main motive for this is stdarch - most vector loads are either fully aligned (to the vector size) or unaligned (byte-aligned), so the previous signature doesn't cut it.

Now, stdarch will mostly use SimdAlign::Unaligned and SimdAlign::Vector, whereas portable-simd will use SimdAlign::Element.
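
As a rough illustration of the three modes, here is a minimal model in stable Rust. It is not the definition in `core::intrinsics::simd`: the enum below only mirrors the variant names from this PR, `required_alignment` is a hypothetical helper, and treating the vector's alignment as its byte size rounded up to a power of two is an assumption that is ultimately target-dependent.

```rust
use std::mem::{align_of, size_of};

/// Stand-in that mirrors the variant names from this PR (not the real core type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimdAlign {
    Unaligned,
    Element,
    Vector,
}

/// Hypothetical helper: the minimum alignment in bytes that `ptr` would have to
/// satisfy for a vector of `N` lanes of `T` under the given mode.
pub fn required_alignment<T, const N: usize>(mode: SimdAlign) -> usize {
    match mode {
        // No requirement beyond byte alignment.
        SimdAlign::Unaligned => 1,
        // Same requirement as reading a single element.
        SimdAlign::Element => align_of::<T>(),
        // Assumption: vector alignment = total size rounded up to a power of two.
        SimdAlign::Vector => (size_of::<T>() * N).next_power_of_two(),
    }
}
```

Under these assumptions, a 4-lane `f32` vector would need 1, 4 and 16 bytes of alignment for the three modes respectively.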

- [x] `cg_llvm`
- [x] `cg_clif`
- [x] `miri`/`const_eval`

Alternatives

Using a const-generic/"const" `u32` parameter as the alignment (and erroring during codegen if this argument is not a power of two). Although this is more flexible than the enum, it has a few drawbacks:

- If we use a const-generic argument, then portable-simd somehow needs to pass `align_of::<T>()` as the alignment, which isn't possible without GCE
- "const" function parameters are just an ugly hack, and a pain to deal with in non-LLVM backends

We can remedy the problem with the const-generic `u32` parameter by adding a special rule for the element alignment case (e.g. `0` can mean "use the alignment of the element type"), but I feel like this is not as expressive as the enum approach, although I am open to suggestions.
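
To make the `u32` alternative concrete, the sketch below models the "`0` means element alignment" rule together with the power-of-two check. `resolve_alignment` is a hypothetical name, and the check runs at run time here, whereas the PR text describes an error during codegen.

```rust
use std::mem::align_of;

/// Hypothetical helper modelling the `u32` alternative: `0` is the sentinel for
/// "use the element type's alignment"; any other value must be a power of two.
pub fn resolve_alignment<T, const ALIGN: u32>() -> Result<usize, String> {
    match ALIGN {
        // Sentinel: defer to the element type.
        0 => Ok(align_of::<T>()),
        // Explicit alignment, accepted only if it is a power of two.
        a if a.is_power_of_two() => Ok(a as usize),
        a => Err(format!("alignment {a} is not a power of two")),
    }
}
```

The sentinel exists only because a generic caller cannot write `align_of::<T>()` in the const-generic position, which is the GCE limitation mentioned above.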

cc @workingjubilee @RalfJung @BoxyUwU

rustbot · 2025-10-04T23:41:46Z

The Miri subtree was changed

cc @rust-lang/miri

Portable SIMD is developed in its own repository. If possible, consider making this change to rust-lang/portable-simd instead.

cc @calebzulawski, @programmerjake

Some changes occurred to the platform-builtins intrinsics. Make sure the
LLVM backend as well as portable-simd gets adapted for the changes.

cc @antoyo, @GuillaumeGomez, @bjorn3, @calebzulawski, @programmerjake

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

rustbot · 2025-10-04T23:41:48Z

r? @lcnr

rustbot has assigned @lcnr.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

RalfJung · 2025-10-05T09:09:26Z

If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.

programmerjake · 2025-10-05T09:56:25Z

> If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.

for portable-simd I think we should default to element-level-alignment since I expect that to be more efficient than unaligned ops on some targets (GPUs? maybe RISC-V V?)

RalfJung · 2025-10-05T11:41:04Z

> > If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.
>
> for portable-simd I think we should default to element-level-alignment since I expect that to be more efficient than unaligned ops on some targets (GPUs? maybe RISC-V V?)

Yes, so...?

IIUC we either want element-level alignment or no alignment, so we can just have a const bool generic controlling that.
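
For comparison, the two-state design suggested here could be modelled as follows. This is only a sketch: `alignment_for` is a hypothetical helper, not a proposed intrinsic signature.

```rust
use std::mem::align_of;

/// Hypothetical two-state model: `ALIGNED = true` asks for element alignment,
/// `ALIGNED = false` asks for no alignment beyond one byte.
pub fn alignment_for<T, const ALIGNED: bool>() -> usize {
    if ALIGNED {
        align_of::<T>()
    } else {
        1
    }
}
```

A `bool` can only distinguish two cases, so a third mode such as full vector alignment has no place in this shape.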

programmerjake · 2025-10-05T12:21:22Z

> > > If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.
> >
> > for portable-simd I think we should default to element-level-alignment since I expect that to be more efficient than unaligned ops on some targets (GPUs? maybe RISC-V V?)
>
> Yes, so...?

I thought you meant full-simd-type alignment or unaligned, since that's what x86 uses for simd instructions it calls aligned.

RalfJung · 2025-10-05T12:45:04Z

This is about simd_masked_load/store which are currently documented as

/// Unmasked values in `T` must be readable as if by `<ptr>::read` (e.g. aligned to the element
/// type).
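
A scalar model may help with the wording of that doc comment, reading "unmasked" as "lanes whose mask is set". The function below is a hypothetical illustration in safe Rust, not the intrinsic itself: only enabled lanes touch memory, and disabled lanes keep the fallback value.

```rust
/// Hypothetical scalar model of a masked load: lane `i` is read from `src`
/// only when `mask[i]` is true; otherwise the fallback value is kept.
pub fn masked_load_model<T: Copy, const N: usize>(
    mask: [bool; N],
    src: &[T],
    fallback: [T; N],
) -> [T; N] {
    let mut out = fallback;
    for lane in 0..N {
        if mask[lane] {
            // Only enabled lanes are dereferenced, so `src` may be shorter
            // than `N` as long as every enabled lane is in bounds.
            out[lane] = src[lane];
        }
    }
    out
}
```

In this model, a three-element source can back a four-lane load as long as the fourth lane is disabled.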

sayantn · 2025-10-05T12:47:49Z

As a summary, we need 3 types of alignments:

- element alignment (used in portable-simd)
- SIMD type alignment (x86 aligned)
- fully unaligned (x86 unaligned)

So a bool flag won't cut it. At best we can use a const-generic parameter, with `0` meaning "aligned to the element type" (because that is the most used case, and it can't be specified using const generics without GCE).

RalfJung · 2025-10-05T12:53:19Z

> As a summary, we need 3 types of alignments

Now you are expanding the scope of the PR. So far the motivation has been, we'd like an unaligned version of the existing intrinsics. If you also want SIMD type aligned variants, the PR description needs to be expanded to argue for this.

IIRC, last time this was looked into, the SIMD type alignment option wasn't necessary -- LLVM was more than able to use surrounding info on reference types to deduce the right alignment for the desired codegen. So please show some concrete undesirable codegen if you want to motivate a form of this intrinsic that requires SIMD type alignment.

sayantn · 2025-10-05T13:06:14Z

I apologise if I was unclear, but the motivation was always adding these 3 types of loads. LLVM will always generate an unaligned (byte-aligned) load/store if we pass any alignment less than the vector type size (because it is guaranteed to be safe). But for the `_mm_mask_load_` intrinsics, the pointer needs to be aligned to the vector size, so LLVM won't generate aligned loads unless we pass the vector size as the alignment.
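
The gap between element alignment and vector alignment can be seen with a plain pointer check. This is a minimal sketch: `Aligned16` and `is_aligned_to` are hypothetical names, and 16 bytes is used as the alignment of a 128-bit vector.

```rust
/// Hypothetical buffer forced to 16-byte alignment, matching a 128-bit vector.
#[repr(C, align(16))]
pub struct Aligned16(pub [f32; 8]);

/// Hypothetical helper: does `ptr` satisfy the given byte alignment?
pub fn is_aligned_to<T>(ptr: *const T, align: usize) -> bool {
    (ptr as usize) % align == 0
}

/// Returns (base is 16-aligned, offset is 4-aligned, offset is 16-aligned).
pub fn demo() -> (bool, bool, bool) {
    let buf = Aligned16([0.0; 8]);
    let base = buf.0.as_ptr();
    // One element further on: still aligned for `f32`, no longer for the vector.
    let offset = base.wrapping_add(1);
    (
        is_aligned_to(base, 16),
        is_aligned_to(offset, 4),
        is_aligned_to(offset, 16),
    )
}
```

A pointer like `offset` is valid for an element-aligned load but not for an instruction that demands full vector alignment, which is why the two modes need to be distinguishable.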

RalfJung · 2025-10-05T13:31:38Z

> LLVM will always generate an unaligned (byte-aligned) load/store if we pass any alignment less than the vector type size

I don't think the "always" here is correct. If we are loading from an &__m128 LLVM should be able to use that fact to generate the aligned intrinsics.

However, I guess stdarch uses raw pointers in its API. So yeah this definitely needs to be explained properly in the PR description, currently it is at best confusing.

RalfJung · 2025-10-05T13:36:39Z

If we need 3 different alignment modes (Cc @Amanieu for the stdarch part here), that can still be done using const generics with a new 3-variant enum (similar to the enum we have for atomic memory access orderings).

workingjubilee · 2025-10-05T17:44:43Z

Yes, std::arch tries to reflect the type signatures used by the C vendor functions... it's not exactly just "bindgen for vendor functions", but it kinda is bindgen for vendor functions.

Amanieu · 2025-10-05T22:39:51Z

I'm happy with the current API that takes a constant (either as an argument or a const generic). An enum doesn't really provide much of an advantage when the desired alignment can just be explicitly provided.

sayantn · 2025-10-06T03:16:45Z

If everyone agrees, I can substitute the const argument for a const-generic `u32` alignment. But then portable-simd would face problems because it has to pass `align_of::<T>()` to the const-generic argument somehow, so I propose to add a special meaning to `0`: if `0` is passed as the alignment, it is interpreted as the element type's alignment. This doesn't affect stdarch; all invocations of this intrinsic there will use literals (for x86 at least, I don't know much about other archs).

RalfJung · 2025-10-06T07:09:17Z

> I'm happy with the current API that takes a constant (either as an argument or a const generic). An enum doesn't really provide much of an advantage when the desired alignment can just be explicitly provided.

The enum provides the big advantage that we don't need more ad hoc "constant argument" hacks.

What I was hoping to get from you is confirmation on which forms of the intrinsic are needed for stdarch.

RalfJung · 2025-10-06T07:10:43Z

> If everyone agrees, I can substitute the const argument for a const-generic `u32` alignment. But then portable-simd would face problems because it has to pass `align_of::<T>()` to the const-generic argument somehow, so I propose to add a special meaning to `0`: if `0` is passed as the alignment, it is interpreted as the element type's alignment. This doesn't affect stdarch; all invocations of this intrinsic there will use literals (for x86 at least, I don't know much about other archs).

I would prefer that over the "constant argument" hack. Not sure if it's better than a 3-value enum but 🤷 .

lcnr · 2025-10-06T13:35:11Z

r? @RalfJung though feel free to reassign

bors · 2025-11-03T20:51:32Z

☔ The latest upstream changes (presumably #148446) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot · 2025-11-03T21:33:42Z

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

sayantn · 2025-11-03T23:12:30Z

@rustbot ready

RalfJung · 2025-11-04T20:38:10Z

Seeing as @bjorn3 seems happy with the cranelift changes
@bors r=RalfJung,bjorn3

bors · 2025-11-04T20:38:14Z

📌 Commit 21fb801 has been approved by RalfJung,bjorn3

It is now in the queue for this repository.

Rollup of 6 pull requests

Successful merges:

- #147355 (Add alignment parameter to `simd_masked_{load,store}`)
- #147925 (Fix tests for big-endian)
- #148341 (compiler: Fix a couple issues around cargo feature unification)
- #148371 (Dogfood `trim_{suffix|prefix}` in compiler)
- #148495 (Implement Path::is_empty)
- #148502 (rustc-dev-guide subtree update)

r? `@ghost`
`@rustbot` modify labels: rollup

bjorn3 · 2025-11-08T13:32:34Z

compiler/rustc_codegen_cranelift/src/intrinsics/simd.rs

+            let memflags = match alignment {
+                SimdAlign::Unaligned => MemFlags::new().with_notrap(),
+                _ => MemFlags::trusted(),
+            };


Looks like you accidentally added this to simd_gather rather than simd_masked_load.
Edit: Fixed in rust-lang/rustc_codegen_cranelift@a0b865d

sayantn · 2025-11-08

Oh shoot, sorry for the inconvenience.
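
For readers unfamiliar with the snippet under review, the match can be modelled without Cranelift as below. Everything here is a stand-in: `ModelFlags` is a hypothetical struct, and treating `MemFlags::trusted()` as "notrap plus aligned" is an assumption that should be checked against the Cranelift documentation.

```rust
/// Stand-in that mirrors the variant names from this PR (not the real core type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimdAlign {
    Unaligned,
    Element,
    Vector,
}

/// Hypothetical stand-in for the two properties of interest; this is not
/// Cranelift's `MemFlags` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ModelFlags {
    pub notrap: bool,
    pub aligned: bool,
}

/// Models the reviewed match: unaligned accesses only promise "no trap",
/// every other mode additionally promises alignment (assumed meaning of
/// `MemFlags::trusted()`).
pub fn flags_for(alignment: SimdAlign) -> ModelFlags {
    match alignment {
        SimdAlign::Unaligned => ModelFlags { notrap: true, aligned: false },
        _ => ModelFlags { notrap: true, aligned: true },
    }
}
```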

rustbot assigned lcnr Oct 4, 2025

sayantn force-pushed the masked-loads branch from 4cec11b to 9afbfb1 Compare October 5, 2025 00:27

programmerjake reviewed Oct 5, 2025

src/tools/miri/tests/pass/intrinsics/portable-simd.rs (outdated, resolved)

programmerjake reviewed Oct 5, 2025

compiler/rustc_codegen_llvm/src/intrinsic.rs (outdated, resolved)

programmerjake reviewed Oct 5, 2025

tests/codegen-llvm/simd-intrinsic/simd-intrinsic-generic-masked-store.rs (outdated, resolved)

bjorn3 reviewed Oct 5, 2025

compiler/rustc_codegen_llvm/src/intrinsic.rs (outdated, resolved)

rustbot assigned RalfJung Oct 6, 2025

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 2, 2025

sayantn force-pushed the masked-loads branch from 471695b to 56864a2 Compare November 3, 2025 07:01

RalfJung reviewed Nov 3, 2025

compiler/rustc_const_eval/src/interpret/intrinsics/simd.rs (outdated, resolved)

sayantn force-pushed the masked-loads branch from 56864a2 to 9f7f406 Compare November 3, 2025 20:59

sayantn added 3 commits November 4, 2025 02:30

- 75de619: Add alignment parameter to simd_masked_{load,store}
- ffe6cf6: Add implementation of the alignment parameter in Miri
- 21fb801: Implement the alignment parameter in cg_clif

sayantn force-pushed the masked-loads branch from 9f7f406 to 21fb801 Compare November 3, 2025 21:33

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 3, 2025

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 4, 2025

Zalathar mentioned this pull request Nov 4, 2025

Rollup of 6 pull requests #148507 (Merged)

bors merged commit c33d51b into rust-lang:master Nov 5, 2025
11 checks passed

rustbot added this to the 1.93.0 milestone Nov 5, 2025

sayantn mentioned this pull request Nov 5, 2025

Use generic SIMD masked load/stores for avx512 masked load/stores rust-lang/stdarch#1953 (Open)

sayantn deleted the masked-loads branch November 8, 2025 07:22

bjorn3 reviewed Nov 8, 2025
