
Conversation

jhuber6
Contributor

@jhuber6 jhuber6 commented Aug 29, 2025

Summary:
Right now these enforce alignment, which isn't convenient for users
on platforms that support unaligned accesses. The options are either to
permit passing the alignment manually or to assume the access is unaligned
unless the user specifies otherwise.

I've added #156057, which should
make the requested alignment show up on the intrinsic if the user
passed __builtin_assume_aligned; however, that only happens with
optimizations enabled. This shouldn't cause issues unless the backend
categorically rejects an unaligned access.
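
For illustration, a minimal user-side sketch of the intended behavior, using the vector-pointer interface and types from the test in this revision (the interface changed later in the review; the function names are hypothetical):

typedef int v8i __attribute__((ext_vector_type(8)));
typedef _Bool v8b __attribute__((ext_vector_type(8)));

// With this change, no alignment beyond 1 is assumed for the access.
v8i load_unaligned(v8b m, v8i *p) {
  return __builtin_masked_load(m, p);
}

// If the pointer is known to be aligned, say so explicitly so the optimizer
// can propagate that alignment onto the intrinsic (see #156057).
v8i load_aligned(v8b m, v8i *p) {
  v8i *ap = (v8i *)__builtin_assume_aligned(p, 32);
  return __builtin_masked_load(m, ap);
}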

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Aug 29, 2025
@llvmbot
Member

llvmbot commented Aug 29, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Joseph Huber (jhuber6)

Changes

Summary:
Right now these enforce alignment, which isn't convenient for users
on platforms that support unaligned accesses. The options are either to
permit passing the alignment manually or to assume the access is unaligned
unless the user specifies otherwise.

I've added #156057, which should
make the requested alignment show up on the intrinsic if the user
passed __builtin_assume_aligned; however, that only happens with
optimizations enabled. This shouldn't cause issues unless the backend
categorically rejects an unaligned access.


Full diff: https://github.com/llvm/llvm-project/pull/156063.diff

3 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+2-1)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+5-11)
  • (modified) clang/test/CodeGen/builtin-masked.c (+3-3)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index cbe59124d5b99..3c280c7ea3a1d 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -948,7 +948,8 @@ Each builtin accesses memory according to a provided boolean mask. These are
 provided as ``__builtin_masked_load`` and ``__builtin_masked_store``. The first
 argument is always boolean mask vector. The ``__builtin_masked_load`` builtin
 takes an optional third vector argument that will be used for the result of the
-masked-off lanes. These builtins assume the memory is always aligned.
+masked-off lanes. These builtins assume the memory is unaligned, use
+``__builtin_assume_aligned`` if alignment is desired.
 
 The ``__builtin_masked_expand_load`` and ``__builtin_masked_compress_store``
 builtins have the same interface but store the result in consecutive indices.
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 172a521e63c17..b00b9dddd75fb 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -4277,10 +4277,6 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     llvm::Value *Ptr = EmitScalarExpr(E->getArg(1));
 
     llvm::Type *RetTy = CGM.getTypes().ConvertType(E->getType());
-    CharUnits Align = CGM.getNaturalTypeAlignment(E->getType(), nullptr);
-    llvm::Value *AlignVal =
-        llvm::ConstantInt::get(Int32Ty, Align.getQuantity());
-
     llvm::Value *PassThru = llvm::PoisonValue::get(RetTy);
     if (E->getNumArgs() > 2)
       PassThru = EmitScalarExpr(E->getArg(2));
@@ -4289,8 +4285,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     if (BuiltinID == Builtin::BI__builtin_masked_load) {
       Function *F =
           CGM.getIntrinsic(Intrinsic::masked_load, {RetTy, UnqualPtrTy});
-      Result =
-          Builder.CreateCall(F, {Ptr, AlignVal, Mask, PassThru}, "masked_load");
+      Result = Builder.CreateCall(
+          F, {Ptr, llvm::ConstantInt::get(Int32Ty, 1), Mask, PassThru},
+          "masked_load");
     } else {
       Function *F = CGM.getIntrinsic(Intrinsic::masked_expandload, {RetTy});
       Result =
@@ -4308,14 +4305,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     llvm::Type *ValLLTy = CGM.getTypes().ConvertType(ValTy);
     llvm::Type *PtrTy = Ptr->getType();
 
-    CharUnits Align = CGM.getNaturalTypeAlignment(ValTy, nullptr);
-    llvm::Value *AlignVal =
-        llvm::ConstantInt::get(Int32Ty, Align.getQuantity());
-
     if (BuiltinID == Builtin::BI__builtin_masked_store) {
       llvm::Function *F =
           CGM.getIntrinsic(llvm::Intrinsic::masked_store, {ValLLTy, PtrTy});
-      Builder.CreateCall(F, {Val, Ptr, AlignVal, Mask});
+      Builder.CreateCall(F,
+                         {Val, Ptr, llvm::ConstantInt::get(Int32Ty, 1), Mask});
     } else {
       llvm::Function *F =
           CGM.getIntrinsic(llvm::Intrinsic::masked_compressstore, {ValLLTy});
diff --git a/clang/test/CodeGen/builtin-masked.c b/clang/test/CodeGen/builtin-masked.c
index 579cf5c413c9b..1e0f63d280840 100644
--- a/clang/test/CodeGen/builtin-masked.c
+++ b/clang/test/CodeGen/builtin-masked.c
@@ -19,7 +19,7 @@ typedef _Bool v8b __attribute__((ext_vector_type(8)));
 // CHECK-NEXT:    [[LOAD_BITS2:%.*]] = load i8, ptr [[M_ADDR]], align 1
 // CHECK-NEXT:    [[TMP1:%.*]] = bitcast i8 [[LOAD_BITS2]] to <8 x i1>
 // CHECK-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[P_ADDR]], align 8
-// CHECK-NEXT:    [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP2]], i32 32, <8 x i1> [[TMP1]], <8 x i32> poison)
+// CHECK-NEXT:    [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP2]], i32 1, <8 x i1> [[TMP1]], <8 x i32> poison)
 // CHECK-NEXT:    ret <8 x i32> [[MASKED_LOAD]]
 //
 v8i test_load(v8b m, v8i *p) {
@@ -45,7 +45,7 @@ v8i test_load(v8b m, v8i *p) {
 // CHECK-NEXT:    [[TMP2:%.*]] = bitcast i8 [[LOAD_BITS2]] to <8 x i1>
 // CHECK-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[P_ADDR]], align 8
 // CHECK-NEXT:    [[TMP4:%.*]] = load <8 x i32>, ptr [[T_ADDR]], align 32
-// CHECK-NEXT:    [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP3]], i32 32, <8 x i1> [[TMP2]], <8 x i32> [[TMP4]])
+// CHECK-NEXT:    [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP3]], i32 1, <8 x i1> [[TMP2]], <8 x i32> [[TMP4]])
 // CHECK-NEXT:    ret <8 x i32> [[MASKED_LOAD]]
 //
 v8i test_load_passthru(v8b m, v8i *p, v8i t) {
@@ -97,7 +97,7 @@ v8i test_load_expand(v8b m, v8i *p, v8i t) {
 // CHECK-NEXT:    [[TMP2:%.*]] = bitcast i8 [[LOAD_BITS2]] to <8 x i1>
 // CHECK-NEXT:    [[TMP3:%.*]] = load <8 x i32>, ptr [[V_ADDR]], align 32
 // CHECK-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[P_ADDR]], align 8
-// CHECK-NEXT:    call void @llvm.masked.store.v8i32.p0(<8 x i32> [[TMP3]], ptr [[TMP4]], i32 32, <8 x i1> [[TMP2]])
+// CHECK-NEXT:    call void @llvm.masked.store.v8i32.p0(<8 x i32> [[TMP3]], ptr [[TMP4]], i32 1, <8 x i1> [[TMP2]])
 // CHECK-NEXT:    ret void
 //
 void test_store(v8b m, v8i v, v8i *p) {

@efriedma-quic
Collaborator

In theory we can lower masked load/store on any target; it's just a question of how terrible the resulting code is. And on a target that doesn't support unaligned load/store, the answer is, pretty terrible. But basically everything that supports vectors has unaligned access support these days.

I'm a little concerned the argument types aren't what we want; it isn't great that __builtin_masked_load(mask,ptr) is well-defined, but *ptr is not.

We could use EmitPointerWithAlignment() here and infer whatever alignment that implies, which would allow users to perform an unaligned load if they really want. But unaligned vector types are weird: we allow using an "aligned" attribute on typedefs, but we don't represent it as a qualifier internally, so bad things happen with type canonicalization.

Maybe we could consider changing the interface so it takes a pointer to the element type of the returned vector, instead of a pointer to the returned vector type.
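
As a minimal sketch of the well-definedness concern above (assuming the vector-pointer interface from this revision; function names are hypothetical): with an unaligned default, the masked builtin is well-defined for any pointer alignment, while a plain dereference still requires the vector's natural alignment.

typedef int v8i __attribute__((ext_vector_type(8)));
typedef _Bool v8b __attribute__((ext_vector_type(8)));

// Well-defined regardless of the alignment of p under this patch.
v8i masked(v8b m, v8i *p) { return __builtin_masked_load(m, p); }

// Still requires p to satisfy the vector type's natural alignment
// (typically 32 bytes for v8i).
v8i plain(v8b m, v8i *p) { (void)m; return *p; }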

@jhuber6
Contributor Author

jhuber6 commented Sep 3, 2025

In theory we can lower masked load/store on any target; it's just a question of how terrible the resulting code is. And on a target that doesn't support unaligned load/store, the answer is, pretty terrible. But basically everything that supports vectors has unaligned access support these days.

I'm a little concerned the argument types aren't what we want; it isn't great that __builtin_masked_load(mask,ptr) is well-defined, but *ptr is not.

We could use EmitPointerWithAlignment() here and infer whatever alignment that implies, which would allow users to perform an unaligned load if they really want. But unaligned vector types are weird: we allow using an "aligned" attribute on typedefs, but we don't represent it as a qualifier internally, so bad things happen with type canonicalization.

Maybe we could consider changing the interface so it takes a pointer to the element type of the returned vector, instead of a pointer to the returned vector type.

I don't know why these intrinsics take alignment in the first place; the compress / expand variants don't. I think it'd be reasonable to change the intrinsics to derive alignment from the pointer rather than an extra argument.

@jhuber6
Contributor Author

jhuber6 commented Sep 10, 2025

ping

@efriedma-quic
Collaborator

I don't know why these intrinsics take alignment in the first place, the compress / expand variants don't.

Originally, almost every LLVM intrinsic that accessed a pointer took an explicit alignment parameter, including fundamental intrinsics like llvm.memcpy. We eventually realized that was redundant with alignment attributes on the arguments and switched most intrinsics over. I think nobody ever bothered to migrate masked load/store.

Environments that enforce strict alignment are still important in some contexts; even targets that commonly support misaligned operations, like AArch64 CPUs, have strict alignment enforcement in some modes. So it's important that we have a consistent story for what types and operations are actually aligned.


On the clang side, we want to have consistent rules for what operations are allowed, as much as possible. So I'm not really happy to have different rules for __builtin_masked_load vs. __builtin_memcpy.

@jhuber6
Contributor Author

jhuber6 commented Sep 10, 2025

On the clang side, we want to have consistent rules for what operations are allowed, as much as possible. So I'm not really happy to have different rules for __builtin_masked_load vs. __builtin_memcpy.

Could you be a bit more specific here? The current behavior is that the builtin assumes the memory is aligned to the type's natural alignment. This patch changes that to assume it's unaligned, which is much more permissive.

@efriedma-quic
Collaborator

Dereferencing a pointer, __builtin_memcpy, and most other operations that involve accessing pointers currently assume pointers are naturally aligned. The exact representation of this in LLVM IR varies, but the fundamental assumption is that you're following the normal C/C++ standard rules for pointer alignment.

My concern here is mostly consistency: I don't want this small cluster of vector intrinsics to have different rules from everything else.

The reason I mentioned changing the interface before is related to this: if the masked load/store intrinsics take a pointer to the element type, instead of the vector type, that naturally reduces the required alignment. The element pointer will have smaller alignment.
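
A small, hedged illustration of that last point (the values shown are for common 64-bit targets):

#include <stdio.h>

typedef int v8i __attribute__((ext_vector_type(8)));

int main(void) {
  // A pointer to the element type only implies the element's alignment,
  // not the full vector's; on typical targets this prints 4 and 32.
  printf("alignof(int) = %zu, alignof(v8i) = %zu\n",
         _Alignof(int), _Alignof(v8i));
  return 0;
}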

@jhuber6
Contributor Author

jhuber6 commented Sep 10, 2025

Dereferencing a pointer, __builtin_memcpy, and most other operations that involve accessing pointers currently assume pointers are naturally aligned. The exact representation of this in LLVM IR varies, but the fundamental assumption is that you're following the normal C/C++ standard rules for pointer alignment.

My concern here is mostly consistency: I don't want this small cluster of vector intrinsics to have different rules from everything else.

The reason I mentioned changing the interface before is related to this: if the masked load/store intrinsics take a pointer to the element type, instead of the vector type, that naturally reduces the required alignment. The element pointer will have smaller alignment.

I see; it's certainly possible to make the pointer a scalar type and then use the mask's width to infer the overall size. I ended up doing that in #157895 and could apply it here if needed. That would lower the alignment to just the element type, I suppose. I'm mostly worried about making this builtin too restrictive by increasing the required alignment beyond the 'standard' alignment you get on most platforms (16 AFAIK). Would it be reasonable to do that here? Fortunately we haven't made a release yet, so all of this can still be changed.

@philnik777
Contributor

IMO the right default alignment would be alignof(*ptr). Objects are pretty much never underaligned, but they tend not to be overaligned unless you want something specific, at which point you can use __builtin_assume_aligned to inform the compiler that it can make stronger alignment assumptions.
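
A hedged sketch of that suggestion, assuming the scalar-pointer interface that was later adopted (the exact signature is whatever landed in the PR; names here are illustrative only):

typedef int v8i __attribute__((ext_vector_type(8)));
typedef _Bool v8b __attribute__((ext_vector_type(8)));

v8i sum_two(v8b m, int *p) {
  // Default assumption: only alignof(*p), i.e. 4-byte alignment for int.
  v8i a = __builtin_masked_load(m, p);
  // The caller can promise more alignment when the buffer is overaligned.
  int *p32 = (int *)__builtin_assume_aligned(p, 32);
  v8i b = __builtin_masked_load(m, p32);
  return a + b;
}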

@jhuber6 jhuber6 changed the title from [Clang] Assume unaligned in maksed load / store builtins to [Clang] Assume unaligned in masked load / store builtins Sep 12, 2025
@jhuber6 jhuber6 changed the title from [Clang] Assume unaligned in masked load / store builtins to [Clang] Assume type align in masked load / store builtins Sep 22, 2025
@jhuber6
Contributor Author

jhuber6 commented Sep 22, 2025

Changed this to use the type's alignment. I can change the interface to use a V * instead of vec<V> * if preferred.

@efriedma-quic
Collaborator

I'd prefer to change the interface too, I think.

@jhuber6
Contributor Author

jhuber6 commented Sep 23, 2025

I'd prefer to change the interface too, I think.

So, take an element pointer instead of a vector pointer, then.

@efriedma-quic
Collaborator

Yes

@llvmbot llvmbot added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Sep 23, 2025
@jhuber6 jhuber6 changed the title from [Clang] Assume type align in masked load / store builtins to [Clang] Change masked load / store builtin interface to take scalar pointer Sep 23, 2025

github-actions bot commented Sep 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Summary:
Right now these enforce alignment, which isn't convenient for users
on platforms that support unaligned accesses. The options are either to
permit passing the alignment manually or to assume the access is unaligned
unless the user specifies otherwise.

I've added llvm#156057, which should
make the requested alignment show up on the intrinsic if the user
passed `__builtin_assume_aligned`; however, that only happens with
optimizations enabled. This shouldn't cause issues unless the backend
categorically rejects an unaligned access.
Collaborator

@efriedma-quic efriedma-quic left a comment

LGTM

@jhuber6 jhuber6 merged commit c989283 into llvm:main Sep 24, 2025
10 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…ointer (llvm#156063)

Summary:
Right now these enforce alignment, which isn't convenient for users
on platforms that support unaligned accesses. The options are either to
permit passing the alignment manually or to assume the access is unaligned
unless the user specifies otherwise.

I've added llvm#156057, which should
make the requested alignment show up on the intrinsic if the user
passed `__builtin_assume_aligned`; however, that only happens with
optimizations enabled. This shouldn't cause issues unless the backend
categorically rejects an unaligned access.