-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[CodeGen] Mark mem intrinsic loads and stores as dereferenceable #80184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-backend-webassembly Author: Derek Schuff (dschuff) ChangesWhen directly generating loads/stores for small constant memset/memcpy intrinsics, See #79692 Full diff: https://github.com/llvm/llvm-project/pull/80184.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 3c1343836187a..a52bbdf92cf8d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7574,14 +7574,18 @@ static SDValue getMemcpyLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
Value = DAG.getExtLoad(
ISD::EXTLOAD, dl, NVT, Chain,
- DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
+ isDereferenceable ? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff)) :
+ DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
SrcPtrInfo.getWithOffset(SrcOff), VT,
commonAlignment(*SrcAlign, SrcOff), SrcMMOFlags, NewAAInfo);
OutLoadChains.push_back(Value.getValue(1));
+ isDereferenceable =
+ DstPtrInfo.getWithOffset(DstOff).isDereferenceable(VTSize, C, DL);
Store = DAG.getTruncStore(
Chain, dl, Value,
- DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+ isDereferenceable ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff)) :
+ DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
DstPtrInfo.getWithOffset(DstOff), VT, Alignment, MMOFlags, NewAAInfo);
OutStoreChains.push_back(Store);
}
@@ -7715,7 +7719,7 @@ static SDValue getMemmoveLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
MachineMemOperand::Flags SrcMMOFlags = MMOFlags;
if (isDereferenceable)
SrcMMOFlags |= MachineMemOperand::MODereferenceable;
-
+// TODO: Fix memmove too.
Value = DAG.getLoad(
VT, dl, Chain,
DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
@@ -7863,9 +7867,13 @@ static SDValue getMemsetStores(SelectionDAG &DAG, const SDLoc &dl,
Value = getMemsetValue(Src, VT, DAG, dl);
}
assert(Value.getValueType() == VT && "Value with wrong type.");
+ bool isDereferenceable = DstPtrInfo.isDereferenceable(
+ DstOff, *DAG.getContext(), DAG.getDataLayout());
SDValue Store = DAG.getStore(
Chain, dl, Value,
- DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+ isDereferenceable
+ ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff))
+ : DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
DstPtrInfo.getWithOffset(DstOff), Alignment,
isVol ? MachineMemOperand::MOVolatile : MachineMemOperand::MONone,
NewAAInfo);
diff --git a/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
new file mode 100644
index 0000000000000..15e68ab4122f9
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mcpu=mvp -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -tail-dup-placement=0 | FileCheck %
+
+target triple = "wasm32-unknown-unknown"
+
+define void @call_memset(ptr dereferenceable(16)) #0 {
+; CHECK-LABEL: call_memset:
+; CHECK: .functype call_memset (i32) -> ()
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: i64.const $push0=, 0
+; CHECK-NEXT: i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT: i64.const $push1=, 0
+; CHECK-NEXT: i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT: return
+ call void @llvm.memset.p0.i32(ptr align 1 %0, i8 0, i32 16, i1 false)
+ ret void
+}
+
+define void @call_memcpy(ptr dereferenceable(16) %dst, ptr dereferenceable(16) %src) #0 {
+; CHECK-LABEL: call_memcpy:
+; CHECK: .functype call_memcpy (i32, i32) -> ()
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: i64.load $push0=, 8($1):p2align=0
+; CHECK-NEXT: i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT: i64.load $push1=, 0($1):p2align=0
+; CHECK-NEXT: i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT: return
+ call void @llvm.memcpy.p0.p0.i32(ptr align 1 %dst, ptr align 1 %src, i32 16, i1 false)
+ ret void
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
This change as written should be straightforward, |
isDereferenceable | ||
? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff)) | ||
: DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe should move this to a parameter of getMemBasePlusOffset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's actually the only difference between the 2 functions (getObjectPtrOffset is just implemented in terms of getMemBasePlusOffset anway). So if you think it's a good idea I wouldn't mind just collapsing them as a separate refactoring (it would also fix the annoyance that the 2 functions have the same parameters but in a different order).
What do you think about the other question though: can we just unconditionally assume that the pointers are dereferenceable and always use nuw? |
A bit more evidence in favor of this - aggregate stores already use the optimal form (godbolt link). |
The memset is dereferencing them, so yes I think this is implied |
I am getting one local test failure here, in
(where 11033905661445 is 0xA0908070605, i.e. the stored values). With this change the output is
i.e. the 0x0A09 has been split out from the 0x8070605. I have no idea yet why this change would do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unconditionally marking these loads/stores as dereferenceable does not seem justified to me, any more than it would for a regular load/store.
(Having said that, I don't understand the point of the MODereferenceable flag. In IR derefenceable
metadata is applied to the thing that creates the pointer, so you get UB at that point if it is not dereferenceable. Applying it to the load/store that uses the pointer seems redundant, since they would always give UB anyway if the pointer is not dereferenceable.)
I thought the point was for code motion, which is kind of useless at the use point |
When directly generating loads/stores for small constant memset/memcpy/memset
intrinsics, mark the load and store operands as
MODereferenceable
(and useDAG.getObjectPtrOffset
to generate address arithmetic with 'nuw'), because theseintrinsics unconditionally dereference their operands anyway.
For WebAssembly, this allows the arithmetic to be folded directly into the load/store
constant offset field.
See #79692