Skip to content

[CodeGen] Mark mem intrinsic loads and stores as dereferenceable #80184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

dschuff
Copy link
Member

@dschuff dschuff commented Jan 31, 2024

When directly generating loads/stores for small constant memset/memcpy/memset
intrinsics, mark the load and store operands as MODereferenceable (and use
DAG.getObjectPtrOffset to generate address arithmetic with 'nuw'), because these
intrinsics unconditionally dereference their operands anyway.
For WebAssembly, this allows the arithmetic to be folded directly into the load/store
constant offset field.

See #79692

@llvmbot
Copy link
Member

llvmbot commented Jan 31, 2024

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-webassembly

Author: Derek Schuff (dschuff)

Changes

When directly generating loads/stores for small constant memset/memcpy intrinsics,
this change as written uses DAG.getObjectPtrOffset to generate address arithmetic
with 'nuw' when the src/dst pointers are known to be dereferenceable.
For WebAssembly, this allows the arithmetic to be folded directly into the load/store
constant offset field.

See #79692


Full diff: https://github.com/llvm/llvm-project/pull/80184.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+12-4)
  • (added) llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll (+30)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 3c1343836187a..a52bbdf92cf8d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7574,14 +7574,18 @@ static SDValue getMemcpyLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
 
       Value = DAG.getExtLoad(
           ISD::EXTLOAD, dl, NVT, Chain,
-          DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
+          isDereferenceable ? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff)) :
+            DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
           SrcPtrInfo.getWithOffset(SrcOff), VT,
           commonAlignment(*SrcAlign, SrcOff), SrcMMOFlags, NewAAInfo);
       OutLoadChains.push_back(Value.getValue(1));
 
+      isDereferenceable =
+        DstPtrInfo.getWithOffset(DstOff).isDereferenceable(VTSize, C, DL);
       Store = DAG.getTruncStore(
           Chain, dl, Value,
-          DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+          isDereferenceable ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff)) :
+            DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
           DstPtrInfo.getWithOffset(DstOff), VT, Alignment, MMOFlags, NewAAInfo);
       OutStoreChains.push_back(Store);
     }
@@ -7715,7 +7719,7 @@ static SDValue getMemmoveLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
     MachineMemOperand::Flags SrcMMOFlags = MMOFlags;
     if (isDereferenceable)
       SrcMMOFlags |= MachineMemOperand::MODereferenceable;
-
+// TODO: Fix memmove too.
     Value = DAG.getLoad(
         VT, dl, Chain,
         DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
@@ -7863,9 +7867,13 @@ static SDValue getMemsetStores(SelectionDAG &DAG, const SDLoc &dl,
         Value = getMemsetValue(Src, VT, DAG, dl);
     }
     assert(Value.getValueType() == VT && "Value with wrong type.");
+    bool isDereferenceable = DstPtrInfo.isDereferenceable(
+        DstOff, *DAG.getContext(), DAG.getDataLayout());
     SDValue Store = DAG.getStore(
         Chain, dl, Value,
-        DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+        isDereferenceable
+            ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff))
+            : DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
         DstPtrInfo.getWithOffset(DstOff), Alignment,
         isVol ? MachineMemOperand::MOVolatile : MachineMemOperand::MONone,
         NewAAInfo);
diff --git a/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
new file mode 100644
index 0000000000000..15e68ab4122f9
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mcpu=mvp -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -tail-dup-placement=0 | FileCheck %
+
+target triple = "wasm32-unknown-unknown"
+
+define void @call_memset(ptr dereferenceable(16)) #0 {
+; CHECK-LABEL: call_memset:
+; CHECK:         .functype call_memset (i32) -> ()
+; CHECK-NEXT:  # %bb.0:
+; CHECK-NEXT:    i64.const $push0=, 0
+; CHECK-NEXT:    i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT:    i64.const $push1=, 0
+; CHECK-NEXT:    i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT:    return
+    call void @llvm.memset.p0.i32(ptr align 1 %0, i8 0, i32 16, i1 false)
+    ret void
+}
+
+define void @call_memcpy(ptr dereferenceable(16) %dst, ptr dereferenceable(16) %src) #0 {
+; CHECK-LABEL: call_memcpy:
+; CHECK:         .functype call_memcpy (i32, i32) -> ()
+; CHECK-NEXT:  # %bb.0:
+; CHECK-NEXT:    i64.load $push0=, 8($1):p2align=0
+; CHECK-NEXT:    i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT:    i64.load $push1=, 0($1):p2align=0
+; CHECK-NEXT:    i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT:    return
+    call void @llvm.memcpy.p0.p0.i32(ptr align 1 %dst, ptr align 1 %src, i32 16, i1 false)
+    ret void
+}

Copy link

github-actions bot commented Jan 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@dschuff
Copy link
Member Author

dschuff commented Jan 31, 2024

This change as written should be straightforward,
but as pointed out in the bug there is actually also a case to be made for using 'nuw' unconditionally (i.e. assuming that
the pointers are always dereferenceable up to the size of the memcpy). The langref doesn't explicitly say that it's UB if the pointers are not dereferenceable, but that's my interpretation of the langref and C standard.

Comment on lines 7577 to 7579
isDereferenceable
? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff))
: DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe should move this to a parameter of getMemBasePlusOffset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's actually the only difference between the 2 functions (getObjectPtrOffset is just implemented in terms of getMemBasePlusOffset anway). So if you think it's a good idea I wouldn't mind just collapsing them as a separate refactoring (it would also fix the annoyance that the 2 functions have the same parameters but in a different order).

@dschuff
Copy link
Member Author

dschuff commented Feb 1, 2024

What do you think about the other question though: can we just unconditionally assume that the pointers are dereferenceable and always use nuw?

@SingleAccretion
Copy link
Contributor

can we just unconditionally assume that the pointers are dereferenceable and always use nuw?

A bit more evidence in favor of this - aggregate stores already use the optimal form (godbolt link).

@arsenm
Copy link
Contributor

arsenm commented Feb 5, 2024

What do you think about the other question though: can we just unconditionally assume that the pointers are dereferenceable and always use nuw?

The memset is dereferencing them, so yes I think this is implied

@dschuff dschuff changed the title [CodeGen] Generate mem intrinsic address calculations with nuw [CodeGen] Mark mem intrinsic loads and stores as dereferenceable Feb 6, 2024
@dschuff
Copy link
Member Author

dschuff commented Feb 6, 2024

I am getting one local test failure here, in /test/CodeGen/BPF/undef.ll:
The test has a bunch of stores into an alloca, which i think are supposed to get converted to a single memset, so the test calls for

; EL: r1 = 11033905661445 ll
; CHECK: *(u64 *)(r10 - 8) = r1

(where 11033905661445 is 0xA0908070605, i.e. the stored values). With this change the output is

	r1 = 2569
	*(u16 *)(r10 - 4) = r1
	r1 = 134678021
	*(u32 *)(r10 - 8) = r1

i.e. the 0x0A09 has been split out from the 0x8070605. I have no idea yet why this change would do that.
Also, there is actually a memset on the next line, which seem to be zeroing the memory after the alloca'd pointer (which I think is UB?). Removing it doesn't seem to affect the output, but maybe something weird is going on.

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unconditionally marking these loads/stores as dereferenceable does not seem justified to me, any more than it would for a regular load/store.

(Having said that, I don't understand the point of the MODereferenceable flag. In IR derefenceable metadata is applied to the thing that creates the pointer, so you get UB at that point if it is not dereferenceable. Applying it to the load/store that uses the pointer seems redundant, since they would always give UB anyway if the pointer is not dereferenceable.)

@arsenm
Copy link
Contributor

arsenm commented Feb 7, 2024

(Having said that, I don't understand the point of the MODereferenceable flag. In IR derefenceable metadata is applied to the thing that creates the pointer, so you get UB at that point if it is not dereferenceable. Applying it to the load/store that uses the pointer seems redundant, since they would always give UB anyway if the pointer is not dereferenceable.)

I thought the point was for code motion, which is kind of useless at the use point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants