[SelectionDAG] Use the nuw flag when expanding loads. #119288

Merged
sunfishcode merged 2 commits into llvm:main from sunfishcode/legalize-load-nuw
Dec 10, 2024

sunfishcode

When expanding a load into two loads, use the nuw (no unsigned wrap) flag on the add that computes the address of the second load from the base pointer. This is safe because the original load doesn't straddle the end of the address space.

It turns out there's already a dedicated helper function for doing this, getObjectPtrOffset.

This is in target-independent code; in practice, however, it only seems to affect WebAssembly code, because the constant offsets of WebAssembly load and store instructions don't perform wrapping, so constant folding often depends on the nuw flag being present.

This was noticed in the development of #119204.
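
To illustrate why the fold depends on nuw, here is a minimal sketch of the two address computations on wasm32. The function names are hypothetical and this is only a model of the addressing rule, not LLVM or engine code. It assumes the usual wasm semantics in which the constant offset of a load is added to the zero-extended i32 base without wrapping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "load offset=K" applied to an i32 base on wasm32:
// the base is zero-extended and K is added without wrapping.
uint64_t offsetFormAddress(uint32_t base, uint32_t offset) {
  return static_cast<uint64_t>(base) + offset;
}

// Address used by "i32.const K; i32.add; load offset=0": the add wraps
// modulo 2^32 before the load sees the result.
uint64_t addFormAddress(uint32_t base, uint32_t increment) {
  return static_cast<uint32_t>(base + increment);
}

// The property that the nuw flag asserts about base + increment.
bool addHasNoUnsignedWrap(uint32_t base, uint32_t increment) {
  return increment <= UINT32_MAX - base;
}
```

With a base of 0xFFFFFFFC and an increment of 8, the add form yields address 4 while the offset form yields 0x100000004, so the two are not interchangeable in general. Whenever `addHasNoUnsignedWrap` holds, both forms give the same address, which is what lets the add be folded into the load's constant offset.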

@sunfishcode sunfishcode requested a review from dschuff December 9, 2024 23:32
@llvmbot llvmbot added the backend:WebAssembly and llvm:SelectionDAG (SelectionDAGISel as well) labels Dec 9, 2024
@llvmbot
llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-webassembly

Author: Dan Gohman (sunfishcode)

Changes



Patch is 101.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119288.diff

9 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp (+1-1)
  • (modified) llvm/test/CodeGen/WebAssembly/fpclamptosat.ll (+15-45)
  • (modified) llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll (+36-144)
  • (modified) llvm/test/CodeGen/WebAssembly/i128.ll (+310-350)
  • (modified) llvm/test/CodeGen/WebAssembly/libcalls-trig.ll (+95-133)
  • (modified) llvm/test/CodeGen/WebAssembly/libcalls.ll (+135-175)
  • (modified) llvm/test/CodeGen/WebAssembly/multi-return.ll (+190-100)
  • (modified) llvm/test/CodeGen/WebAssembly/multivalue_libcall.ll (+3-9)
  • (modified) llvm/test/CodeGen/WebAssembly/umulo-128-legalisation-lowering.ll (+61-63)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
index 2655e8428309da..113a3bc0bbea69 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
@@ -265,7 +265,7 @@ void DAGTypeLegalizer::ExpandRes_NormalLoad(SDNode *N, SDValue &Lo,
 
   // Increment the pointer to the other half.
   unsigned IncrementSize = NVT.getSizeInBits() / 8;
-  Ptr = DAG.getMemBasePlusOffset(Ptr, TypeSize::getFixed(IncrementSize), dl);
+  Ptr = DAG.getObjectPtrOffset(dl, Ptr, TypeSize::getFixed(IncrementSize));
   Hi = DAG.getLoad(
       NVT, dl, Chain, Ptr, LD->getPointerInfo().getWithOffset(IncrementSize),
       LD->getOriginalAlign(), LD->getMemOperand()->getFlags(), AAInfo);
diff --git a/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll b/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll
index 58e3f0dc2a93c0..137994ceac1322 100644
--- a/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll
+++ b/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll
@@ -524,9 +524,7 @@ define i64 @utest_f64i64(double %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixunsdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -563,9 +561,7 @@ define i64 @utest_f64i64_cse_combine(double %x) #0 {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixunsdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -602,9 +598,7 @@ define i64 @ustest_f64i64(double %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -661,9 +655,7 @@ define i64 @ustest_f64i64_cse_combine(double %x) #0 {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -727,9 +719,7 @@ define i64 @utest_f32i64(float %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -781,9 +771,7 @@ define i64 @ustest_f32i64(float %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -840,9 +828,7 @@ define i64 @ustest_f32i64_cse_combine(float %x) #0 {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -910,9 +896,7 @@ define i64 @utesth_f16i64(half %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -981,9 +965,7 @@ define i64 @ustest_f16i64(half %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1471,9 +1453,7 @@ define i64 @utest_f64i64_mm(double %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixunsdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1509,9 +1489,7 @@ define i64 @ustest_f64i64_mm(double %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1573,9 +1551,7 @@ define i64 @utest_f32i64_mm(float %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1611,9 +1587,7 @@ define i64 @ustest_f32i64_mm(float %x) {
 ; CHECK-NEXT:    local.get 0
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1679,9 +1653,7 @@ define i64 @utesth_f16i64_mm(half %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1719,9 +1691,7 @@ define i64 @ustest_f16i64_mm(half %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
diff --git a/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll b/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
index 8f85575c1cf431..1feb5feb7a9ee8 100644
--- a/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
+++ b/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
@@ -685,19 +685,13 @@ define <2 x i64> @stest_f64i64(<2 x double> %x) {
 ; CHECK-NEXT:    f64x2.extract_lane 0
 ; CHECK-NEXT:    call __fixdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -800,19 +794,13 @@ define <2 x i64> @utest_f64i64(<2 x double> %x) {
 ; CHECK-NEXT:    f64x2.extract_lane 0
 ; CHECK-NEXT:    call __fixunsdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -863,19 +851,13 @@ define <2 x i64> @ustest_f64i64(<2 x double> %x) {
 ; CHECK-NEXT:    f64x2.extract_lane 0
 ; CHECK-NEXT:    call __fixdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -964,19 +946,13 @@ define <2 x i64> @stest_f32i64(<2 x float> %x) {
 ; CHECK-NEXT:    f32x4.extract_lane 0
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1079,19 +1055,13 @@ define <2 x i64> @utest_f32i64(<2 x float> %x) {
 ; CHECK-NEXT:    f32x4.extract_lane 0
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1142,19 +1112,13 @@ define <2 x i64> @ustest_f32i64(<2 x float> %x) {
 ; CHECK-NEXT:    f32x4.extract_lane 0
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -1245,19 +1209,13 @@ define <2 x i64> @stest_f16i64(<2 x half> %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 5
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 0
@@ -1362,19 +1320,13 @@ define <2 x i64> @utesth_f16i64(<2 x half> %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 5
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 0
@@ -1427,19 +1379,13 @@ define <2 x i64> @ustest_f16i64(<2 x half> %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 5
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 0
@@ -2163,19 +2109,13 @@ define <2 x i64> @stest_f64i64_mm(<2 x double> %x) {
 ; CHECK-NEXT:    f64x2.extract_lane 0
 ; CHECK-NEXT:    call __fixdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -2276,19 +2216,13 @@ define <2 x i64> @utest_f64i64_mm(<2 x double> %x) {
 ; CHECK-NEXT:    f64x2.extract_lane 0
 ; CHECK-NEXT:    call __fixunsdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -2338,19 +2272,13 @@ define <2 x i64> @ustest_f64i64_mm(<2 x double> %x) {
 ; CHECK-NEXT:    f64x2.extract_lane 0
 ; CHECK-NEXT:    call __fixdfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -2421,19 +2349,13 @@ define <2 x i64> @stest_f32i64_mm(<2 x float> %x) {
 ; CHECK-NEXT:    f32x4.extract_lane 0
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -2534,19 +2456,13 @@ define <2 x i64> @utest_f32i64_mm(<2 x float> %x) {
 ; CHECK-NEXT:    f32x4.extract_lane 0
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -2596,19 +2512,13 @@ define <2 x i64> @ustest_f32i64_mm(<2 x float> %x) {
 ; CHECK-NEXT:    f32x4.extract_lane 0
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 2
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 1
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 1
 ; CHECK-NEXT:    i64.load 0
@@ -2681,19 +2591,13 @@ define <2 x i64> @stest_f16i64_mm(<2 x half> %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 5
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 0
@@ -2796,19 +2700,13 @@ define <2 x i64> @utesth_f16i64_mm(<2 x half> %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixunssfti
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 5
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 0
@@ -2860,19 +2758,13 @@ define <2 x i64> @ustest_f16i64_mm(<2 x half> %x) {
 ; CHECK-NEXT:    call __extendhfsf2
 ; CHECK-NEXT:    call __fixsfti
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 16
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 24
 ; CHECK-NEXT:    local.set 3
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 16
 ; CHECK-NEXT:    local.set 4
 ; CHECK-NEXT:    local.get 2
-; CHECK-NEXT:    i32.const 8
-; CHECK-NEXT:    i32.add
-; CHECK-NEXT:    i64.load 0
+; CHECK-NEXT:    i64.load 8
 ; CHECK-NEXT:    local.set 5
 ; CHECK-NEXT:    local.get 2
 ; CHECK-NEXT:    i64.load 0
diff --git a/llvm/test/CodeGen/WebAssembly/i128.ll b/llvm/test/CodeGen/WebAssembly/i128.ll
index eae7f5f834dc0f..d9bec9b8ae887d 100644
--- a/llvm/test/CodeGen/WebAssembly/i128.ll
+++ b/llvm/test/CodeGen/WebAssembly/i128.ll
@@ -63,31 +63,29 @@ define i128 @mul128(i128 %x, i128 %y) {
 ; CHECK:         .functype mul128 (i32, i64, i64, i64, i64) -> ()
 ; CHECK-NEXT:    .local i32
 ; CHECK-NEXT:  # %bb.0:
-; CHECK-NEXT:    global.get $push4=, __stack_pointer
-; CHECK-NEXT:    i32.const $push5=, 16
-; CHECK-NEXT:    i32.sub $push9=, $pop4, $pop5
-; CHECK-NEXT:    local.tee $push8=, 5, $pop9
-; CHECK-NEXT:    global.set __stack_pointer, $pop8
-; CHECK-NEXT:    local.get $push14=, 5
-; CHECK-NEXT:    local.get $push13=, 1
-; CHECK-NEXT:    local.get $push12=, 2
-; CHECK-NEXT:    local.get $push11=, 3
-; CHECK-NEXT:    local.get $push10=, 4
-; CHECK-NEXT:    call __multi3, $pop14, $pop13, $pop12, $pop11, $pop10
+; CHECK-NEXT:    global.get $push2=, __stack_pointer
+; CHECK-NEXT:    i32.const $push3=, 16
+; CHECK-NEXT:    i32.sub $push7=, $pop2, $pop3
+; CHECK-NEXT:    loca...
[truncated]
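
The one-line change to `ExpandRes_NormalLoad` sits in the code that splits a wide load into a low half and a high half. As a rough model of that expansion (a sketch only, with hypothetical names `Halves` and `loadSplit`, using plain memory copies rather than SelectionDAG nodes), a 16-byte load becomes two 8-byte loads, the second at the base plus the increment size:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical container for the two halves of an expanded 128-bit load.
struct Halves {
  uint64_t lo;
  uint64_t hi;
};

// Load a 16-byte object as two 8-byte halves. The second load's address is
// ptr + incrementSize; because the whole 16-byte object lies inside the
// address space, that addition cannot wrap, which is the fact nuw records.
Halves loadSplit(const unsigned char *ptr) {
  const std::size_t incrementSize = 64 / 8; // half width in bits / 8
  Halves h;
  std::memcpy(&h.lo, ptr, sizeof h.lo);
  std::memcpy(&h.hi, ptr + incrementSize, sizeof h.hi);
  return h;
}
```

This corresponds to the pattern in the updated WebAssembly tests above, where `i32.const 8`, `i32.add`, `i64.load 0` becomes a single `i64.load 8`.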

@dschuff dschuff left a comment

Nice find, LGTM

@dschuff

dschuff commented Dec 10, 2024

It reminded me of #80184, which I let drop and didn't get back around to. Maybe I should back that one up to the uncontroversial part and try again.

@sunfishcode

sunfishcode commented Dec 10, 2024

This change also resulted in an extra S_ADD_I32 getting folded on AMDGPU.

@sunfishcode sunfishcode merged commit e665e78 into llvm:main Dec 10, 2024
9 checks passed
@sunfishcode sunfishcode deleted the sunfishcode/legalize-load-nuw branch December 10, 2024 14:28