[SelectionDAG] Use the nuw flag when expanding loads. #119288
When expanding a load into two loads, use nuw for the add that computes the second load's address as an offset from the base, because the original load doesn't straddle the address space. It turns out there's already a dedicated helper function for doing this, `getObjectPtrOffset`. This is in target-independent code; however, in practice it only seems to affect WebAssembly code, because the constant offsets of WebAssembly load and store instructions don't perform wrapping, so constant folding often depends on the nuw flag being present. This was noticed in the development of llvm#119204.
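To illustrate why the flag matters, here is a minimal C++ sketch of the wasm32 addressing rule. It is a simplified model and not code from LLVM or any WebAssembly engine; the names `Access`, `wasmAccess`, `unfoldedLoad`, and `foldedLoad` are hypothetical. It assumes the effective address is the 32-bit base plus the constant offset immediate, computed without wrapping, while a separate `i32.add` wraps modulo 2^32.

```cpp
#include <cassert>
#include <cstdint>

// Result of a simulated wasm32 memory access: either a trap or a byte address.
struct Access {
  bool traps;
  uint64_t address;
};

// Simplified model of the wasm32 effective-address rule: the i32 base and the
// constant offset immediate are added WITHOUT wrapping (64-bit arithmetic
// here), and the access traps if it does not fit inside linear memory.
Access wasmAccess(uint32_t base, uint32_t offsetImm, uint32_t accessBytes,
                  uint64_t memBytes) {
  assert(accessBytes > 0 && "an access reads at least one byte");
  uint64_t ea = uint64_t(base) + uint64_t(offsetImm);
  if (ea + accessBytes > memBytes)
    return Access{true, 0};
  return Access{false, ea};
}

// Unfolded form: `i32.const K; i32.add; i64.load 0`.
// The i32.add wraps modulo 2^32 before the load sees the pointer.
Access unfoldedLoad(uint32_t base, uint32_t k, uint64_t memBytes) {
  uint32_t ptr = base + k; // unsigned wraparound is well defined in C++
  return wasmAccess(ptr, 0, 8, memBytes);
}

// Folded form: `i64.load K`.
// The constant moves into the offset immediate, which does not wrap.
Access foldedLoad(uint32_t base, uint32_t k, uint64_t memBytes) {
  return wasmAccess(base, k, 8, memBytes);
}
```

For an ordinary base such as 1024, both forms access address 1032. For a base of 0xFFFFFFFC and K = 8, the unfolded form wraps around to address 4 while the folded form traps, so the two are only interchangeable when the add is known not to wrap, which is what nuw asserts.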
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-webassembly
Author: Dan Gohman (sunfishcode)
Patch is 101.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119288.diff
9 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
index 2655e8428309da..113a3bc0bbea69 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
@@ -265,7 +265,7 @@ void DAGTypeLegalizer::ExpandRes_NormalLoad(SDNode *N, SDValue &Lo,
// Increment the pointer to the other half.
unsigned IncrementSize = NVT.getSizeInBits() / 8;
- Ptr = DAG.getMemBasePlusOffset(Ptr, TypeSize::getFixed(IncrementSize), dl);
+ Ptr = DAG.getObjectPtrOffset(dl, Ptr, TypeSize::getFixed(IncrementSize));
Hi = DAG.getLoad(
NVT, dl, Chain, Ptr, LD->getPointerInfo().getWithOffset(IncrementSize),
LD->getOriginalAlign(), LD->getMemOperand()->getFlags(), AAInfo);
diff --git a/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll b/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll
index 58e3f0dc2a93c0..137994ceac1322 100644
--- a/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll
+++ b/llvm/test/CodeGen/WebAssembly/fpclamptosat.ll
@@ -524,9 +524,7 @@ define i64 @utest_f64i64(double %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixunsdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -563,9 +561,7 @@ define i64 @utest_f64i64_cse_combine(double %x) #0 {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixunsdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -602,9 +598,7 @@ define i64 @ustest_f64i64(double %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -661,9 +655,7 @@ define i64 @ustest_f64i64_cse_combine(double %x) #0 {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -727,9 +719,7 @@ define i64 @utest_f32i64(float %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -781,9 +771,7 @@ define i64 @ustest_f32i64(float %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -840,9 +828,7 @@ define i64 @ustest_f32i64_cse_combine(float %x) #0 {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -910,9 +896,7 @@ define i64 @utesth_f16i64(half %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -981,9 +965,7 @@ define i64 @ustest_f16i64(half %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1471,9 +1453,7 @@ define i64 @utest_f64i64_mm(double %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixunsdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1509,9 +1489,7 @@ define i64 @ustest_f64i64_mm(double %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1573,9 +1551,7 @@ define i64 @utest_f32i64_mm(float %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1611,9 +1587,7 @@ define i64 @ustest_f32i64_mm(float %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1679,9 +1653,7 @@ define i64 @utesth_f16i64_mm(half %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1719,9 +1691,7 @@ define i64 @ustest_f16i64_mm(half %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
diff --git a/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll b/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
index 8f85575c1cf431..1feb5feb7a9ee8 100644
--- a/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
+++ b/llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
@@ -685,19 +685,13 @@ define <2 x i64> @stest_f64i64(<2 x double> %x) {
; CHECK-NEXT: f64x2.extract_lane 0
; CHECK-NEXT: call __fixdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -800,19 +794,13 @@ define <2 x i64> @utest_f64i64(<2 x double> %x) {
; CHECK-NEXT: f64x2.extract_lane 0
; CHECK-NEXT: call __fixunsdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -863,19 +851,13 @@ define <2 x i64> @ustest_f64i64(<2 x double> %x) {
; CHECK-NEXT: f64x2.extract_lane 0
; CHECK-NEXT: call __fixdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -964,19 +946,13 @@ define <2 x i64> @stest_f32i64(<2 x float> %x) {
; CHECK-NEXT: f32x4.extract_lane 0
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1079,19 +1055,13 @@ define <2 x i64> @utest_f32i64(<2 x float> %x) {
; CHECK-NEXT: f32x4.extract_lane 0
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1142,19 +1112,13 @@ define <2 x i64> @ustest_f32i64(<2 x float> %x) {
; CHECK-NEXT: f32x4.extract_lane 0
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -1245,19 +1209,13 @@ define <2 x i64> @stest_f16i64(<2 x half> %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 5
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 0
@@ -1362,19 +1320,13 @@ define <2 x i64> @utesth_f16i64(<2 x half> %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 5
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 0
@@ -1427,19 +1379,13 @@ define <2 x i64> @ustest_f16i64(<2 x half> %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 5
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 0
@@ -2163,19 +2109,13 @@ define <2 x i64> @stest_f64i64_mm(<2 x double> %x) {
; CHECK-NEXT: f64x2.extract_lane 0
; CHECK-NEXT: call __fixdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -2276,19 +2216,13 @@ define <2 x i64> @utest_f64i64_mm(<2 x double> %x) {
; CHECK-NEXT: f64x2.extract_lane 0
; CHECK-NEXT: call __fixunsdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -2338,19 +2272,13 @@ define <2 x i64> @ustest_f64i64_mm(<2 x double> %x) {
; CHECK-NEXT: f64x2.extract_lane 0
; CHECK-NEXT: call __fixdfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -2421,19 +2349,13 @@ define <2 x i64> @stest_f32i64_mm(<2 x float> %x) {
; CHECK-NEXT: f32x4.extract_lane 0
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -2534,19 +2456,13 @@ define <2 x i64> @utest_f32i64_mm(<2 x float> %x) {
; CHECK-NEXT: f32x4.extract_lane 0
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -2596,19 +2512,13 @@ define <2 x i64> @ustest_f32i64_mm(<2 x float> %x) {
; CHECK-NEXT: f32x4.extract_lane 0
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 2
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 1
; CHECK-NEXT: i64.load 0
@@ -2681,19 +2591,13 @@ define <2 x i64> @stest_f16i64_mm(<2 x half> %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 5
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 0
@@ -2796,19 +2700,13 @@ define <2 x i64> @utesth_f16i64_mm(<2 x half> %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixunssfti
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 5
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 0
@@ -2860,19 +2758,13 @@ define <2 x i64> @ustest_f16i64_mm(<2 x half> %x) {
; CHECK-NEXT: call __extendhfsf2
; CHECK-NEXT: call __fixsfti
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 16
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 24
; CHECK-NEXT: local.set 3
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 16
; CHECK-NEXT: local.set 4
; CHECK-NEXT: local.get 2
-; CHECK-NEXT: i32.const 8
-; CHECK-NEXT: i32.add
-; CHECK-NEXT: i64.load 0
+; CHECK-NEXT: i64.load 8
; CHECK-NEXT: local.set 5
; CHECK-NEXT: local.get 2
; CHECK-NEXT: i64.load 0
diff --git a/llvm/test/CodeGen/WebAssembly/i128.ll b/llvm/test/CodeGen/WebAssembly/i128.ll
index eae7f5f834dc0f..d9bec9b8ae887d 100644
--- a/llvm/test/CodeGen/WebAssembly/i128.ll
+++ b/llvm/test/CodeGen/WebAssembly/i128.ll
@@ -63,31 +63,29 @@ define i128 @mul128(i128 %x, i128 %y) {
; CHECK: .functype mul128 (i32, i64, i64, i64, i64) -> ()
; CHECK-NEXT: .local i32
; CHECK-NEXT: # %bb.0:
-; CHECK-NEXT: global.get $push4=, __stack_pointer
-; CHECK-NEXT: i32.const $push5=, 16
-; CHECK-NEXT: i32.sub $push9=, $pop4, $pop5
-; CHECK-NEXT: local.tee $push8=, 5, $pop9
-; CHECK-NEXT: global.set __stack_pointer, $pop8
-; CHECK-NEXT: local.get $push14=, 5
-; CHECK-NEXT: local.get $push13=, 1
-; CHECK-NEXT: local.get $push12=, 2
-; CHECK-NEXT: local.get $push11=, 3
-; CHECK-NEXT: local.get $push10=, 4
-; CHECK-NEXT: call __multi3, $pop14, $pop13, $pop12, $pop11, $pop10
+; CHECK-NEXT: global.get $push2=, __stack_pointer
+; CHECK-NEXT: i32.const $push3=, 16
+; CHECK-NEXT: i32.sub $push7=, $pop2, $pop3
+; CHECK-NEXT: loca...
[truncated]
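The test updates above follow one pattern: constant adds in front of a load disappear and their sum shows up as the load's offset immediate. The C++ sketch below is a toy model of that effect only. It is not the WebAssembly backend's instruction selection code, and `AddConst`, `FoldedAccess`, and `foldOffsets` are hypothetical names. It assumes a constant add may be absorbed into the immediate only when it is flagged nuw, and that folding stops at the first add without the flag.

```cpp
#include <cstdint>
#include <vector>

// One step of address arithmetic: add a constant, optionally flagged nuw.
struct AddConst {
  uint32_t value;
  bool nuw;
};

// Outcome of folding: the adds that must remain as explicit i32.add
// instructions, and the constant absorbed into the load's offset immediate.
struct FoldedAccess {
  std::vector<AddConst> remainingAdds;
  uint64_t offsetImmediate;
};

// `adds` is ordered from the base pointer outward, so the last element is the
// add that feeds the load directly. Starting there, absorb each nuw add into
// the offset immediate; stop at the first add that may wrap.
FoldedAccess foldOffsets(const std::vector<AddConst> &adds) {
  FoldedAccess result{adds, 0};
  while (!result.remainingAdds.empty() && result.remainingAdds.back().nuw) {
    result.offsetImmediate += result.remainingAdds.back().value;
    result.remainingAdds.pop_back();
  }
  return result;
}
```

In this model, the chain `+16 (nuw), +8 (nuw)` folds completely into an offset of 24, matching the `i64.load 24` lines in `fpclamptosat_vec.ll`. If the `+8` lacks the flag, nothing folds and both adds stay in front of an `i64.load 0`, as in the removed lines.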
Nice find, LGTM
It reminded me of #80184 which I let drop and didn't get back around to, maybe I should back that one up to the uncontroversial part and try again.
This change also resulted in an extra |