[InstCombine] Strip leading zero indices from GEP #155415

nikic · 2025-08-26T13:37:41Z

GEPs are often in the form gep [N x %T], ptr %p, i64 0, i64 %idx. Canonicalize these to gep %T, ptr %p, i64 %idx.

This enables transforms that only support one GEP index to work and improves CSE.

Various transforms were recently hardened to make sure they still work without the leading index. #155404 is the last problematic transform I'm aware of.

GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`. Canonicalize these to `gep %T, ptr %p, i64 %idx`. This enables transforms that only support one GEP index to work and improves CSE.

llvmbot · 2025-08-26T13:38:14Z

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-clang

Author: Nikita Popov (nikic)

Changes

GEPs are often in the form gep [N x %T], ptr %p, i64 0, i64 %idx. Canonicalize these to gep %T, ptr %p, i64 %idx.

This enables transforms that only support one GEP index to work and improves CSE.

Various transforms were recently hardened to make sure they still work without the leading index. #155404 is the last problematic transform I'm aware of.

Patch is 213.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155415.diff

56 Files Affected:

(modified) clang/test/CodeGen/attr-counted-by.c (+94-106)
(modified) clang/test/CodeGen/union-tbaa1.c (+3-3)
(modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+13)
(modified) llvm/test/CodeGen/AMDGPU/vector-alloca-addrspacecast.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/2006-12-15-Range-Test.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll (+10-10)
(modified) llvm/test/Transforms/InstCombine/canonicalize-gep-constglob.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/cast.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/gep-addrspace.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/gep-alias.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/gep-vector.ll (+16-16)
(modified) llvm/test/Transforms/InstCombine/getelementptr.ll (+5-5)
(modified) llvm/test/Transforms/InstCombine/load-bitcast-select.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/loadstore-alignment.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/mem-gep-zidx.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/memchr-11.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/memcmp-8.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/memcpy-addrspace.ll (+6-6)
(modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+11-11)
(modified) llvm/test/Transforms/InstCombine/opaque-ptr.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/pr58901.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/ptrtoint-nullgep.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/remove-loop-phi-multiply-by-zero.ll (+6-6)
(modified) llvm/test/Transforms/InstCombine/strcmp-3.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/strlen-1.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/strlen-4.ll (+8-8)
(modified) llvm/test/Transforms/InstCombine/strlen-7.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/strlen-8.ll (+5-5)
(modified) llvm/test/Transforms/InstCombine/strlen-9.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/strnlen-3.ll (+9-9)
(modified) llvm/test/Transforms/InstCombine/strnlen-4.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/strnlen-5.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/vec_demanded_elts.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/wcslen-1.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/wcslen-3.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/wcslen-5.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+22-22)
(modified) llvm/test/Transforms/LoopVectorize/X86/x86_fp80-vector-store.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/non-const-n.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+24-24)
(modified) llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll (+26-26)
(modified) llvm/test/Transforms/PhaseOrdering/X86/excessive-unrolling.ll (+6-6)
(modified) llvm/test/Transforms/PhaseOrdering/X86/merge-functions2.ll (+1-1)
(modified) llvm/test/Transforms/PhaseOrdering/X86/merge-functions3.ll (+1-1)
(modified) llvm/test/Transforms/PhaseOrdering/X86/simplifycfg-late.ll (+1-1)
(modified) llvm/test/Transforms/PhaseOrdering/scev-custom-dl.ll (+1-1)
(modified) llvm/test/Transforms/PhaseOrdering/single-iteration-loop-sroa.ll (+2-2)
(modified) llvm/test/Transforms/SimplifyCFG/Hexagon/switch-to-lookup-table.ll (+1-1)

diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 59e1b134850a9..cb23efdb8f263 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -72,7 +72,7 @@ struct anon_struct {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4:![0-9]+]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -81,7 +81,7 @@ struct anon_struct {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -90,7 +90,7 @@ struct anon_struct {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -99,7 +99,7 @@ struct anon_struct {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -120,7 +120,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont6:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = tail call i32 @llvm.smax.i32(i32 [[COUNTED_BY_LOAD]], i32 0)
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = shl i32 [[TMP2]], 2
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
@@ -134,7 +134,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = tail call i32 @llvm.smax.i32(i32 [[COUNTED_BY_LOAD]], i32 0)
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = shl i32 [[TMP0]], 2
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -142,7 +142,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -150,7 +150,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -243,7 +243,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -251,7 +251,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -259,7 +259,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -267,7 +267,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -350,7 +350,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[RESULT:%.*]] = add i32 [[FLEXIBLE_ARRAY_MEMBER_SIZE]], 244
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP4:%.*]] = and i32 [[RESULT]], 252
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV2:%.*]] = select i1 [[TMP3]], i32 [[TMP4]], i32 0
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV2]], ptr [[ARRAYIDX10]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTNOT81:%.*]] = icmp eq i32 [[DOTCOUNTED_BY_LOAD]], 3
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[DOTNOT81]], label [[HANDLER_OUT_OF_BOUNDS18:%.*]], label [[CONT19:%.*]], !prof [[PROF8:![0-9]+]], !nosanitize [[META2]]
@@ -370,7 +370,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[RESULT25:%.*]] = add i32 [[FLEXIBLE_ARRAY_MEMBER_SIZE]], 240
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP7:%.*]] = and i32 [[RESULT25]], 252
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV27:%.*]] = select i1 [[TMP6]], i32 [[TMP7]], i32 0
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX36:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM31]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX36:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM31]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV27]], ptr [[ARRAYIDX36]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM42:%.*]] = sext i32 [[FAM_IDX]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTCOUNTED_BY_LOAD44:%.*]] = load i32, ptr [[DOTCOUNTED_BY_GEP]], align 4
@@ -389,7 +389,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB10:[0-9]+]], i64 [[IDXPROM60]]) #[[ATTR8]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont67:
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM60]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM60]]
 // SANITIZE-WITH-ATTR-NEXT:    [[COUNT50:%.*]] = sext i32 [[DOTCOUNTED_BY_LOAD44]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP10:%.*]] = sub nsw i64 [[COUNT50]], [[IDXPROM42]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP11:%.*]] = tail call i64 @llvm.smax.i64(i64 [[TMP10]], i64 0)
@@ -411,7 +411,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = and i32 [[RESULT]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV1:%.*]] = select i1 [[TMP0]], i32 [[TMP1]], i32 0
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV1]], ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD7:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[FLEXIBLE_ARRAY_MEMBER_SIZE9:%.*]] = shl i32 [[COUNTED_BY_LOAD7]], 2
@@ -419,9 +419,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = icmp sgt i32 [[COUNTED_BY_LOAD7]], 3
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP3:%.*]] = and i32 [[RESULT10]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV12:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 0
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM14:%.*]] = sext i32 [[ADD]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM14]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV12]], ptr [[ARRAYIDX15]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM17:%.*]] = sext i32 [[FAM_IDX]] to i64
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD20:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
@@ -434,9 +432,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP8:%.*]] = shl i32 [[DOTTR]], 2
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP9:%.*]] = and i32 [[TMP8]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV26:%.*]] = select i1 [[TMP7]], i32 [[TMP9]], i32 0
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ADD28:%.*]] = add nsw i32 [[INDEX]], 2
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM29:%.*]] = sext i32 [[ADD28]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM29]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 8
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV26]], ptr [[ARRAYIDX30]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -445,15 +441,11 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM17:%.*]] = sext i32 [[ADD]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM17]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr i8, ptr [[ARRAYIDX5]], i64 4
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX18]], align 4, !tbaa [[TBAA2]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD31:%.*]] = add nsw i32 [[INDEX]], 2
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM32:%.*]] = sext i32 [[ADD31]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX33:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM32]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX33:%.*]] = getelementptr i8, ptr [[ARRAYIDX5]], i64 8
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX33]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -462,15 +454,11 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM9:%.*]] = sext i32 [[ADD]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM9]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 4
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX10]], align 4, !tbaa [[TBAA2]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD17:%.*]] = add nsw i32 [[INDEX]], 2
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM18:%.*]] = sext i32 [[ADD17]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM18]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 8
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX19]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -632,7 +620,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP1]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -641,7 +629,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -650,7 +638,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -659,7 +647,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -704,7 +692,7 @@ size_t test5_bdos(struct anon_struct *p) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont6:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP1]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    [[FLEXIBLE_ARRAY_MEMBER_SIZE:%.*]] = shl nuw i64 [[COUNTED_BY_LOAD]], 2
 // SANITIZE-WITH-ATTR-...
[truncated]

davemgreen · 2025-08-26T13:59:51Z

Oh - I was about to put up fd286bd. That was only going to try with array types, not with vectors too, and I believe the only other semi-useful part was some DA tests I had tried adding without array indices (which looked fine for all the ones I tested).

The differences I have seen have mostly been extra hoisting causing extra unrolling. I would be interested in if @dtcxzyw finds anything more.

nikic · 2025-08-26T19:55:46Z

@zyw-bot csmith-fuzz

nikic · 2025-08-26T20:29:12Z

Based on llvm-opt-benchmark, the main impact this has is enabling the gep + add -> gep + gep in many more cases (there are a few other transforms that only support single index gep, but they're less significant). This mostly has positive effects (as many gep + adds become a constant offset gep where the base was CSEd). The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

nikic · 2025-08-27T13:17:57Z

The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

I checked whether we can mitigate this by folding null checks before doing this transform, but at least in the one case I looked at, the null check only gets inlined after this fold has already happened.

nikic · 2025-08-28T13:15:25Z

The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

I checked whether we can mitigate this by folding null checks before doing this transform, but at least in the one case I looked at, the null check only gets inlined after this fold has already happened.

I also checked whether not going this transform for constant negative offsets might make sense, but it turns out that it's pretty widely beneficial even in that case.

So I don't think there's anything we can really do about those regressions.

dtcxzyw

LG

jplehr · 2025-09-02T09:54:05Z

It appears that this broke our libc on GPU buildbot: https://lab.llvm.org/buildbot/#/builders/10/builds/12560
I bisected the breakage locally down to this change and reverting fixes it.
@jhuber6 for awareness.

A failing configuration can be reproduced with cmake -GNinja -C offload/cmake/caches/AMDGPULibcBot.cmake llvm. Though the fail is a runtime fail afaict.

This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415

nikic · 2025-09-02T10:33:48Z

@jplehr I don't have an AMDGPU environment to debug this. Is it possible to reduce this further?

jplehr · 2025-09-02T10:36:04Z

Yeah, I'm sure it is. Is IR dump what's needed here, either original or before some specific pass?

nikic · 2025-09-02T10:39:38Z

Not sure whether it's going to be sufficient, but the original IR dump would be a good starting point at least.

Adjust issue-118306.rs test after LLVM change This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415

Rollup merge of #146116 - zmodem:issue_118306_fix, r=nikic Adjust issue-118306.rs test after LLVM change This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415

jplehr · 2025-09-02T19:53:22Z

I finally got some IR and maybe some other info.
With -opt-bisect-limit it appears that the issue is coming from GVN on the function _ZN47LlvmLibcCharacterConverterUTF32To8Test_FourByte3RunEv. After that the particular tests fails at runtime.

I attached the IR after the pass of LCSSA which runs before GVN. Let me know how helpful that is or what I can do to help better with this issue.
after-lcssa.txt

jhuber6 · 2025-09-02T20:01:21Z

I finally got some IR and maybe some other info. With -opt-bisect-limit it appears that the issue is coming from GVN on the function _ZN47LlvmLibcCharacterConverterUTF32To8Test_FourByte3RunEv. After that the particular tests fails at runtime.

I attached the IR after the pass of LCSSA which runs before GVN. Let me know how helpful that is or what I can do to help better with this issue. after-lcssa.txt

I don't see anything immediately problematic looking at that function, https://godbolt.org/z/34fq6Erz8. But I'm not sure overall, cc @arsenm.

arsenm · 2025-09-03T01:20:20Z

Just a guess but possibly related to @ritter-x2a's work with using GEP flags and fixing incorrect addressing mode matching

nikic · 2025-09-03T09:42:35Z

GVN doesn't seem to do anything particularly interesting there. I tried running this through llc to see how the codegen changes, but that gives:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.if

Does this need extra llc flags beyond the triple?

arsenm · 2025-09-03T10:17:15Z

GVN doesn't seem to do anything particularly interesting there. I tried running this through llc to see how the codegen changes, but that gives:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.if

Does this need extra llc flags beyond the triple?

This IR is not suitable for running through codegen again. We cannot deal with IR that has already had control flow intrinsics inserted. Reproducers should be IR at the start of codegen, not the end of the codegen IR pipeline

nikic · 2025-09-03T10:22:44Z

Oh, I see. I thought this was a GVN run in the middle-end, but it's one in the backend.

As a side note, I noticed this comment:

llvm-project/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Lines 1380 to 1393 in 2f5500e

    
           // EarlyCSE is not always strong enough to clean up what LSR produces. For 
        
           // example, GVN can combine 
        
           // 
        
           //   %0 = add %a, %b 
        
           //   %1 = add %b, %a 
        
           // 
        
           // and 
        
           // 
        
           //   %0 = shl nsw %a, 2 
        
           //   %1 = shl %a, 2 
        
           // 
        
           // but EarlyCSE can do neither of them. 
        
           if (isPassEnabled(EnableScalarIRPasses)) 
        
             addEarlyCSEOrGVNPass();

This looks very outdated. EarlyCSE supports both commutative instructions and poison flag intersection. You may be able to save some compile-time by using EarlyCSE instead of GVN there...

jplehr · 2025-09-03T10:29:13Z

If there is value in it, I can see to extract the IR earlier in the pipeline and attach that here.

arsenm · 2025-09-03T10:39:54Z

If there is value in it, I can see to extract the IR earlier in the pipeline and attach that here.

Yes, need the IR at the start of codegen

jplehr · 2025-09-03T11:27:13Z

Attached is the IR from pre codegen. Apologies that this may again not be what is needed. I always find the clang-linker-wrapper thing a bit confusing.
So, I hope this helps and thanks for the time.
libc-gpu-precodegen.txt

jplehr · 2025-09-04T09:54:59Z

I reverted this patch locally as this restores the tests back to passing and then retrieved the IR from the same point in the pipeline -- attached below.
There is two differences, that I am not sure how relevant they are. I'll quote one here, another similar one is found much further towards the end of the file.

Good case, line 189

 %arrayidx.i.i.i.i = getelementptr inbounds [64 x %"struct.__llvm_libc_22_0_0_git::AtExitUnit"], ptr addrspace(1) @_ZN22__llvm_libc_22_0_0_git16atexit_callbacksE, i64 0, i64 %1

Bad case, line 189

  %arrayidx.i.i.i.i = getelementptr inbounds %"struct.__llvm_libc_22_0_0_git::AtExitUnit", ptr addrspace(1) @_ZN22__llvm_libc_22_0_0_git16atexit_callbacksE, i64 %1

I do not know whether this is any significant though. Mostly caught my attention as a difference and that the integer literal that went away (64) is the number of lanes on our GPU.
Maybe that info is a piece to this puzzle.

IR:
libc-gpu-precodegen-good.txt

nikic · 2025-09-04T11:13:32Z

I looked at the diff between those two, and all the changes look correct to me. The only real difference is related to __cxa_thread_finalize, which is probably just a difference in baselines (it was added yesterday).

nikic · 2025-09-04T12:44:01Z

There are more diffs in the llc output, but I find it hard to interpret those without more knowledge about AMDGPU.

jayfoad · 2025-09-04T13:37:58Z

I think the relevant part of the diff is probably:

--- libc-gpu-precodegen-good.txt        2025-09-04 14:13:56.751818563 +0100
+++ libc-gpu-precodegen.txt     2025-09-04 14:13:49.402694527 +0100
@@ -719 +720 @@
-  %call.i572 = tail call fastcc noundef zeroext i1 @_ZN22__llvm_libc_22_0_0_git7testing8internal4testIiEEbPNS1_10RunContextENS0_8TestCondET_S6_PKcS8_NS1_8LocationE(ptr noundef %2, i32 noundef 240, i32 noundef 240, ptr noundef addrspacecast (ptr addrspace(4) @.str.10 to ptr), ptr noundef addrspacecast (ptr addrspace(4) @.str.24 to ptr), i32 133) #28
+  %call.i572 = tail call fastcc noundef zeroext i1 @_ZN22__llvm_libc_22_0_0_git7testing8internal4testIiEEbPNS1_10RunContextENS0_8TestCondET_S6_PKcS8_NS1_8LocationE(ptr noundef %2, i32 noundef poison, i32 noundef 240, ptr noundef addrspacecast (ptr addrspace(4) @.str.10 to ptr), ptr noundef addrspacecast (ptr addrspace(4) @.str.24 to ptr), i32 133) #27

Note that in the good case the first three arguments to the call are "%2, 240, 240" while in the bad case they are "%2, poison, 240".

nikic · 2025-09-04T14:13:54Z

Ah yes, good point. That means the issue was already introduced at an earlier point in the middle-end pipeline, not during codegen.

jplehr · 2025-09-05T09:29:48Z

I checked out main at 69d0c3e yesterday.
Then reverted this commit and validated that this still has the tests pass. I produced "good" intermediate files with that version.
I then reverted the revert and validated that this has the tests fail. I produced "bad" intermediate files with that version.

I have attached all IR files (except precodegen), but for more insight at what exact points in the pipeline they are produced @jhuber6 needs to help me. These are produced with -Wl,-save-temps on the final step of the ninja --verbose to build that particular test; if that helps in figuring things out.
There are a few differences in the preopt IR files between good and bad. I cannot judge their significance though. Similar for the internalize files.
If there is something else that I can/should provide to help with this, please let me know. I will need guidance then.

01-preopt-bad.txt
01-preopt-good.txt
02-internalize-bad.txt
02-internalize-good.txt
03-opt-bad.txt
03-opt-good.txt

jayfoad · 2025-09-05T10:10:34Z

This diff looks suspicious to me in function _ZN22__llvm_libc_22_0_0_git8internal18CharacterConverter8pop_utf8Ev:

--- 01-preopt-good.txt  2025-09-05 10:39:57.226224451 +0100
+++ 01-preopt-bad.txt   2025-09-05 10:39:56.149207707 +0100
@@ -11112,10 +11113,10 @@
 if.then4:                                         ; preds = %if.end
-  %sub7 = add nsw i64 %conv, -1
-  %4 = extractelement <4 x i8> <i8 0, i8 -64, i8 -32, i8 -16>, i64 %sub7
-  %5 = load i32, ptr %0, align 4, !tbaa !58
+  %4 = add nuw nsw i64 %conv, 4294967295
+  %5 = extractelement <4 x i8> <i8 0, i8 -64, i8 -32, i8 -16>, i64 %4
+  %6 = load i32, ptr %0, align 4, !tbaa !58
   %sh_prom = trunc nuw nsw i64 %mul to i32
-  %shr = lshr i32 %5, %sh_prom
+  %shr = lshr i32 %6, %sh_prom
   %shr.tr = trunc i32 %shr to i8
-  %or.narrow = or i8 %4, %shr.tr
-  %6 = zext i8 %or.narrow to i32
+  %or.narrow = or i8 %5, %shr.tr
+  %7 = zext i8 %or.narrow to i32
   br label %if.end15

I am guessing that the add nsw i64 %conv, -1 vs add nuw nsw i64 %conv, 4294967295 are the result of transforming some getelementptr instructions, and it seems suspicious that the 64-bit offset is small and negative in the good case but large and positive in the bad case.

jplehr · 2025-09-05T11:34:37Z

Given this is in the preopt IR, the change would occur in an even earlier stage, right? I'll see if I can get some IR out of whatever stage that may be.

nikic · 2025-09-08T10:29:07Z

Given this is in the preopt IR, the change would occur in an even earlier stage, right? I'll see if I can get some IR out of whatever stage that may be.

Yes, you're currently dumping the pre-LTO IR, but we'd want the full optimized one produced by the compiler. (Should be possible to get it with -S -emit-llvm -Xclang -disable-llvm-optzns.)

Does anyone know which transform is capable of converting a load from a constant global into a dynamic extract from a vector constant? I tried and failed to reproduce that behavior on simpler test cases.

[InstCombine] Strip leading zero indices from GEP

87fd263

GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`. Canonicalize these to `gep %T, ptr %p, i64 %idx`. This enables transforms that only support one GEP index to work and improves CSE.

nikic requested review from davemgreen and dtcxzyw August 26, 2025 13:37

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Aug 26, 2025

nikic mentioned this pull request Aug 26, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

zyw-bot mentioned this pull request Aug 26, 2025

pre-commit: PR155415 dtcxzyw/llvm-opt-benchmark#2716

Closed

zyw-bot mentioned this pull request Aug 26, 2025

Fuzz PR155415 dtcxzyw/llvm-fuzz-service#129

Closed

dtcxzyw approved these changes Aug 29, 2025

View reviewed changes

nikic merged commit 055bfc0 into llvm:main Sep 1, 2025
14 checks passed

nikic deleted the instcombine-gep-leading-zeros branch September 1, 2025 07:58

s-barannikov mentioned this pull request Sep 1, 2025

[TableGen][Decoder] Decode operands with zero width or all bits known #156358

Merged

zmodem added a commit to zmodem/rust that referenced this pull request Sep 2, 2025

Adjust issue-118306.rs test after LLVM change

4ecca58

This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415

zmodem mentioned this pull request Sep 2, 2025

Adjust issue-118306.rs test after LLVM change rust-lang/rust#146116

Merged

lukel97 mentioned this pull request Sep 3, 2025

[RISCV] Unprofitable vectorization in llvm-test-suite since a49030e493b76bc2b294d36709c7ed84aef557f4 #156569

Open

[InstCombine] Strip leading zero indices from GEP #155415

[InstCombine] Strip leading zero indices from GEP #155415

Conversation

nikic commented Aug 26, 2025

Uh oh!

llvmbot commented Aug 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davemgreen commented Aug 26, 2025

Uh oh!

nikic commented Aug 26, 2025

Uh oh!

nikic commented Aug 26, 2025

Uh oh!

nikic commented Aug 27, 2025

Uh oh!

nikic commented Aug 28, 2025

Uh oh!

dtcxzyw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jplehr commented Sep 2, 2025

Uh oh!

nikic commented Sep 2, 2025

Uh oh!

jplehr commented Sep 2, 2025

Uh oh!

nikic commented Sep 2, 2025

Uh oh!

jplehr commented Sep 2, 2025

Uh oh!

jhuber6 commented Sep 2, 2025

Uh oh!

arsenm commented Sep 3, 2025

Uh oh!

nikic commented Sep 3, 2025

Uh oh!

arsenm commented Sep 3, 2025

Uh oh!

nikic commented Sep 3, 2025

Uh oh!

jplehr commented Sep 3, 2025

Uh oh!

arsenm commented Sep 3, 2025

Uh oh!

jplehr commented Sep 3, 2025

Uh oh!

jplehr commented Sep 4, 2025

Uh oh!

nikic commented Sep 4, 2025

Uh oh!

nikic commented Sep 4, 2025

Uh oh!

jayfoad commented Sep 4, 2025

Uh oh!

nikic commented Sep 4, 2025

Uh oh!

jplehr commented Sep 5, 2025

Uh oh!

jayfoad commented Sep 5, 2025

Uh oh!

jplehr commented Sep 5, 2025

Uh oh!

nikic commented Sep 8, 2025

Uh oh!

Uh oh!

llvmbot commented Aug 26, 2025 •

edited

Loading