Skip to content

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Aug 26, 2025

GEPs are often in the form gep [N x %T], ptr %p, i64 0, i64 %idx. Canonicalize these to gep %T, ptr %p, i64 %idx.

This enables transforms that only support one GEP index to work and improves CSE.

Various transforms were recently hardened to make sure they still work without the leading index. #155404 is the last problematic transform I'm aware of.

GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`.
Canonicalize these to `gep %T, ptr %p, i64 %idx`.

This enables transforms that only support one GEP index to work
and improves CSE.
@nikic nikic requested review from davemgreen and dtcxzyw August 26, 2025 13:37
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Aug 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-clang

Author: Nikita Popov (nikic)

Changes

GEPs are often in the form gep [N x %T], ptr %p, i64 0, i64 %idx. Canonicalize these to gep %T, ptr %p, i64 %idx.

This enables transforms that only support one GEP index to work and improves CSE.

Various transforms were recently hardened to make sure they still work without the leading index. #155404 is the last problematic transform I'm aware of.


Patch is 213.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155415.diff

56 Files Affected:

  • (modified) clang/test/CodeGen/attr-counted-by.c (+94-106)
  • (modified) clang/test/CodeGen/union-tbaa1.c (+3-3)
  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+13)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-alloca-addrspacecast.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/2006-12-15-Range-Test.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll (+10-10)
  • (modified) llvm/test/Transforms/InstCombine/canonicalize-gep-constglob.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/cast.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/gep-addrspace.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/gep-alias.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/gep-vector.ll (+16-16)
  • (modified) llvm/test/Transforms/InstCombine/getelementptr.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/load-bitcast-select.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/loadstore-alignment.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/mem-gep-zidx.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memchr-11.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memcmp-8.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-addrspace.ll (+6-6)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+11-11)
  • (modified) llvm/test/Transforms/InstCombine/opaque-ptr.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/pr58901.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/ptrtoint-nullgep.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/remove-loop-phi-multiply-by-zero.ll (+6-6)
  • (modified) llvm/test/Transforms/InstCombine/strcmp-3.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/strlen-1.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/strlen-4.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/strlen-7.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/strlen-8.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/strlen-9.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-3.ll (+9-9)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-4.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-5.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/vec_demanded_elts.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-1.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-3.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-5.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+22-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86_fp80-vector-store.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/non-const-n.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+24-24)
  • (modified) llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll (+26-26)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/excessive-unrolling.ll (+6-6)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/merge-functions2.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/merge-functions3.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/simplifycfg-late.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/scev-custom-dl.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/single-iteration-loop-sroa.ll (+2-2)
  • (modified) llvm/test/Transforms/SimplifyCFG/Hexagon/switch-to-lookup-table.ll (+1-1)
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 59e1b134850a9..cb23efdb8f263 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -72,7 +72,7 @@ struct anon_struct {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4:![0-9]+]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -81,7 +81,7 @@ struct anon_struct {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -90,7 +90,7 @@ struct anon_struct {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -99,7 +99,7 @@ struct anon_struct {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -120,7 +120,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont6:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = tail call i32 @llvm.smax.i32(i32 [[COUNTED_BY_LOAD]], i32 0)
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = shl i32 [[TMP2]], 2
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
@@ -134,7 +134,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = tail call i32 @llvm.smax.i32(i32 [[COUNTED_BY_LOAD]], i32 0)
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = shl i32 [[TMP0]], 2
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -142,7 +142,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -150,7 +150,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -243,7 +243,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -251,7 +251,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -259,7 +259,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -267,7 +267,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -350,7 +350,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[RESULT:%.*]] = add i32 [[FLEXIBLE_ARRAY_MEMBER_SIZE]], 244
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP4:%.*]] = and i32 [[RESULT]], 252
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV2:%.*]] = select i1 [[TMP3]], i32 [[TMP4]], i32 0
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV2]], ptr [[ARRAYIDX10]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTNOT81:%.*]] = icmp eq i32 [[DOTCOUNTED_BY_LOAD]], 3
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[DOTNOT81]], label [[HANDLER_OUT_OF_BOUNDS18:%.*]], label [[CONT19:%.*]], !prof [[PROF8:![0-9]+]], !nosanitize [[META2]]
@@ -370,7 +370,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[RESULT25:%.*]] = add i32 [[FLEXIBLE_ARRAY_MEMBER_SIZE]], 240
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP7:%.*]] = and i32 [[RESULT25]], 252
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV27:%.*]] = select i1 [[TMP6]], i32 [[TMP7]], i32 0
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX36:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM31]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX36:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM31]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV27]], ptr [[ARRAYIDX36]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM42:%.*]] = sext i32 [[FAM_IDX]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTCOUNTED_BY_LOAD44:%.*]] = load i32, ptr [[DOTCOUNTED_BY_GEP]], align 4
@@ -389,7 +389,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB10:[0-9]+]], i64 [[IDXPROM60]]) #[[ATTR8]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont67:
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM60]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM60]]
 // SANITIZE-WITH-ATTR-NEXT:    [[COUNT50:%.*]] = sext i32 [[DOTCOUNTED_BY_LOAD44]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP10:%.*]] = sub nsw i64 [[COUNT50]], [[IDXPROM42]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP11:%.*]] = tail call i64 @llvm.smax.i64(i64 [[TMP10]], i64 0)
@@ -411,7 +411,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = and i32 [[RESULT]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV1:%.*]] = select i1 [[TMP0]], i32 [[TMP1]], i32 0
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV1]], ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD7:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[FLEXIBLE_ARRAY_MEMBER_SIZE9:%.*]] = shl i32 [[COUNTED_BY_LOAD7]], 2
@@ -419,9 +419,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = icmp sgt i32 [[COUNTED_BY_LOAD7]], 3
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP3:%.*]] = and i32 [[RESULT10]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV12:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 0
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM14:%.*]] = sext i32 [[ADD]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM14]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV12]], ptr [[ARRAYIDX15]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM17:%.*]] = sext i32 [[FAM_IDX]] to i64
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD20:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
@@ -434,9 +432,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP8:%.*]] = shl i32 [[DOTTR]], 2
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP9:%.*]] = and i32 [[TMP8]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV26:%.*]] = select i1 [[TMP7]], i32 [[TMP9]], i32 0
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ADD28:%.*]] = add nsw i32 [[INDEX]], 2
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM29:%.*]] = sext i32 [[ADD28]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM29]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 8
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV26]], ptr [[ARRAYIDX30]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -445,15 +441,11 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM17:%.*]] = sext i32 [[ADD]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM17]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr i8, ptr [[ARRAYIDX5]], i64 4
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX18]], align 4, !tbaa [[TBAA2]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD31:%.*]] = add nsw i32 [[INDEX]], 2
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM32:%.*]] = sext i32 [[ADD31]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX33:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM32]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX33:%.*]] = getelementptr i8, ptr [[ARRAYIDX5]], i64 8
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX33]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -462,15 +454,11 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM9:%.*]] = sext i32 [[ADD]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM9]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 4
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX10]], align 4, !tbaa [[TBAA2]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD17:%.*]] = add nsw i32 [[INDEX]], 2
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM18:%.*]] = sext i32 [[ADD17]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM18]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 8
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX19]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -632,7 +620,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP1]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -641,7 +629,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -650,7 +638,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -659,7 +647,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -704,7 +692,7 @@ size_t test5_bdos(struct anon_struct *p) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont6:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP1]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    [[FLEXIBLE_ARRAY_MEMBER_SIZE:%.*]] = shl nuw i64 [[COUNTED_BY_LOAD]], 2
 // SANITIZE-WITH-ATTR-...
[truncated]

@davemgreen
Copy link
Collaborator

Oh - I was about to put up fd286bd. That was only going to try with array types, not with vectors too, and I believe the only other semi-useful part was some DA tests I had tried adding without array indices (which looked fine for all the ones I tested).

The differences I have seen have mostly been extra hoisting causing extra unrolling. I would be interested in if @dtcxzyw finds anything more.

@nikic
Copy link
Contributor Author

nikic commented Aug 26, 2025

@zyw-bot csmith-fuzz

@nikic
Copy link
Contributor Author

nikic commented Aug 26, 2025

Based on llvm-opt-benchmark, the main impact this has is enabling the gep + add -> gep + gep in many more cases (there are a few other transforms that only support single index gep, but they're less significant). This mostly has positive effects (as many gep + adds become a constant offset gep where the base was CSEd). The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

@nikic
Copy link
Contributor Author

nikic commented Aug 27, 2025

The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

I checked whether we can mitigate this by folding null checks before doing this transform, but at least in the one case I looked at, the null check only gets inlined after this fold has already happened.

@nikic
Copy link
Contributor Author

nikic commented Aug 28, 2025

The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

I checked whether we can mitigate this by folding null checks before doing this transform, but at least in the one case I looked at, the null check only gets inlined after this fold has already happened.

I also checked whether not going this transform for constant negative offsets might make sense, but it turns out that it's pretty widely beneficial even in that case.

So I don't think there's anything we can really do about those regressions.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

@nikic nikic merged commit 055bfc0 into llvm:main Sep 1, 2025
14 checks passed
@nikic nikic deleted the instcombine-gep-leading-zeros branch September 1, 2025 07:58
@jplehr
Copy link
Contributor

jplehr commented Sep 2, 2025

It appears that this broke our libc on GPU buildbot: https://lab.llvm.org/buildbot/#/builders/10/builds/12560
I bisected the breakage locally down to this change and reverting fixes it.
@jhuber6 for awareness.

A failing configuration can be reproduced with cmake -GNinja -C offload/cmake/caches/AMDGPULibcBot.cmake llvm. Though the fail is a runtime fail afaict.

zmodem added a commit to zmodem/rust that referenced this pull request Sep 2, 2025
This updates tests/codegen-llvm/issues/issue-118306.rs to pass also
after llvm/llvm-project#155415
@nikic
Copy link
Contributor Author

nikic commented Sep 2, 2025

@jplehr I don't have an AMDGPU environment to debug this. Is it possible to reduce this further?

@jplehr
Copy link
Contributor

jplehr commented Sep 2, 2025

Yeah, I'm sure it is. Is IR dump what's needed here, either original or before some specific pass?

@nikic
Copy link
Contributor Author

nikic commented Sep 2, 2025

Not sure whether it's going to be sufficient, but the original IR dump would be a good starting point at least.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Sep 2, 2025
Adjust issue-118306.rs test after LLVM change

This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415
rust-timer added a commit to rust-lang/rust that referenced this pull request Sep 2, 2025
Rollup merge of #146116 - zmodem:issue_118306_fix, r=nikic

Adjust issue-118306.rs test after LLVM change

This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415
@jplehr
Copy link
Contributor

jplehr commented Sep 2, 2025

I finally got some IR and maybe some other info.
With -opt-bisect-limit it appears that the issue is coming from GVN on the function _ZN47LlvmLibcCharacterConverterUTF32To8Test_FourByte3RunEv. After that the particular tests fails at runtime.

I attached the IR after the pass of LCSSA which runs before GVN. Let me know how helpful that is or what I can do to help better with this issue.
after-lcssa.txt

@jhuber6
Copy link
Contributor

jhuber6 commented Sep 2, 2025

I finally got some IR and maybe some other info. With -opt-bisect-limit it appears that the issue is coming from GVN on the function _ZN47LlvmLibcCharacterConverterUTF32To8Test_FourByte3RunEv. After that the particular tests fails at runtime.

I attached the IR after the pass of LCSSA which runs before GVN. Let me know how helpful that is or what I can do to help better with this issue. after-lcssa.txt

I don't see anything immediately problematic looking at that function, https://godbolt.org/z/34fq6Erz8. But I'm not sure overall, cc @arsenm.

@arsenm
Copy link
Contributor

arsenm commented Sep 3, 2025

Just a guess but possibly related to @ritter-x2a's work with using GEP flags and fixing incorrect addressing mode matching

@nikic
Copy link
Contributor Author

nikic commented Sep 3, 2025

GVN doesn't seem to do anything particularly interesting there. I tried running this through llc to see how the codegen changes, but that gives:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.if

Does this need extra llc flags beyond the triple?

@arsenm
Copy link
Contributor

arsenm commented Sep 3, 2025

GVN doesn't seem to do anything particularly interesting there. I tried running this through llc to see how the codegen changes, but that gives:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.if

Does this need extra llc flags beyond the triple?

This IR is not suitable for running through codegen again. We cannot deal with IR that has already had control flow intrinsics inserted. Reproducers should be IR at the start of codegen, not the end of the codegen IR pipeline

@nikic
Copy link
Contributor Author

nikic commented Sep 3, 2025

Oh, I see. I thought this was a GVN run in the middle-end, but it's one in the backend.

As a side note, I noticed this comment:

// EarlyCSE is not always strong enough to clean up what LSR produces. For
// example, GVN can combine
//
// %0 = add %a, %b
// %1 = add %b, %a
//
// and
//
// %0 = shl nsw %a, 2
// %1 = shl %a, 2
//
// but EarlyCSE can do neither of them.
if (isPassEnabled(EnableScalarIRPasses))
addEarlyCSEOrGVNPass();

This looks very outdated. EarlyCSE supports both commutative instructions and poison flag intersection. You may be able to save some compile-time by using EarlyCSE instead of GVN there...

@jplehr
Copy link
Contributor

jplehr commented Sep 3, 2025

If there is value in it, I can see to extract the IR earlier in the pipeline and attach that here.

@arsenm
Copy link
Contributor

arsenm commented Sep 3, 2025

If there is value in it, I can see to extract the IR earlier in the pipeline and attach that here.

Yes, need the IR at the start of codegen

@jplehr
Copy link
Contributor

jplehr commented Sep 3, 2025

Attached is the IR from pre codegen. Apologies that this may again not be what is needed. I always find the clang-linker-wrapper thing a bit confusing.
So, I hope this helps and thanks for the time.
libc-gpu-precodegen.txt

@jplehr
Copy link
Contributor

jplehr commented Sep 4, 2025

I reverted this patch locally as this restores the tests back to passing and then retrieved the IR from the same point in the pipeline -- attached below.
There is two differences, that I am not sure how relevant they are. I'll quote one here, another similar one is found much further towards the end of the file.

Good case, line 189

 %arrayidx.i.i.i.i = getelementptr inbounds [64 x %"struct.__llvm_libc_22_0_0_git::AtExitUnit"], ptr addrspace(1) @_ZN22__llvm_libc_22_0_0_git16atexit_callbacksE, i64 0, i64 %1

Bad case, line 189

  %arrayidx.i.i.i.i = getelementptr inbounds %"struct.__llvm_libc_22_0_0_git::AtExitUnit", ptr addrspace(1) @_ZN22__llvm_libc_22_0_0_git16atexit_callbacksE, i64 %1

I do not know whether this is any significant though. Mostly caught my attention as a difference and that the integer literal that went away (64) is the number of lanes on our GPU.
Maybe that info is a piece to this puzzle.

IR:
libc-gpu-precodegen-good.txt

@nikic
Copy link
Contributor Author

nikic commented Sep 4, 2025

I looked at the diff between those two, and all the changes look correct to me. The only real difference is related to __cxa_thread_finalize, which is probably just a difference in baselines (it was added yesterday).

@nikic
Copy link
Contributor Author

nikic commented Sep 4, 2025

There are more diffs in the llc output, but I find it hard to interpret those without more knowledge about AMDGPU.

@jayfoad
Copy link
Contributor

jayfoad commented Sep 4, 2025

I think the relevant part of the diff is probably:

--- libc-gpu-precodegen-good.txt        2025-09-04 14:13:56.751818563 +0100
+++ libc-gpu-precodegen.txt     2025-09-04 14:13:49.402694527 +0100
@@ -719 +720 @@
-  %call.i572 = tail call fastcc noundef zeroext i1 @_ZN22__llvm_libc_22_0_0_git7testing8internal4testIiEEbPNS1_10RunContextENS0_8TestCondET_S6_PKcS8_NS1_8LocationE(ptr noundef %2, i32 noundef 240, i32 noundef 240, ptr noundef addrspacecast (ptr addrspace(4) @.str.10 to ptr), ptr noundef addrspacecast (ptr addrspace(4) @.str.24 to ptr), i32 133) #28
+  %call.i572 = tail call fastcc noundef zeroext i1 @_ZN22__llvm_libc_22_0_0_git7testing8internal4testIiEEbPNS1_10RunContextENS0_8TestCondET_S6_PKcS8_NS1_8LocationE(ptr noundef %2, i32 noundef poison, i32 noundef 240, ptr noundef addrspacecast (ptr addrspace(4) @.str.10 to ptr), ptr noundef addrspacecast (ptr addrspace(4) @.str.24 to ptr), i32 133) #27

Note that in the good case the first three arguments to the call are "%2, 240, 240" while in the bad case they are "%2, poison, 240".

@nikic
Copy link
Contributor Author

nikic commented Sep 4, 2025

Ah yes, good point. That means the issue was already introduced at an earlier point in the middle-end pipeline, not during codegen.

@jplehr
Copy link
Contributor

jplehr commented Sep 5, 2025

I checked out main at 69d0c3e yesterday.
Then reverted this commit and validated that this still has the tests pass. I produced "good" intermediate files with that version.
I then reverted the revert and validated that this has the tests fail. I produced "bad" intermediate files with that version.

I have attached all IR files (except precodegen), but for more insight at what exact points in the pipeline they are produced @jhuber6 needs to help me. These are produced with -Wl,-save-temps on the final step of the ninja --verbose to build that particular test; if that helps in figuring things out.
There are a few differences in the preopt IR files between good and bad. I cannot judge their significance though. Similar for the internalize files.
If there is something else that I can/should provide to help with this, please let me know. I will need guidance then.

01-preopt-bad.txt
01-preopt-good.txt
02-internalize-bad.txt
02-internalize-good.txt
03-opt-bad.txt
03-opt-good.txt

@jayfoad
Copy link
Contributor

jayfoad commented Sep 5, 2025

This diff looks suspicious to me in function _ZN22__llvm_libc_22_0_0_git8internal18CharacterConverter8pop_utf8Ev:

--- 01-preopt-good.txt  2025-09-05 10:39:57.226224451 +0100
+++ 01-preopt-bad.txt   2025-09-05 10:39:56.149207707 +0100
@@ -11112,10 +11113,10 @@
 if.then4:                                         ; preds = %if.end
-  %sub7 = add nsw i64 %conv, -1
-  %4 = extractelement <4 x i8> <i8 0, i8 -64, i8 -32, i8 -16>, i64 %sub7
-  %5 = load i32, ptr %0, align 4, !tbaa !58
+  %4 = add nuw nsw i64 %conv, 4294967295
+  %5 = extractelement <4 x i8> <i8 0, i8 -64, i8 -32, i8 -16>, i64 %4
+  %6 = load i32, ptr %0, align 4, !tbaa !58
   %sh_prom = trunc nuw nsw i64 %mul to i32
-  %shr = lshr i32 %5, %sh_prom
+  %shr = lshr i32 %6, %sh_prom
   %shr.tr = trunc i32 %shr to i8
-  %or.narrow = or i8 %4, %shr.tr
-  %6 = zext i8 %or.narrow to i32
+  %or.narrow = or i8 %5, %shr.tr
+  %7 = zext i8 %or.narrow to i32
   br label %if.end15

I am guessing that the add nsw i64 %conv, -1 vs add nuw nsw i64 %conv, 4294967295 are the result of transforming some getelementptr instructions, and it seems suspicious that the 64-bit offset is small and negative in the good case but large and positive in the bad case.

@jplehr
Copy link
Contributor

jplehr commented Sep 5, 2025

Given this is in the preopt IR, the change would occur in an even earlier stage, right? I'll see if I can get some IR out of whatever stage that may be.

@nikic
Copy link
Contributor Author

nikic commented Sep 8, 2025

Given this is in the preopt IR, the change would occur in an even earlier stage, right? I'll see if I can get some IR out of whatever stage that may be.

Yes, you're currently dumping the pre-LTO IR, but we'd want the full optimized one produced by the compiler. (Should be possible to get it with -S -emit-llvm -Xclang -disable-llvm-optzns.)


Does anyone know which transform is capable of converting a load from a constant global into a dynamic extract from a vector constant? I tried and failed to reproduce that behavior on simpler test cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang Clang issues not falling into any other category llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants