Skip to content

[InstCombine] Don't folder select to or if value argument is user of invalid addrspacecast inst #144686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wenju-he
Copy link
Contributor

In our downstream GPU target, following IR is valid before instcombine although the second addrspacecast causes UB.
define i1 @test(ptr addrspace(1) noundef %v) {
%0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
%1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
%2 = icmp eq i32 %1, 0
%3 = addrspacecast ptr addrspace(4) %0 to ptr addrspace(3)
%4 = select i1 %2, ptr addrspace(3) null, ptr addrspace(3) %3
%5 = icmp eq ptr addrspace(3) %4, null
ret i1 %5
}
We have a custom optimization that replaces invalid addrspacecast with poison, and IR is still valid since select stops poison propagation.

However, instcombine pass optimizes select to or:
%0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
%1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
%2 = icmp eq i32 %1, 0
%3 = addrspacecast ptr addrspace(1) %v to ptr addrspace(3)
%4 = icmp eq ptr addrspace(3) %3, null
%5 = or i1 %2, %4
ret i1 %5
The transform is invalid for our target.

…invalid addrspacecast inst

In our downstream GPU target, following IR is valid before instcombine
although the second addrspacecast causes UB.
  define i1 @test(ptr addrspace(1) noundef %v) {
    %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
    %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
    %2 = icmp eq i32 %1, 0
    %3 = addrspacecast ptr addrspace(4) %0 to ptr addrspace(3)
    %4 = select i1 %2, ptr addrspace(3) null, ptr addrspace(3) %3
    %5 = icmp eq ptr addrspace(3) %4, null
    ret i1 %5
  }
We have a custom optimization that replaces invalid addrspacecast with
poison, and IR is still valid since `select` stops poison propagation.

However, instcombine pass optimizes `select` to `or`:
    %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
    %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
    %2 = icmp eq i32 %1, 0
    %3 = addrspacecast ptr addrspace(1) %v to ptr addrspace(3)
    %4 = icmp eq ptr addrspace(3) %3, null
    %5 = or i1 %2, %4
    ret i1 %5
The transform is invalid for our target.
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-amdgpu

Author: Wenju He (wenju-he)

Changes

In our downstream GPU target, following IR is valid before instcombine although the second addrspacecast causes UB.
define i1 @test(ptr addrspace(1) noundef %v) {
%0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
%1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
%2 = icmp eq i32 %1, 0
%3 = addrspacecast ptr addrspace(4) %0 to ptr addrspace(3)
%4 = select i1 %2, ptr addrspace(3) null, ptr addrspace(3) %3
%5 = icmp eq ptr addrspace(3) %4, null
ret i1 %5
}
We have a custom optimization that replaces invalid addrspacecast with poison, and IR is still valid since select stops poison propagation.

However, instcombine pass optimizes select to or:
%0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
%1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
%2 = icmp eq i32 %1, 0
%3 = addrspacecast ptr addrspace(1) %v to ptr addrspace(3)
%4 = icmp eq ptr addrspace(3) %3, null
%5 = or i1 %2, %4
ret i1 %5
The transform is invalid for our target.


Full diff: https://github.com/llvm/llvm-project/pull/144686.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+29-6)
  • (added) llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll (+23)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 73ba0f78e8053..a2335640f917b 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -3194,8 +3194,23 @@ static Instruction *foldNestedSelects(SelectInst &OuterSelVal,
 /// Return true if V is poison or \p Expected given that ValAssumedPoison is
 /// already poison. For example, if ValAssumedPoison is `icmp samesign X, 10`
 /// and V is `icmp ne X, 5`, impliesPoisonOrCond returns true.
-static bool impliesPoisonOrCond(const Value *ValAssumedPoison, const Value *V,
-                                bool Expected) {
+static bool impliesPoisonOrCond(
+    const Value *ValAssumedPoison, const Value *V, bool Expected,
+    llvm::function_ref<bool(unsigned, unsigned)> isValidAddrSpaceCast) {
+  // Handle the case that ValAssumedPoison is `icmp eq ptr addrspace(3) X, null`
+  // and X is `addrspacecast ptr addrspace(1) Y to ptr addrspace(3)`. Target can
+  // replace X with poison if the addrspacecast is invalid. However, `V` might
+  // not be poison.
+  if (auto *ICmp = dyn_cast<ICmpInst>(ValAssumedPoison)) {
+    auto CanCreatePoison = [&](Value *Op) {
+      auto *ASC = dyn_cast<AddrSpaceCastInst>(Op);
+      return ASC && !isValidAddrSpaceCast(ASC->getDestAddressSpace(),
+                                          ASC->getSrcAddressSpace());
+    };
+    if (llvm::any_of(ICmp->operands(), CanCreatePoison))
+      return false;
+  }
+
   if (impliesPoison(ValAssumedPoison, V))
     return true;
 
@@ -3241,17 +3256,23 @@ Instruction *InstCombinerImpl::foldSelectOfBools(SelectInst &SI) {
   auto *Zero = ConstantInt::getFalse(SelType);
   Value *A, *B, *C, *D;
 
+  auto IsValidAddrSpaceCast = [&](unsigned FromAS, unsigned ToAS) {
+    return isValidAddrSpaceCast(FromAS, ToAS);
+  };
+
   // Folding select to and/or i1 isn't poison safe in general. impliesPoison
   // checks whether folding it does not convert a well-defined value into
   // poison.
   if (match(TrueVal, m_One())) {
-    if (impliesPoisonOrCond(FalseVal, CondVal, /*Expected=*/false)) {
+    if (impliesPoisonOrCond(FalseVal, CondVal, /*Expected=*/false,
+                            IsValidAddrSpaceCast)) {
       // Change: A = select B, true, C --> A = or B, C
       return BinaryOperator::CreateOr(CondVal, FalseVal);
     }
 
     if (match(CondVal, m_OneUse(m_Select(m_Value(A), m_One(), m_Value(B)))) &&
-        impliesPoisonOrCond(FalseVal, B, /*Expected=*/false)) {
+        impliesPoisonOrCond(FalseVal, B, /*Expected=*/false,
+                            IsValidAddrSpaceCast)) {
       // (A || B) || C --> A || (B | C)
       return replaceInstUsesWith(
           SI, Builder.CreateLogicalOr(A, Builder.CreateOr(B, FalseVal)));
@@ -3287,13 +3308,15 @@ Instruction *InstCombinerImpl::foldSelectOfBools(SelectInst &SI) {
   }
 
   if (match(FalseVal, m_Zero())) {
-    if (impliesPoisonOrCond(TrueVal, CondVal, /*Expected=*/true)) {
+    if (impliesPoisonOrCond(TrueVal, CondVal, /*Expected=*/true,
+                            IsValidAddrSpaceCast)) {
       // Change: A = select B, C, false --> A = and B, C
       return BinaryOperator::CreateAnd(CondVal, TrueVal);
     }
 
     if (match(CondVal, m_OneUse(m_Select(m_Value(A), m_Value(B), m_Zero()))) &&
-        impliesPoisonOrCond(TrueVal, B, /*Expected=*/true)) {
+        impliesPoisonOrCond(TrueVal, B, /*Expected=*/true,
+                            IsValidAddrSpaceCast)) {
       // (A && B) && C --> A && (B & C)
       return replaceInstUsesWith(
           SI, Builder.CreateLogicalAnd(A, Builder.CreateAnd(B, TrueVal)));
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll b/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll
new file mode 100644
index 0000000000000..4791d2c434884
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll
@@ -0,0 +1,23 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=instcombine %s | FileCheck %s
+
+; Check that `select B, true, C` isn't optimized to `or B, C`.
+define i1 @not_fold_select(ptr addrspace(1) noundef %x) {
+; CHECK-LABEL: define i1 @not_fold_select(
+; CHECK-SAME: ptr addrspace(1) noundef [[X:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = addrspacecast ptr addrspace(1) [[X]] to ptr
+; CHECK-NEXT:    [[TMP1:%.*]] = tail call i1 @llvm.amdgcn.is.shared(ptr [[TMP0]])
+; CHECK-NEXT:    [[TMP2:%.*]] = addrspacecast ptr addrspace(1) [[X]] to ptr addrspace(3)
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq ptr addrspace(3) [[TMP2]], null
+; CHECK-NEXT:    [[TMP4:%.*]] = select i1 [[TMP1]], i1 true, i1 [[TMP3]]
+; CHECK-NEXT:    ret i1 [[TMP4]]
+;
+  entry:
+  %0 = addrspacecast ptr addrspace(1) %x to ptr
+  %1 = tail call i1 @llvm.amdgcn.is.shared(ptr %0)
+  %2 = addrspacecast ptr %0 to ptr addrspace(3)
+  %3 = select i1 %1, ptr addrspace(3) null, ptr addrspace(3) %2
+  %4 = icmp eq ptr addrspace(3) %3, null
+  ret i1 %4
+}

@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Wenju He (wenju-he)

Changes

In our downstream GPU target, following IR is valid before instcombine although the second addrspacecast causes UB.
define i1 @test(ptr addrspace(1) noundef %v) {
%0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
%1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
%2 = icmp eq i32 %1, 0
%3 = addrspacecast ptr addrspace(4) %0 to ptr addrspace(3)
%4 = select i1 %2, ptr addrspace(3) null, ptr addrspace(3) %3
%5 = icmp eq ptr addrspace(3) %4, null
ret i1 %5
}
We have a custom optimization that replaces invalid addrspacecast with poison, and IR is still valid since select stops poison propagation.

However, instcombine pass optimizes select to or:
%0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
%1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
%2 = icmp eq i32 %1, 0
%3 = addrspacecast ptr addrspace(1) %v to ptr addrspace(3)
%4 = icmp eq ptr addrspace(3) %3, null
%5 = or i1 %2, %4
ret i1 %5
The transform is invalid for our target.


Full diff: https://github.com/llvm/llvm-project/pull/144686.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+29-6)
  • (added) llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll (+23)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 73ba0f78e8053..a2335640f917b 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -3194,8 +3194,23 @@ static Instruction *foldNestedSelects(SelectInst &OuterSelVal,
 /// Return true if V is poison or \p Expected given that ValAssumedPoison is
 /// already poison. For example, if ValAssumedPoison is `icmp samesign X, 10`
 /// and V is `icmp ne X, 5`, impliesPoisonOrCond returns true.
-static bool impliesPoisonOrCond(const Value *ValAssumedPoison, const Value *V,
-                                bool Expected) {
+static bool impliesPoisonOrCond(
+    const Value *ValAssumedPoison, const Value *V, bool Expected,
+    llvm::function_ref<bool(unsigned, unsigned)> isValidAddrSpaceCast) {
+  // Handle the case that ValAssumedPoison is `icmp eq ptr addrspace(3) X, null`
+  // and X is `addrspacecast ptr addrspace(1) Y to ptr addrspace(3)`. Target can
+  // replace X with poison if the addrspacecast is invalid. However, `V` might
+  // not be poison.
+  if (auto *ICmp = dyn_cast<ICmpInst>(ValAssumedPoison)) {
+    auto CanCreatePoison = [&](Value *Op) {
+      auto *ASC = dyn_cast<AddrSpaceCastInst>(Op);
+      return ASC && !isValidAddrSpaceCast(ASC->getDestAddressSpace(),
+                                          ASC->getSrcAddressSpace());
+    };
+    if (llvm::any_of(ICmp->operands(), CanCreatePoison))
+      return false;
+  }
+
   if (impliesPoison(ValAssumedPoison, V))
     return true;
 
@@ -3241,17 +3256,23 @@ Instruction *InstCombinerImpl::foldSelectOfBools(SelectInst &SI) {
   auto *Zero = ConstantInt::getFalse(SelType);
   Value *A, *B, *C, *D;
 
+  auto IsValidAddrSpaceCast = [&](unsigned FromAS, unsigned ToAS) {
+    return isValidAddrSpaceCast(FromAS, ToAS);
+  };
+
   // Folding select to and/or i1 isn't poison safe in general. impliesPoison
   // checks whether folding it does not convert a well-defined value into
   // poison.
   if (match(TrueVal, m_One())) {
-    if (impliesPoisonOrCond(FalseVal, CondVal, /*Expected=*/false)) {
+    if (impliesPoisonOrCond(FalseVal, CondVal, /*Expected=*/false,
+                            IsValidAddrSpaceCast)) {
       // Change: A = select B, true, C --> A = or B, C
       return BinaryOperator::CreateOr(CondVal, FalseVal);
     }
 
     if (match(CondVal, m_OneUse(m_Select(m_Value(A), m_One(), m_Value(B)))) &&
-        impliesPoisonOrCond(FalseVal, B, /*Expected=*/false)) {
+        impliesPoisonOrCond(FalseVal, B, /*Expected=*/false,
+                            IsValidAddrSpaceCast)) {
       // (A || B) || C --> A || (B | C)
       return replaceInstUsesWith(
           SI, Builder.CreateLogicalOr(A, Builder.CreateOr(B, FalseVal)));
@@ -3287,13 +3308,15 @@ Instruction *InstCombinerImpl::foldSelectOfBools(SelectInst &SI) {
   }
 
   if (match(FalseVal, m_Zero())) {
-    if (impliesPoisonOrCond(TrueVal, CondVal, /*Expected=*/true)) {
+    if (impliesPoisonOrCond(TrueVal, CondVal, /*Expected=*/true,
+                            IsValidAddrSpaceCast)) {
       // Change: A = select B, C, false --> A = and B, C
       return BinaryOperator::CreateAnd(CondVal, TrueVal);
     }
 
     if (match(CondVal, m_OneUse(m_Select(m_Value(A), m_Value(B), m_Zero()))) &&
-        impliesPoisonOrCond(TrueVal, B, /*Expected=*/true)) {
+        impliesPoisonOrCond(TrueVal, B, /*Expected=*/true,
+                            IsValidAddrSpaceCast)) {
       // (A && B) && C --> A && (B & C)
       return replaceInstUsesWith(
           SI, Builder.CreateLogicalAnd(A, Builder.CreateAnd(B, TrueVal)));
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll b/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll
new file mode 100644
index 0000000000000..4791d2c434884
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll
@@ -0,0 +1,23 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=instcombine %s | FileCheck %s
+
+; Check that `select B, true, C` isn't optimized to `or B, C`.
+define i1 @not_fold_select(ptr addrspace(1) noundef %x) {
+; CHECK-LABEL: define i1 @not_fold_select(
+; CHECK-SAME: ptr addrspace(1) noundef [[X:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = addrspacecast ptr addrspace(1) [[X]] to ptr
+; CHECK-NEXT:    [[TMP1:%.*]] = tail call i1 @llvm.amdgcn.is.shared(ptr [[TMP0]])
+; CHECK-NEXT:    [[TMP2:%.*]] = addrspacecast ptr addrspace(1) [[X]] to ptr addrspace(3)
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq ptr addrspace(3) [[TMP2]], null
+; CHECK-NEXT:    [[TMP4:%.*]] = select i1 [[TMP1]], i1 true, i1 [[TMP3]]
+; CHECK-NEXT:    ret i1 [[TMP4]]
+;
+  entry:
+  %0 = addrspacecast ptr addrspace(1) %x to ptr
+  %1 = tail call i1 @llvm.amdgcn.is.shared(ptr %0)
+  %2 = addrspacecast ptr %0 to ptr addrspace(3)
+  %3 = select i1 %1, ptr addrspace(3) null, ptr addrspace(3) %2
+  %4 = icmp eq ptr addrspace(3) %3, null
+  ret i1 %4
+}

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an invalid addrspacecast can produce poison, then this must be specified in

static bool canCreateUndefOrPoison(const Operator *Op, UndefPoisonKind Kind,
, not here. LangRef should also be adjusted in https://llvm.org/docs/LangRef.html#addrspacecast-to-instruction to specify that addrspacecast can introduce poison.

@nikic nikic requested a review from arsenm June 18, 2025 12:29
@arsenm
Copy link
Contributor

arsenm commented Jun 18, 2025

addrspacecast can produce poison. That's how we're handling the cases of addrspacecasts that are unimplementable

; CHECK-NEXT: ret i1 [[TMP4]]
;
entry:
%0 = addrspacecast ptr addrspace(1) %x to ptr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use named values in tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

bool Expected) {
static bool impliesPoisonOrCond(
const Value *ValAssumedPoison, const Value *V, bool Expected,
llvm::function_ref<bool(unsigned, unsigned)> isValidAddrSpaceCast) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need llvm:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@wenju-he
Copy link
Contributor Author

If an invalid addrspacecast can produce poison, then this must be specified in

static bool canCreateUndefOrPoison(const Operator *Op, UndefPoisonKind Kind,

This PR uses isValidAddrSpaceCast query from TTI which is available in InstCombine. But TTI isn't available in above canCreateUndefOrPoison function.
If we unconditionally return true for addrspacecast in canCreateUndefOrPoison, there is regression in following test probably because canCreateUndefOrPoison returning true for valid addrspacecast is preventing the optimization:

define amdgpu_kernel void @__omp_offloading_fd00_2c00523__ZN11qmcplusplus7ompBLAS9gemv_implIfEEiRiciiT_PKS3_iS5_iS3_PS3_i_l383() {
; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn
; CHECK-LABEL: define {{[^@]+}}@__omp_offloading_fd00_2c00523__ZN11qmcplusplus7ompBLAS9gemv_implIfEEiRiciiT_PKS3_iS5_iS3_PS3_i_l383
; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[TMP1:%.*]] = alloca [0 x [0 x float]], i32 0, align 8, addrspace(5)
; CHECK-NEXT: [[TMP2:%.*]] = addrspacecast ptr addrspace(5) [[TMP1]] to ptr
; CHECK-NEXT: store ptr [[TMP2]], ptr addrspace(5) [[TMP1]], align 8
; CHECK-NEXT: [[TMP3:%.*]] = call fastcc i32 @__kmpc_nvptx_parallel_reduce_nowait_v2(ptr nofree noundef readonly align 8 captures(none) dereferenceable_or_null(8) [[TMP2]], i1 noundef false)
; CHECK-NEXT: ret void
;
%1 = alloca [0 x [0 x float]], i32 0, align 8, addrspace(5)
%2 = addrspacecast ptr addrspace(5) %1 to ptr
store ptr %2, ptr addrspace(5) %1, align 8
%3 = call fastcc i32 @__kmpc_nvptx_parallel_reduce_nowait_v2(ptr %2, i1 false)
ret void
}

LangRef should also be adjusted in https://llvm.org/docs/LangRef.html#addrspacecast-to-instruction to specify that addrspacecast can introduce poison.

done in ffff2c3, please review

@wenju-he wenju-he requested review from arsenm and nikic June 19, 2025 03:03
@dtcxzyw
Copy link
Member

dtcxzyw commented Jun 19, 2025

If an invalid addrspacecast can produce poison, then this must be specified in

static bool canCreateUndefOrPoison(const Operator *Op, UndefPoisonKind Kind,

This PR uses isValidAddrSpaceCast query from TTI which is available in InstCombine. But TTI isn't available in above canCreateUndefOrPoison function. If we unconditionally return true for addrspacecast in canCreateUndefOrPoison, there is regression in following test probably because canCreateUndefOrPoison returning true for valid addrspacecast is preventing the optimization:

define amdgpu_kernel void @__omp_offloading_fd00_2c00523__ZN11qmcplusplus7ompBLAS9gemv_implIfEEiRiciiT_PKS3_iS5_iS3_PS3_i_l383() {
; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn
; CHECK-LABEL: define {{[^@]+}}@__omp_offloading_fd00_2c00523__ZN11qmcplusplus7ompBLAS9gemv_implIfEEiRiciiT_PKS3_iS5_iS3_PS3_i_l383
; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[TMP1:%.*]] = alloca [0 x [0 x float]], i32 0, align 8, addrspace(5)
; CHECK-NEXT: [[TMP2:%.*]] = addrspacecast ptr addrspace(5) [[TMP1]] to ptr
; CHECK-NEXT: store ptr [[TMP2]], ptr addrspace(5) [[TMP1]], align 8
; CHECK-NEXT: [[TMP3:%.*]] = call fastcc i32 @__kmpc_nvptx_parallel_reduce_nowait_v2(ptr nofree noundef readonly align 8 captures(none) dereferenceable_or_null(8) [[TMP2]], i1 noundef false)
; CHECK-NEXT: ret void
;
%1 = alloca [0 x [0 x float]], i32 0, align 8, addrspace(5)
%2 = addrspacecast ptr addrspace(5) %1 to ptr
store ptr %2, ptr addrspace(5) %1, align 8
%3 = call fastcc i32 @__kmpc_nvptx_parallel_reduce_nowait_v2(ptr %2, i1 false)
ret void
}

LangRef should also be adjusted in https://llvm.org/docs/LangRef.html#addrspacecast-to-instruction to specify that addrspacecast can introduce poison.

done in ffff2c3, please review

I don't think special handling of addrspacecast in logical and/or->bitwise and/or works. There are many places which call isGuaranteedNotToBeUndefOrPoison. For example, eliminating freeze from freeze ptr (addrspacecast ptr addrspace(5) noundef %x to ptr) is still invalid. That is, I can construct an input to bypass the workaround and cause the miscompilation.
It would be better to default to returning true for addrspacecast in canCreateUndefOrPoison, then add an optional callback for isValidAddrSpaceCast queries.

@@ -12621,6 +12621,9 @@ have no side effects, and must not capture the value of the pointer.
If the source is :ref:`poison <poisonvalues>`, the result is
:ref:`poison <poisonvalues>`.

If the source is not :ref:`poison <poisonvalues>`, and the result pointer is
non-dereferenceable, the result is :ref:`poison <poisonvalues>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect, not every addrspacecast of a non-dereferenceable pointer should automatically result in poison (e.g. the pointer to the end of an object is non-dereferenceable, but should certainly not turn into poison).

This should instead say something along the lines of:

Which address space casts are supported depends on the target. Unsupported address space casts return a poison value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants