Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[coroutines][CoroSplit] Store allocas on the frame if there is no explicit lifetime.end #88806

Closed
wants to merge 2 commits into from

Conversation

alanzhao1
Copy link
Contributor

@alanzhao1 alanzhao1 commented Apr 15, 2024

Certain passes such as SimplifyCFG may remove existing lifetime.end intrinsics on presplit coroutines. CoroSplitPass then incorrectly assumes that these allocas may not persist across the coroutine suspension point. To fix this, computeShouldLiveOnFrame() now saves allocas on the coroutine frame object if their addresses are escaped and those allocas don't have a corresponding lifetime.end.

Fixes #86580

…licit lifetime.end

Certain passes such as SimplifyCFG may remove existing lifetime.end
intrinsics on presplit coroutines. CoroSplitPass then incorrectly
assumes that these allocas may not persist across the coroutine
suspension point. To fix this, computeShouldLiveOnFrame() now saves
allocas on the coroutine frame object if their addresses are escaped and
those allocas don't have a corresponding lifetime.end.

Fixes llvm#86580
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2024

@llvm/pr-subscribers-coroutines

Author: Alan Zhao (alanzhao1)

Changes

Certain passes such as SimplifyCFG may remove existing lifetime.end
intrinsics on presplit coroutines. CoroSplitPass then incorrectly
assumes that these allocas may not persist across the coroutine
suspension point. To fix this, computeShouldLiveOnFrame() now saves
allocas on the coroutine frame object if their addresses are escaped and
those allocas don't have a corresponding lifetime.end.

Fixes #86580


Full diff: https://github.com/llvm/llvm-project/pull/88806.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Coroutines/CoroFrame.cpp (+7)
  • (added) llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll (+54)
diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index 08a4522e3fac6d..8bd7d0dc36d532 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -1550,6 +1550,8 @@ struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
   }
 
   void visitIntrinsicInst(IntrinsicInst &II) {
+    if (II.getIntrinsicID() == Intrinsic::lifetime_end)
+      HasExplicitLifetimeEnd = true;
     // When we found the lifetime markers refers to a
     // subrange of the original alloca, ignore the lifetime
     // markers to avoid misleading the analysis.
@@ -1596,6 +1598,7 @@ struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
   SmallPtrSet<IntrinsicInst *, 2> LifetimeStarts{};
   bool MayWriteBeforeCoroBegin{false};
   bool ShouldUseLifetimeStartInfo{true};
+  bool HasExplicitLifetimeEnd{false};
 
   mutable std::optional<bool> ShouldLiveOnFrame{};
 
@@ -1614,6 +1617,10 @@ struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
       // suspend point between lifetime markers. This should also cover the
       // case of a single lifetime.start intrinsic in a loop with suspend point.
       if (PI.isEscaped()) {
+        // If there is no explicit lifetime.end, then assume the address can
+        // cross suspension points.
+        if (!HasExplicitLifetimeEnd)
+          return true;
         for (auto *A : LifetimeStarts) {
           for (auto *B : LifetimeStarts) {
             if (Checker.hasPathOrLoopCrossingSuspendPoint(A->getParent(),
diff --git a/llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll b/llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll
new file mode 100644
index 00000000000000..c9fdd4fd3d2bab
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -passes='cgscc(coro-split),simplifycfg,early-cse' -S | FileCheck %s
+
+declare ptr @malloc(i64)
+
+%i8.array = type { [100 x i8] }
+declare void @consume.i8.array(ptr)
+
+; testval does not contain an explicit lifetime end. We must assume that it may
+; live across suspension.
+define void @foo() presplitcoroutine {
+; CHECK-LABEL: define void @foo() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[ID:%.*]] = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr @foo.resumers)
+; CHECK-NEXT:    [[ALLOC:%.*]] = call ptr @malloc(i64 16)
+; CHECK-NEXT:    [[VFRAME:%.*]] = call noalias nonnull ptr @llvm.coro.begin(token [[ID]], ptr [[ALLOC]])
+; CHECK-NEXT:    store ptr @foo.resume, ptr [[VFRAME]], align 8
+; CHECK-NEXT:    [[DESTROY_ADDR:%.*]] = getelementptr inbounds [[FOO_FRAME:%.*]], ptr [[VFRAME]], i32 0, i32 1
+; CHECK-NEXT:    store ptr @foo.destroy, ptr [[DESTROY_ADDR]], align 8
+; CHECK-NEXT:    [[INDEX_ADDR1:%.*]] = getelementptr inbounds [[FOO_FRAME]], ptr [[VFRAME]], i32 0, i32 2
+; CHECK-NEXT:    call void @consume.i8.array(ptr [[INDEX_ADDR1]])
+; CHECK-NEXT:    [[INDEX_ADDR2:%.*]] = getelementptr inbounds [[FOO_FRAME]], ptr [[VFRAME]], i32 0, i32 3
+; CHECK-NEXT:    store i1 false, ptr [[INDEX_ADDR2]], align 1
+; CHECK-NEXT:    ret void
+;
+entry:
+  %testval = alloca %i8.array
+  %id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
+  %alloc = call ptr @malloc(i64 16) #3
+  %vFrame = call noalias nonnull ptr @llvm.coro.begin(token %id, ptr %alloc)
+
+  call void @llvm.lifetime.start.p0(i64 100, ptr %testval)
+  call void @consume.i8.array(ptr %testval)
+
+  %save = call token @llvm.coro.save(ptr null)
+  %suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
+  switch i8 %suspend, label %exit [
+  i8 0, label %await.ready
+  i8 1, label %exit
+  ]
+await.ready:
+  %StrayCoroSave = call token @llvm.coro.save(ptr null)
+  br label %exit
+exit:
+  ret void
+}
+
+declare token @llvm.coro.id(i32, ptr readnone, ptr nocapture readonly, ptr)
+declare ptr @llvm.coro.begin(token, ptr writeonly) #3
+declare token @llvm.coro.save(ptr) #3
+declare ptr @llvm.coro.frame() #5
+declare i8 @llvm.coro.suspend(token, i1) #3
+declare i1 @llvm.coro.end(ptr, i1, token) #3
+declare void @llvm.lifetime.start.p0(i64, ptr nocapture) #4

@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Alan Zhao (alanzhao1)

Changes

Certain passes such as SimplifyCFG may remove existing lifetime.end
intrinsics on presplit coroutines. CoroSplitPass then incorrectly
assumes that these allocas may not persist across the coroutine
suspension point. To fix this, computeShouldLiveOnFrame() now saves
allocas on the coroutine frame object if their addresses are escaped and
those allocas don't have a corresponding lifetime.end.

Fixes #86580


Full diff: https://github.com/llvm/llvm-project/pull/88806.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Coroutines/CoroFrame.cpp (+7)
  • (added) llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll (+54)
diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index 08a4522e3fac6d..8bd7d0dc36d532 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -1550,6 +1550,8 @@ struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
   }
 
   void visitIntrinsicInst(IntrinsicInst &II) {
+    if (II.getIntrinsicID() == Intrinsic::lifetime_end)
+      HasExplicitLifetimeEnd = true;
     // When we found the lifetime markers refers to a
     // subrange of the original alloca, ignore the lifetime
     // markers to avoid misleading the analysis.
@@ -1596,6 +1598,7 @@ struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
   SmallPtrSet<IntrinsicInst *, 2> LifetimeStarts{};
   bool MayWriteBeforeCoroBegin{false};
   bool ShouldUseLifetimeStartInfo{true};
+  bool HasExplicitLifetimeEnd{false};
 
   mutable std::optional<bool> ShouldLiveOnFrame{};
 
@@ -1614,6 +1617,10 @@ struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
       // suspend point between lifetime markers. This should also cover the
       // case of a single lifetime.start intrinsic in a loop with suspend point.
       if (PI.isEscaped()) {
+        // If there is no explicit lifetime.end, then assume the address can
+        // cross suspension points.
+        if (!HasExplicitLifetimeEnd)
+          return true;
         for (auto *A : LifetimeStarts) {
           for (auto *B : LifetimeStarts) {
             if (Checker.hasPathOrLoopCrossingSuspendPoint(A->getParent(),
diff --git a/llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll b/llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll
new file mode 100644
index 00000000000000..c9fdd4fd3d2bab
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/coro-alloca-no-lifetime-end.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -passes='cgscc(coro-split),simplifycfg,early-cse' -S | FileCheck %s
+
+declare ptr @malloc(i64)
+
+%i8.array = type { [100 x i8] }
+declare void @consume.i8.array(ptr)
+
+; testval does not contain an explicit lifetime end. We must assume that it may
+; live across suspension.
+define void @foo() presplitcoroutine {
+; CHECK-LABEL: define void @foo() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[ID:%.*]] = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr @foo.resumers)
+; CHECK-NEXT:    [[ALLOC:%.*]] = call ptr @malloc(i64 16)
+; CHECK-NEXT:    [[VFRAME:%.*]] = call noalias nonnull ptr @llvm.coro.begin(token [[ID]], ptr [[ALLOC]])
+; CHECK-NEXT:    store ptr @foo.resume, ptr [[VFRAME]], align 8
+; CHECK-NEXT:    [[DESTROY_ADDR:%.*]] = getelementptr inbounds [[FOO_FRAME:%.*]], ptr [[VFRAME]], i32 0, i32 1
+; CHECK-NEXT:    store ptr @foo.destroy, ptr [[DESTROY_ADDR]], align 8
+; CHECK-NEXT:    [[INDEX_ADDR1:%.*]] = getelementptr inbounds [[FOO_FRAME]], ptr [[VFRAME]], i32 0, i32 2
+; CHECK-NEXT:    call void @consume.i8.array(ptr [[INDEX_ADDR1]])
+; CHECK-NEXT:    [[INDEX_ADDR2:%.*]] = getelementptr inbounds [[FOO_FRAME]], ptr [[VFRAME]], i32 0, i32 3
+; CHECK-NEXT:    store i1 false, ptr [[INDEX_ADDR2]], align 1
+; CHECK-NEXT:    ret void
+;
+entry:
+  %testval = alloca %i8.array
+  %id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
+  %alloc = call ptr @malloc(i64 16) #3
+  %vFrame = call noalias nonnull ptr @llvm.coro.begin(token %id, ptr %alloc)
+
+  call void @llvm.lifetime.start.p0(i64 100, ptr %testval)
+  call void @consume.i8.array(ptr %testval)
+
+  %save = call token @llvm.coro.save(ptr null)
+  %suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
+  switch i8 %suspend, label %exit [
+  i8 0, label %await.ready
+  i8 1, label %exit
+  ]
+await.ready:
+  %StrayCoroSave = call token @llvm.coro.save(ptr null)
+  br label %exit
+exit:
+  ret void
+}
+
+declare token @llvm.coro.id(i32, ptr readnone, ptr nocapture readonly, ptr)
+declare ptr @llvm.coro.begin(token, ptr writeonly) #3
+declare token @llvm.coro.save(ptr) #3
+declare ptr @llvm.coro.frame() #5
+declare i8 @llvm.coro.suspend(token, i1) #3
+declare i1 @llvm.coro.end(ptr, i1, token) #3
+declare void @llvm.lifetime.start.p0(i64, ptr nocapture) #4

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the analysis, it is pretty helpful.

From the analysis, I feel there is deep problems in the current model since it shows the lifetime intrinsics is not so stable as we thought. e.g., if there are multiple llvm.lifetime.start and llvm.lifetime.end intrinsics but some passes only remove some of the llvm.lifetime.end intrinsics (not all), I fear it may still be problematic.

The root problem here is that, when we use lifetime info to decide if an alloca should live on the frame, we don't consider lifetime end at all. This is not correct. To fix this fundamentally, I think we need to introduce some data flow analysis here. The logic may be:

  1. If there is any path from any llvm.lifetime.start to the llvm.coro.end without touching llvm.lifetime.end, we need to put the variable on the frame.
  2. If there is any path from any llvm.lifetime.start to the llvm.lifetime.end touched llvm.coro.suspend, we need to put the variable on the frame.
  3. Then it may be safe to put the variable on the stack.

I am not sure if there is existing logic for this. If not, probably we need to do it ourselves. We already did some of such local data flow analysis in coroutines.

@efriedma-quic
Copy link
Collaborator

Also:

  • If there's a lifetime.start, and it flows to a ret/unreachable/resume, treat it as if there's a lifetime.end just before that.

llvm/lib/CodeGen/StackColoring.cpp implements this analysis, but not in a way you can easily reuse. Still probably a useful reference.

(Lifetime intrinstics are inherently pretty messy; see also #45725.)

@alanzhao1 alanzhao1 changed the title [coroutines][CoroSplit] Store allocas on the frame if there is no exp… …licit lifetime.end [coroutines][CoroSplit] Store allocas on the frame if there is no explicit lifetime.end Apr 16, 2024
@zmodem
Copy link
Collaborator

zmodem commented Apr 17, 2024

Is someone looking at doing the deeper fix to the lifetime analysis as suggested above, or should we do a stop gap fix in the meantime?

I'm not sure if the current patch is sufficient: it checks whether there is a lifetime.end intrinsic, but what if there is one on one execution path, but not another? Maybe we should always set ShouldUseLifetimeStartInfo to avoid this miscompile for now?

@ChuanqiXu9
Copy link
Member

Is someone looking at doing the deeper fix to the lifetime analysis as suggested above, or should we do a stop gap fix in the meantime?

I have less time developing into coroutines in detail recently. So if there is no one had the chance to fix this properly in the near future, I think we can accept this as a partial workaround.

I'm not sure if the current patch is sufficient: it checks whether there is a lifetime.end intrinsic, but what if there is one on one execution path, but not another?

Yeah, this is the reason why I think the current patch is a partial fix.

Maybe we should always set ShouldUseLifetimeStartInfo to avoid this miscompile for now?

I am not sure what you mean. ShouldUseLifetimeStartInfo is always set for C++.

@alanzhao1
Copy link
Contributor Author

Is someone looking at doing the deeper fix to the lifetime analysis as suggested above, or should we do a stop gap fix in the meantime?

I have less time developing into coroutines in detail recently. So if there is no one had the chance to fix this properly in the near future, I think we can accept this as a partial workaround.

I'm not sure if the current patch is sufficient: it checks whether there is a lifetime.end intrinsic, but what if there is one on one execution path, but not another?

Yeah, this is the reason why I think the current patch is a partial fix.

Agreed that this is a partial fix - it doesn't address the example that @zmodem mentioned, plus there could be other scenarios that we didn't think of. A complete fix would probably require an RFC on Discourse due to its complexity (llvm lifetimes intrinsics are notoriously fickle per #88806 (comment), and we'd likely have to rewrite most of the existing logic here.)

Maybe we should always set ShouldUseLifetimeStartInfo to avoid this miscompile for now?

I am not sure what you mean. ShouldUseLifetimeStartInfo is always set for C++.

I think the proposal is that we put all coroutine variables on the coroutine frame object rather than try to figure out what variables can go on the stack. This would generate suboptimal code, but it will be guaranteed to be correct.

@ChuanqiXu9
Copy link
Member

I think the proposal is that we put all coroutine variables on the coroutine frame object rather than try to figure out what variables can go on the stack. This would generate suboptimal code, but it will be guaranteed to be correct.

Oh, it means to set ShouldUseLifetimeStartInfo to false. This change may have a large impact. Coroutines nowadays are used in many performance critical sections... we also need a RFC and call users to test the performance changes to do this...

Maybe we should try to look at how to fix it properly...

@zmodem
Copy link
Collaborator

zmodem commented Apr 18, 2024

Oh, it means to set ShouldUseLifetimeStartInfo to false.

Yes, sorry, I forgot to write the "to false" part.

Coroutines nowadays are used in many performance critical sections

That makes it all the more important to ensure we don't miscompile the code ;)

Maybe we should try to look at how to fix it properly...

Agreed :)

@apolloww
Copy link
Contributor

This change may have a large impact. Coroutines nowadays are used in many performance critical sections... we also need a RFC and call users to test the performance changes to do this...

I think it would mostly just increase memory pressure, but not so much on runtime performance unless such case happens frequently enough to bloat up the heap size to certain point that allocation performance starts to dip.

@alanzhao1
Copy link
Contributor Author

@alanzhao1
Copy link
Contributor Author

Closing in favor of #90265

@alanzhao1 alanzhao1 closed this Apr 26, 2024
@alanzhao1 alanzhao1 deleted the bugfix/coroutine-alloc-no-end branch June 20, 2024 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[coroutines] LLVM incorrectly allocates variables on the coroutine stack
6 participants