Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP][OMPIRBuilder] Error propagation across callbacks #112533

Merged
merged 4 commits into from
Oct 25, 2024

Conversation

skatrak
Copy link
Contributor

@skatrak skatrak commented Oct 16, 2024

This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces llvm::Error and llvm::Expected objects to replace the values returned by callbacks passed to OMPIRBuilder codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace.

This prevents a failed callback to leave the IR in an invalid state and still continue the codegen process, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue always returning 'success'.

@Meinersbur
Copy link
Member

Meinersbur commented Oct 16, 2024

My naive view would be that once we start emitting code, other than non-recoverable errors such as bugs and non-yet-implemented aborts where the compiler should just crash. For descriptive messages for crashes, there is

class PrettyStackTraceEntry {

This is what adds

Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/clang++ -gdwarf-4 -g -o /app/output.s -mllvm --x86-asm-syntax=intel -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics <source>
1.	<eof> parser at end of file
2.	<source>:2:6: LLVM IR generation of declaration 'test'
3.	<source>:2:6: Generating code for declaration 'test'

to Clang crash reports. "failed region translation" (or more descriptive: "translating region 'X'") could be an entry added in OpenMPIRBuilder.

In other words, what recoverable error will this be used for? convertBlock not succeeding looks non-recoverable.

@skatrak
Copy link
Contributor Author

skatrak commented Oct 16, 2024

Thanks Michael for sharing your thoughts. I agree with you, in that I also think these errors are unrecoverable. The problem at the moment is that the OMPIRBuilder can't check whether the callbacks it calls have failed, so its codegen functions continue execution assuming the current state is valid. So, currently, when they fail for any of the reasons you mentioned (bugs, unsupported ops, etc) it can then dereference null pointers, trigger asserts later on, etc (see #109755 for an example).

So one solution to this would be what I understand you're proposing, which would be to print a crash report and exit the program directly from within these callbacks, so they never return. However, if you look everywhere else in the omp dialect to LLVM IR translation, you'll see that the expected method to handle errors like these there is to emit MLIR diagnostics and then return early with a failure() value (grep for 'emitError' to see many instances of this). This early exit propagates up the stack to the original caller to mlir::translateModuleToLLVMIR(), which then exits the program gracefully with an error code.

The solution I'm proposing is addressing long-standing TODO comments for supporting "error propagation in OpenMPIRBuilder" by:

  1. Allowing callbacks passed to the OMPIRBuilder to return errors, so that OMPIRBuilder codegen functions can stop early and not cause an unrelated crash.
  2. Allowing OMPIRBuilder codegen functions taking these callbacks to also return errors, so they can forward them to the caller.
  3. Taking errors forwarded by the OMPIRBuilder to produce MLIR diagnostics, and follow the same error reporting approach when the error is related to the OMPIRBuilder and when it isn't.

@Meinersbur
Copy link
Member

I think continuing on an unrecoverable error is an inherently bad idea. It will create code paths that will never be tested by definition (because if there is a path we will fix the bug leading to that path) and hence likely cause dereferenced null pointers, trigger assertions, etc already because the LLVM generated so far is not in an expected state. For instance, the new early abort in createTaskgroup may leave a degenerate BB, any call to BB::getTerminator() will return nullptr, and at latest the IR-verifier will crash. Any llvm_unreachable and assert in OpenMPIRBuilder and elsewhere will also still cause a crash, not mlir::emitError which seems inconsequential.

However, of course I will not stand in the way if this is the community-chosen solution and/or established pattern. Also, if there is use case with a recoverable error we would need this as well.

@tblah
Copy link
Contributor

tblah commented Oct 17, 2024

I don't feel strongly about this, but I think the proposed solution looks like a step forward. I think it is important that we do error out as soon as possible after an unrecoverable error and I think this is a cleaner way to do that than setting captured variables like bodyGenStatus.

I think the value in having some controlled error reporting like this over an assertion is for cases where implementation is still in progress. This allows us to print something more user friendly than a segfault/stack trace. As a reviewer I would read something like these errors as "this is a known limitation or error condition" whereas an assertion as "this is assumed never to happen".

@Meinersbur
Copy link
Member

Meinersbur commented Oct 17, 2024

I might have overblown the problem. If with every Error/Expected we immediately return from the entire call stack, until there is no use of the LLVM-IR, there should not be a lot of things that could go wrong. Clang's use of OpenMPIRBuilder still needs to crash on an erroneous OpenMPIRBuilder call because it doesn't have that Error/Expected design.

MLIR uses Rust-style error propagation, and since OpenMPIRBuilder is used within MLIR, it should be compatible with it. I am still not convinced that it provides any benefit, but consider my objections retracted.

@skatrak
Copy link
Contributor Author

skatrak commented Oct 17, 2024

I think continuing on an unrecoverable error is an inherently bad idea. It will create code paths that will never be tested by definition (because if there is a path we will fix the bug leading to that path) and hence likely cause dereferenced null pointers, trigger assertions, etc already because the LLVM generated so far is not in an expected state. For instance, the new early abort in createTaskgroup may leave a degenerate BB, any call to BB::getTerminator() will return nullptr, and at latest the IR-verifier will crash. Any llvm_unreachable and assert in OpenMPIRBuilder and elsewhere will also still cause a crash, not mlir::emitError which seems inconsequential.

However, of course I will not stand in the way if this is the community-chosen solution and/or established pattern. Also, if there is use case with a recoverable error we would need this as well.

I agree that once one of these errors were triggered, we'd have to stop any further translation or codegen process. But the approach I'm proposing actually does this. There is indeed some level of collaboration needed at each stage for that to happen, but basically the pattern is already to return LogicalResult from translation functions, always check its return value and return a failure straight away if it didn't succeed. So, once an OMPIRBuilder function fails, we can emit an error and return failure. This is checked by the caller, the caller of the caller, etc. until the original mlir::translateModuleToLLVMIR() call returns a null pointer instead of a half-created invalid IR.

I might have overblown the problem. If with every Error/Expected we immediately return from the entire call stack, until there is no use of the LLVM-IR, there should not be a lot of things that could go wrong. Clang's use of OpenMPIRBuilder still needs to crash on an erroneous OpenMPIRBuilder call because it doesn't have that Error/Expected design.

MLIR uses Rust-style error propagation, and since OpenMPIRBuilder is used within MLIR, it should be compatible with it. I am still not convinced that it provides any benefit, but consider my objections retracted.

In a way, the changes to the OMPIRBuilder are not really mandatory to make use of. Callbacks passed to it from Clang can decide to exit the program on irrecoverable failure and always return success, and assume the returned Expected values never hold an Error. In MLIR this lets us integrate with the existing error reporting pattern, but it also creates the infrastructure to deal with recoverable errors in both Clang and Flang. I see it as a bit more flexible solution.

@skatrak skatrak force-pushed the omp-mlir-llvm-error-handling branch from 201a1fa to 24ab62d Compare October 21, 2024 12:55
@skatrak skatrak marked this pull request as ready for review October 21, 2024 12:55
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen mlir:llvm mlir mlir:openmp flang:openmp llvm:transforms clang:openmp OpenMP related changes to Clang labels Oct 21, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-mlir-openmp

@llvm/pr-subscribers-flang-openmp

Author: Sergio Afonso (skatrak)

Changes

This is a small proof of concept showing an approach to communicate errors between MLIR to LLVM IR translation of the OpenMP dialect and the OMPIRBuilder. It only implements the approach for a single case, so it doesn't compile or run, since it's only intended to show how it could look like and discuss it before investing too much effort on a full implementation.

The main idea is to use llvm::Error objects returned by callbacks passed to OMPIRBuilder codegen functions that they can then check and forward back to the caller to avoid continuing after an error has been hit. The caller then emits an MLIR error diagnostic based on that and stops the translation process.

This should prevent encountering any unsupported operations or arguments, or any other unexpected error from resulting in a compiler crash. Instead, a descriptive error message is presented to users.


Patch is 222.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112533.diff

8 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+17-8)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+4-2)
  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+57-19)
  • (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+129-120)
  • (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+376-215)
  • (modified) llvm/lib/Transforms/IPO/OpenMPOpt.cpp (+15-8)
  • (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+413-189)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+352-266)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 3747b00d4893ad..92f2c6904f39de 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1192,6 +1192,7 @@ struct PushAndPopStackRAII {
       CodeGenFunction::JumpDest Dest =
           CGF.getOMPCancelDestination(OMPD_parallel);
       CGF.EmitBranchThroughCleanup(Dest);
+      return llvm::Error::success();
     };
 
     // TODO: Remove this once we emit parallel regions through the
@@ -2331,8 +2332,10 @@ void CGOpenMPRuntime::emitBarrierCall(CodeGenFunction &CGF, SourceLocation Loc,
   auto *OMPRegionInfo =
       dyn_cast_or_null<CGOpenMPRegionInfo>(CGF.CapturedStmtInfo);
   if (CGF.CGM.getLangOpts().OpenMPIRBuilder) {
-    CGF.Builder.restoreIP(OMPBuilder.createBarrier(
-        CGF.Builder, Kind, ForceSimpleCall, EmitChecks));
+    auto Result = OMPBuilder.createBarrier(CGF.Builder, Kind, ForceSimpleCall,
+                                           EmitChecks);
+    assert(Result && "unexpected error creating barrier");
+    CGF.Builder.restoreIP(*Result);
     return;
   }
 
@@ -5928,8 +5931,10 @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
         return CGF.GenerateOpenMPCapturedStmtFunction(CS, D.getBeginLoc());
       };
 
-  OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction,
-                                      IsOffloadEntry, OutlinedFn, OutlinedFnID);
+  llvm::Error Err = OMPBuilder.emitTargetRegionFunction(
+      EntryInfo, GenerateOutlinedFunction, IsOffloadEntry, OutlinedFn,
+      OutlinedFnID);
+  assert(!Err && "unexpected error creating target region");
 
   if (!OutlinedFn)
     return;
@@ -9671,9 +9676,11 @@ static void emitTargetCallKernelLaunch(
         NumTargetItems, RTArgs, NumIterations, NumTeams, NumThreads,
         DynCGGroupMem, HasNoWait);
 
-    CGF.Builder.restoreIP(OMPRuntime->getOMPBuilder().emitKernelLaunch(
+    auto Result = OMPRuntime->getOMPBuilder().emitKernelLaunch(
         CGF.Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, Args,
-        DeviceID, RTLoc, AllocaIP));
+        DeviceID, RTLoc, AllocaIP);
+    assert(Result && "unexpected error creating kernel launch");
+    CGF.Builder.restoreIP(*Result);
   };
 
   if (RequiresOuterTask)
@@ -10350,9 +10357,11 @@ void CGOpenMPRuntime::emitTargetDataCalls(
   InsertPointTy CodeGenIP(CGF.Builder.GetInsertBlock(),
                           CGF.Builder.GetInsertPoint());
   llvm::OpenMPIRBuilder::LocationDescription OmpLoc(CodeGenIP);
-  CGF.Builder.restoreIP(OMPBuilder.createTargetData(
+  auto Result = OMPBuilder.createTargetData(
       OmpLoc, AllocaIP, CodeGenIP, DeviceID, IfCondVal, Info, GenMapInfoCB,
-      /*MapperFunc=*/nullptr, BodyCB, DeviceAddrCB, CustomMapperCB, RTLoc));
+      /*MapperFunc=*/nullptr, BodyCB, DeviceAddrCB, CustomMapperCB, RTLoc);
+  assert(Result && "unexpected error creating target data");
+  CGF.Builder.restoreIP(*Result);
 }
 
 void CGOpenMPRuntime::emitTargetDataStandAloneCall(
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 35ff75416cb776..62b8c5475a4ebe 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -1753,11 +1753,13 @@ void CGOpenMPRuntimeGPU::emitReduction(
     Idx++;
   }
 
-  CGF.Builder.restoreIP(OMPBuilder.createReductionsGPU(
+  auto Result = OMPBuilder.createReductionsGPU(
       OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, false, TeamsReduction,
       DistributeReduction, llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang,
       CGF.getTarget().getGridValue(), C.getLangOpts().OpenMPCUDAReductionBufNum,
-      RTLoc));
+      RTLoc);
+  assert(Result && "unexpected error creating GPU reductions");
+  CGF.Builder.restoreIP(*Result);
   return;
 }
 
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 71a27d0c6bc1fb..0396f7cef89fc7 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -1809,6 +1809,7 @@ void CodeGenFunction::EmitOMPParallelDirective(const OMPParallelDirective &S) {
     // thus calls destructors etc.
     auto FiniCB = [this](InsertPointTy IP) {
       OMPBuilderCBHelpers::FinalizeOMPRegion(*this, IP);
+      return llvm::Error::success();
     };
 
     // Privatization callback that performs appropriate action for
@@ -1831,15 +1832,18 @@ void CodeGenFunction::EmitOMPParallelDirective(const OMPParallelDirective &S) {
                                InsertPointTy CodeGenIP) {
       OMPBuilderCBHelpers::EmitOMPOutlinedRegionBody(
           *this, ParallelRegionBodyStmt, AllocaIP, CodeGenIP, "parallel");
+      return llvm::Error::success();
     };
 
     CGCapturedStmtInfo CGSI(*CS, CR_OpenMP);
     CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI);
     llvm::OpenMPIRBuilder::InsertPointTy AllocaIP(
         AllocaInsertPt->getParent(), AllocaInsertPt->getIterator());
-    Builder.restoreIP(
+    auto Result =
         OMPBuilder.createParallel(Builder, AllocaIP, BodyGenCB, PrivCB, FiniCB,
-                                  IfCond, NumThreads, ProcBind, S.hasCancel()));
+                                  IfCond, NumThreads, ProcBind, S.hasCancel());
+    assert(Result && "unexpected error creating parallel");
+    Builder.restoreIP(*Result);
     return;
   }
 
@@ -2128,9 +2132,12 @@ void CodeGenFunction::EmitOMPCanonicalLoop(const OMPCanonicalLoop *S) {
 
     RunCleanupsScope BodyScope(*this);
     EmitStmt(BodyStmt);
+    return llvm::Error::success();
   };
-  llvm::CanonicalLoopInfo *CL =
-      OMPBuilder.createCanonicalLoop(Builder, BodyGen, DistVal);
+
+  auto Result = OMPBuilder.createCanonicalLoop(Builder, BodyGen, DistVal);
+  assert(Result && "unexpected error creating parallel");
+  llvm::CanonicalLoopInfo *CL = *Result;
 
   // Finish up the loop.
   Builder.restoreIP(CL->getAfterIP());
@@ -4016,11 +4023,12 @@ static void emitOMPForDirective(const OMPLoopDirective &S, CodeGenFunction &CGF,
           CGM.getOpenMPRuntime().getOMPBuilder();
       llvm::OpenMPIRBuilder::InsertPointTy AllocaIP(
           CGF.AllocaInsertPt->getParent(), CGF.AllocaInsertPt->getIterator());
-      OMPBuilder.applyWorkshareLoop(
+      auto Result = OMPBuilder.applyWorkshareLoop(
           CGF.Builder.getCurrentDebugLocation(), CLI, AllocaIP, NeedsBarrier,
           SchedKind, ChunkSize, /*HasSimdModifier=*/false,
           /*HasMonotonicModifier=*/false, /*HasNonmonotonicModifier=*/false,
           /*HasOrderedClause=*/false);
+      assert(Result && "unexpected error creating workshare loop");
       return;
     }
 
@@ -4257,6 +4265,7 @@ void CodeGenFunction::EmitOMPSectionsDirective(const OMPSectionsDirective &S) {
 
     auto FiniCB = [this](InsertPointTy IP) {
       OMPBuilderCBHelpers::FinalizeOMPRegion(*this, IP);
+      return llvm::Error::success();
     };
 
     const CapturedStmt *ICS = S.getInnermostCapturedStmt();
@@ -4269,6 +4278,7 @@ void CodeGenFunction::EmitOMPSectionsDirective(const OMPSectionsDirective &S) {
                                          InsertPointTy CodeGenIP) {
           OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
               *this, SubStmt, AllocaIP, CodeGenIP, "section");
+          return llvm::Error::success();
         };
         SectionCBVector.push_back(SectionCB);
       }
@@ -4277,6 +4287,7 @@ void CodeGenFunction::EmitOMPSectionsDirective(const OMPSectionsDirective &S) {
                                             InsertPointTy CodeGenIP) {
         OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
             *this, CapturedStmt, AllocaIP, CodeGenIP, "section");
+        return llvm::Error::success();
       };
       SectionCBVector.push_back(SectionCB);
     }
@@ -4298,9 +4309,11 @@ void CodeGenFunction::EmitOMPSectionsDirective(const OMPSectionsDirective &S) {
     CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI);
     llvm::OpenMPIRBuilder::InsertPointTy AllocaIP(
         AllocaInsertPt->getParent(), AllocaInsertPt->getIterator());
-    Builder.restoreIP(OMPBuilder.createSections(
+    auto Result = OMPBuilder.createSections(
         Builder, AllocaIP, SectionCBVector, PrivCB, FiniCB, S.hasCancel(),
-        S.getSingleClause<OMPNowaitClause>()));
+        S.getSingleClause<OMPNowaitClause>());
+    assert(Result && "unexpected error creating sections");
+    Builder.restoreIP(*Result);
     return;
   }
   {
@@ -4326,17 +4339,21 @@ void CodeGenFunction::EmitOMPSectionDirective(const OMPSectionDirective &S) {
     const Stmt *SectionRegionBodyStmt = S.getAssociatedStmt();
     auto FiniCB = [this](InsertPointTy IP) {
       OMPBuilderCBHelpers::FinalizeOMPRegion(*this, IP);
+      return llvm::Error::success();
     };
 
     auto BodyGenCB = [SectionRegionBodyStmt, this](InsertPointTy AllocaIP,
                                                    InsertPointTy CodeGenIP) {
       OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
           *this, SectionRegionBodyStmt, AllocaIP, CodeGenIP, "section");
+      return llvm::Error::success();
     };
 
     LexicalScope Scope(*this, S.getSourceRange());
     EmitStopPoint(&S);
-    Builder.restoreIP(OMPBuilder.createSection(Builder, BodyGenCB, FiniCB));
+    auto Result = OMPBuilder.createSection(Builder, BodyGenCB, FiniCB);
+    assert(Result && "unexpected error creating section");
+    Builder.restoreIP(*Result);
 
     return;
   }
@@ -4407,17 +4424,21 @@ void CodeGenFunction::EmitOMPMasterDirective(const OMPMasterDirective &S) {
 
     auto FiniCB = [this](InsertPointTy IP) {
       OMPBuilderCBHelpers::FinalizeOMPRegion(*this, IP);
+      return llvm::Error::success();
     };
 
     auto BodyGenCB = [MasterRegionBodyStmt, this](InsertPointTy AllocaIP,
                                                   InsertPointTy CodeGenIP) {
       OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
           *this, MasterRegionBodyStmt, AllocaIP, CodeGenIP, "master");
+      return llvm::Error::success();
     };
 
     LexicalScope Scope(*this, S.getSourceRange());
     EmitStopPoint(&S);
-    Builder.restoreIP(OMPBuilder.createMaster(Builder, BodyGenCB, FiniCB));
+    auto Result = OMPBuilder.createMaster(Builder, BodyGenCB, FiniCB);
+    assert(Result && "unexpected error creating master");
+    Builder.restoreIP(*Result);
 
     return;
   }
@@ -4453,18 +4474,22 @@ void CodeGenFunction::EmitOMPMaskedDirective(const OMPMaskedDirective &S) {
 
     auto FiniCB = [this](InsertPointTy IP) {
       OMPBuilderCBHelpers::FinalizeOMPRegion(*this, IP);
+      return llvm::Error::success();
     };
 
     auto BodyGenCB = [MaskedRegionBodyStmt, this](InsertPointTy AllocaIP,
                                                   InsertPointTy CodeGenIP) {
       OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
           *this, MaskedRegionBodyStmt, AllocaIP, CodeGenIP, "masked");
+      return llvm::Error::success();
     };
 
     LexicalScope Scope(*this, S.getSourceRange());
     EmitStopPoint(&S);
-    Builder.restoreIP(
-        OMPBuilder.createMasked(Builder, BodyGenCB, FiniCB, FilterVal));
+    auto Result =
+        OMPBuilder.createMasked(Builder, BodyGenCB, FiniCB, FilterVal);
+    assert(Result && "unexpected error creating masked");
+    Builder.restoreIP(*Result);
 
     return;
   }
@@ -4493,19 +4518,23 @@ void CodeGenFunction::EmitOMPCriticalDirective(const OMPCriticalDirective &S) {
 
     auto FiniCB = [this](InsertPointTy IP) {
       OMPBuilderCBHelpers::FinalizeOMPRegion(*this, IP);
+      return llvm::Error::success();
     };
 
     auto BodyGenCB = [CriticalRegionBodyStmt, this](InsertPointTy AllocaIP,
                                                     InsertPointTy CodeGenIP) {
       OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
           *this, CriticalRegionBodyStmt, AllocaIP, CodeGenIP, "critical");
+      return llvm::Error::success();
     };
 
     LexicalScope Scope(*this, S.getSourceRange());
     EmitStopPoint(&S);
-    Builder.restoreIP(OMPBuilder.createCritical(
-        Builder, BodyGenCB, FiniCB, S.getDirectiveName().getAsString(),
-        HintInst));
+    auto Result =
+        OMPBuilder.createCritical(Builder, BodyGenCB, FiniCB,
+                                  S.getDirectiveName().getAsString(), HintInst);
+    assert(Result && "unexpected error creating critical");
+    Builder.restoreIP(*Result);
 
     return;
   }
@@ -5464,11 +5493,14 @@ void CodeGenFunction::EmitOMPTaskgroupDirective(
                                InsertPointTy CodeGenIP) {
       Builder.restoreIP(CodeGenIP);
       EmitStmt(S.getInnermostCapturedStmt()->getCapturedStmt());
+      return llvm::Error::success();
     };
     CodeGenFunction::CGCapturedStmtInfo CapStmtInfo;
     if (!CapturedStmtInfo)
       CapturedStmtInfo = &CapStmtInfo;
-    Builder.restoreIP(OMPBuilder.createTaskgroup(Builder, AllocaIP, BodyGenCB));
+    auto Result = OMPBuilder.createTaskgroup(Builder, AllocaIP, BodyGenCB);
+    assert(Result && "unexpected error creating taskgroup");
+    Builder.restoreIP(*Result);
     return;
   }
   auto &&CodeGen = [&S](CodeGenFunction &CGF, PrePostActionTy &Action) {
@@ -6041,6 +6073,7 @@ void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &S) {
 
       auto FiniCB = [this](InsertPointTy IP) {
         OMPBuilderCBHelpers::FinalizeOMPRegion(*this, IP);
+        return llvm::Error::success();
       };
 
       auto BodyGenCB = [&S, C, this](InsertPointTy AllocaIP,
@@ -6064,11 +6097,14 @@ void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &S) {
           OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
               *this, CS->getCapturedStmt(), AllocaIP, CodeGenIP, "ordered");
         }
+        return llvm::Error::success();
       };
 
       OMPLexicalScope Scope(*this, S, OMPD_unknown);
-      Builder.restoreIP(
-          OMPBuilder.createOrderedThreadsSimd(Builder, BodyGenCB, FiniCB, !C));
+      auto Result =
+          OMPBuilder.createOrderedThreadsSimd(Builder, BodyGenCB, FiniCB, !C);
+      assert(Result && "unexpected error creating ordered");
+      Builder.restoreIP(*Result);
     }
     return;
   }
@@ -7344,8 +7380,10 @@ void CodeGenFunction::EmitOMPCancelDirective(const OMPCancelDirective &S) {
       if (IfCond)
         IfCondition = EmitScalarExpr(IfCond,
                                      /*IgnoreResultAssign=*/true);
-      return Builder.restoreIP(
-          OMPBuilder.createCancel(Builder, IfCondition, S.getCancelRegion()));
+      auto Result =
+          OMPBuilder.createCancel(Builder, IfCondition, S.getCancelRegion());
+      assert(Result && "unexpected error creating cancel");
+      return Builder.restoreIP(*Result);
     }
   }
 
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 8834c3b1f50115..d846a885bccb4b 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -538,7 +538,7 @@ class OpenMPIRBuilder {
   /// A finalize callback knows about all objects that need finalization, e.g.
   /// destruction, when the scope of the currently generated construct is left
   /// at the time, and location, the callback is invoked.
-  using FinalizeCallbackTy = std::function<void(InsertPointTy CodeGenIP)>;
+  using FinalizeCallbackTy = std::function<Error(InsertPointTy CodeGenIP)>;
 
   struct FinalizationInfo {
     /// The finalization callback provided by the last in-flight invocation of
@@ -589,15 +589,19 @@ class OpenMPIRBuilder {
   ///                 not be split.
   /// \param CodeGenIP is the insertion point at which the body code should be
   ///                  placed.
+  ///
+  /// \return an error, if any were triggered during execution.
   using BodyGenCallbackTy =
-      function_ref<void(InsertPointTy AllocaIP, InsertPointTy CodeGenIP)>;
+      function_ref<Error(InsertPointTy AllocaIP, InsertPointTy CodeGenIP)>;
 
   // This is created primarily for sections construct as llvm::function_ref
   // (BodyGenCallbackTy) is not storable (as described in the comments of
   // function_ref class - function_ref contains non-ownable reference
   // to the callable.
+  ///
+  /// \return an error, if any were triggered during execution.
   using StorableBodyGenCallbackTy =
-      std::function<void(InsertPointTy AllocaIP, InsertPointTy CodeGenIP)>;
+      std::function<Error(InsertPointTy AllocaIP, InsertPointTy CodeGenIP)>;
 
   /// Callback type for loop body code generation.
   ///
@@ -607,8 +611,10 @@ class OpenMPIRBuilder {
   ///                  terminated with an unconditional branch to the loop
   ///                  latch.
   /// \param IndVar    is the induction variable usable at the insertion point.
+  ///
+  /// \return an error, if any were triggered during execution.
   using LoopBodyGenCallbackTy =
-      function_ref<void(InsertPointTy CodeGenIP, Value *IndVar)>;
+      function_ref<Error(InsertPointTy CodeGenIP, Value *IndVar)>;
 
   /// Callback type for variable privatization (think copy & default
   /// constructor).
@@ -626,9 +632,9 @@ class OpenMPIRBuilder {
   /// \param ReplVal The replacement value, thus a copy or new created version
   ///                of \p Inner.
   ///
-  /// \returns The new insertion point where code generation continues and
-  ///          \p ReplVal the replacement value.
-  using PrivatizeCallbackTy = function_ref<InsertPointTy(
+  /// \returns The new insertion point where code generation continues or an
+  ///          error, and \p ReplVal the replacement value.
+  using PrivatizeCallbackTy = function_ref<Expected<InsertPointTy>(
       InsertPointTy AllocaIP, InsertPointTy CodeGenIP, Value &Original,
       Value &Inner, Value *&ReplVal)>;
 
@@ -658,9 +664,10 @@ class OpenMPIRBuilder {
   /// \param ThreadID Optional parameter to pass in any existing ThreadID value.
   ///
   /// \returns The insertion point after the barrier.
-  InsertPointTy createBarrier(const LocationDescription &Loc,
-                              omp::Directive Kind, bool ForceSimpleCall = false,
-                              bool CheckCancelFlag = true);
+  Expected<InsertPointTy> createBarrier(const LocationDescription &Loc,
+                                        omp::Directive Kind,
+                                        bool ForceSimpleCall = false,
+                                        bool CheckCancelFlag = true);
 
   /// Generator for '#omp cancel'
   ///
@@ -669,8 +676,9 @@ class OpenMPIRBuilder {
   /// \param CanceledDirective The kind of directive that is cancled.
   ///
   /// \returns The insertion point after the barrier.
-  InsertPointTy createCancel(const LocationDescription &Loc, Value *IfCondition,
-                             omp::Directive CanceledDirective);
+  Expected<InsertPointTy> createCancel(const LocationDescription &Loc,
+                                       Value *IfCondition,
+                                       omp::Directive CanceledDirective);
 
   /// Generator for '#omp parallel'
   ///
@@ -685,7 +693,7 @@ class OpenMPIRBuilder {
   /// \param IsCancellable Flag to indicate a cancellable parallel region.
   ///
   /// \returns The insertion position *after* the parallel.
-  IRBuilder<>::InsertPoint
+  Expected<InsertPointTy>
   createParallel(const LocationDescription &Loc, InsertPointTy AllocaIP,
                  BodyGenCallbackTy BodyGenCB, PrivatizeCallbackTy PrivCB,
                  FinalizeCallbackTy FiniCB, Value *IfCondition,
@@ -711,10 +719,10 @@ class OpenMPIRBuilder {
   ///
   /// \returns An object representing the created control flow structure which
   ///          can be used for loop-associated directives.
-  CanonicalLoopInfo *createCanonicalLoop(const LocationDescription &Loc,
-                                         LoopBodyGenCallbackTy BodyGenCB,
-                                         Value *TripCount,
-                                         const Twine &Name = "loop");
+  Expected<CanonicalLoopInfo *>
+  createCanonicalLoop(const LocationDescription &Loc,
+                      LoopBodyGenCallbackTy BodyGenCB, Value *TripCount,
+                      const Twine &Name = "loop");
 
   /// Generator for the control flow structure of an OpenMP canonical loop.
   ///
@@ -764,12 +772,10 @@ class OpenMPIRBuilder {
   ///
   /// \returns An object representing the created control flow structure which
   ///     ...
[truncated]

@skatrak skatrak changed the title [MLIR][OpenMP][OMPIRBuilder] Error propagation across callbacks [OpenMP][OMPIRBuilder] Error propagation across callbacks Oct 21, 2024
@skatrak skatrak force-pushed the omp-mlir-llvm-error-handling branch from 24ab62d to 573f816 Compare October 21, 2024 13:30
Copy link
Contributor

@bhandarkar-pranav bhandarkar-pranav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR, @skatrak, a very useful change. I have minor nits but other than that, LGTM.

clang/lib/CodeGen/CGOpenMPRuntime.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGOpenMPRuntime.cpp Show resolved Hide resolved
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp Outdated Show resolved Hide resolved
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp Outdated Show resolved Hide resolved
@skatrak skatrak force-pushed the omp-mlir-llvm-error-handling branch from 573f816 to 8ce2262 Compare October 23, 2024 12:01
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected`
objects to replace the values returned by callbacks passed to `OMPIRBuilder`
codegen functions. These functions then check the result for errors when
callbacks are called and forward them back to the caller, which has the
flexibility to recover, exit cleanly or dump a stack trace.

This prevents a failed callback to leave the IR in an invalid state and still
continue the codegen process, triggering unrelated assertions or segmentation
faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this
change results in the compiler emitting errors and exiting early instead of
triggering a crash for not-yet-implemented errors. The behavior in Clang and
openmp-opt stays unchanged, since callbacks will continue always returning
'success'.
@skatrak skatrak force-pushed the omp-mlir-llvm-error-handling branch from 8ce2262 to 568964e Compare October 23, 2024 13:45
Copy link

github-actions bot commented Oct 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the extensive cleanup

Copy link
Contributor

@bhandarkar-pranav bhandarkar-pranav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes.

@skatrak skatrak merged commit d87964d into llvm:main Oct 25, 2024
8 checks passed
@skatrak skatrak deleted the omp-mlir-llvm-error-handling branch October 25, 2024 10:30
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 25, 2024

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building clang,llvm,mlir at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/3878

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************


NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
OMPBuilder.createBarrier(CGF.Builder, Kind, ForceSimpleCall,
EmitChecks);
assert(AfterIP && "unexpected error creating barrier");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[serious] @skatrak assert is active by LLVM_ENABLED_ASSERTS, but llvm::Error/llvm::Expected uses LLVM_ABI_BREAKING_CHECKS. In the configuration -DLLVM_ENABLED_ASSERTS=OFF -DLLVM_ABI_BREAKING_CHECKS=FORCE_ON the error check is not done, but then crashes unconditional to about unchecked errors.

Copy link
Contributor Author

@skatrak skatrak Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess then these asserts should be replaced with if (!AfterIP) llvm_unreachable(...), or is there a better way to deal with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the PR to address this issue: #115863. There was indeed a cleaner solution to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category flang:openmp llvm:transforms mlir:llvm mlir:openmp mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants