Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlir][OpenMP] - MLIR to LLVMIR translation support for delayed privatization of allocatables in omp.target ops #113208

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

bhandarkar-pranav
Copy link
Contributor

This PR adds support to translate the private clause from MLIR to LLVMIR when used on allocatables in the context of an omp.target op.

@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2024

@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-flang-openmp

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support to translate the private clause from MLIR to LLVMIR when used on allocatables in the context of an omp.target op.


Patch is 34.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113208.diff

4 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+128-8)
  • (added) mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir (+116)
  • (added) mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir (+207)
  • (modified) mlir/test/Target/LLVMIR/openmp-target-private.mlir (+90)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 7c45e89cd8ac4b..2e013053a7d1c0 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -3316,7 +3316,51 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg,
 
   return builder.saveIP();
 }
+static bool privatizerNeedsMap(omp::PrivateClauseOp &privatizer) {
+  Region &allocRegion = privatizer.getAllocRegion();
+  Value blockArg0 = allocRegion.getArgument(0);
+  return !blockArg0.use_empty();
+}
+
+// Return the llvm::Value * corresponding to the privateVar that
+// is being privatized. It isn't always as simple as looking up
+// moduleTranslation with privateVar. For instance, in case of
+// an allocatable, the descriptor for the allocatable is privatized.
+// This descriptor is mapped using an MapInfoOp. So, this function
+// will return a pointer to the llvm::Value corresponding to the
+// block argument for the mapped descriptor.
+static llvm::Value *
+findHostAssociatedValue(Value privateVar, omp::TargetOp targetOp,
+                        llvm::DenseMap<Value, int> &mappedPrivateVars,
+                        llvm::IRBuilderBase &builder,
+                        LLVM::ModuleTranslation &moduleTranslation) {
+  if (mappedPrivateVars.contains(privateVar)) {
+    int blockArgIndex = mappedPrivateVars[privateVar];
+    Value blockArg = targetOp.getRegion().getArgument(blockArgIndex);
+    mlir::Type privVarType = privateVar.getType();
+    mlir::Type blockArgType = blockArg.getType();
+    assert(isa<LLVM::LLVMPointerType>(blockArgType) &&
+           "A block argument corresponding to a mapped var should have "
+           "!llvm.ptr type");
+
+    if (privVarType == blockArg.getType()) {
+      llvm::Value *v = moduleTranslation.lookupValue(blockArg);
+      return v;
+    }
 
+    if (!isa<LLVM::LLVMPointerType>(privVarType)) {
+      // This typically happens when the privatized type is lowered from
+      // boxchar<KIND> and gets lowered to !llvm.struct<(ptr, i64)>. That is the
+      // struct/pair is passed by value. But, mapped values are passed only as
+      // pointers, so before we privatize, we must load the pointer.
+      llvm::Value *load =
+          builder.CreateLoad(moduleTranslation.convertType(privVarType),
+                             moduleTranslation.lookupValue(blockArg));
+      return load;
+    }
+  }
+  return moduleTranslation.lookupValue(privateVar);
+}
 static LogicalResult
 convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                  LLVM::ModuleTranslation &moduleTranslation) {
@@ -3329,6 +3373,19 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
   auto parentFn = opInst.getParentOfType<LLVM::LLVMFuncOp>();
   auto targetOp = cast<omp::TargetOp>(opInst);
   auto &targetRegion = targetOp.getRegion();
+  // Holds the private vars that have been mapped along with
+  // the block argument that corresponds to the MapInfoOp
+  // corresponding to the private var in question.
+  // So, for instance
+  //
+  // %10 = omp.map.info var_ptr(%6#0 : !fir.ref<!fir.box<!fir.heap<i32>>>, ..)
+  // omp.target map_entries(%10 -> %arg0) private(@box.privatizer %6#0-> %arg1)
+  //
+  // Then, %10 has been created so that the descriptor can be used by the
+  // privatizer
+  // @box.privatizer on the device side. Here we'd record {%6#0, 0} in the
+  // mappedPrivateVars map.
+  llvm::DenseMap<Value, int> mappedPrivateVars;
   DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
   SmallVector<Value> mapVars = targetOp.getMapVars();
   ArrayRef<BlockArgument> mapBlockArgs =
@@ -3340,6 +3397,48 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
   bool isOffloadEntry =
       isTargetDevice || !ompBuilder->Config.TargetTriples.empty();
 
+  // For some private variables, the MapsForPrivatizedVariablesPass
+  // creates MapInfoOp instances. Go through the private variables and
+  // the mapped variables so that during codegeneration we are able
+  // to quickly look up the corresponding map variable, if any for each
+  // private variable.
+  if (!targetOp.getPrivateVars().empty() && !targetOp.getMapVars().empty()) {
+    auto argIface = llvm::cast<omp::BlockArgOpenMPOpInterface>(*targetOp);
+    unsigned lastMapBlockArgsIdx =
+        argIface.getMapBlockArgsStart() + argIface.numMapBlockArgs() - 1;
+    OperandRange privateVars = targetOp.getPrivateVars();
+    std::optional<ArrayAttr> privateSyms = targetOp.getPrivateSyms();
+    auto reverseIt = mapVars.rbegin();
+    for (auto [privVar, privSym] :
+         llvm::reverse(llvm::zip_equal(privateVars, *privateSyms))) {
+      SymbolRefAttr privatizerName = llvm::cast<SymbolRefAttr>(privSym);
+      omp::PrivateClauseOp privatizer =
+          findPrivatizer(targetOp, privatizerName);
+      if (!privatizerNeedsMap(privatizer))
+        continue;
+
+      // The MapInfoOp defining the map var isn't really needed later.
+      // So, we don't store it in any datastructure. Instead, we just
+      // do some sanity checks on it right now.
+      omp::MapInfoOp mapInfoOp =
+          llvm::cast<omp::MapInfoOp>((*reverseIt).getDefiningOp());
+      Type varType = mapInfoOp.getVarType();
+
+      // Check #1: Check that the type of the private variable matches
+      // the type of the variable being mapped.
+      if (!isa<LLVM::LLVMPointerType>(privVar.getType()))
+        assert(
+            varType == privVar.getType() &&
+            "Type of private var doesn't match the type of the mapped value");
+
+      // Ok, only 1 sanity check for now.
+      // Record the index of the block argument corresponding to this
+      // mapvar.
+      mappedPrivateVars.insert({privVar, lastMapBlockArgsIdx});
+      lastMapBlockArgsIdx--;
+      reverseIt++;
+    }
+  }
   LogicalResult bodyGenStatus = success();
   using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
   auto bodyCB = [&](InsertPointTy allocaIP,
@@ -3367,9 +3466,10 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
           moduleTranslation.lookupValue(mapInfoOp.getVarPtr());
       moduleTranslation.mapValue(arg, mapOpValue);
     }
-
     // Do privatization after moduleTranslation has already recorded
     // mapped values.
+    SmallVector<llvm::Value *> llvmPrivateVars;
+    SmallVector<Region *> privateCleanupRegions;
     if (!targetOp.getPrivateVars().empty()) {
       builder.restoreIP(allocaIP);
 
@@ -3384,16 +3484,18 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
         SymbolRefAttr privSym = cast<SymbolRefAttr>(privatizerNameAttr);
         omp::PrivateClauseOp privatizer = findPrivatizer(&opInst, privSym);
         if (privatizer.getDataSharingType() ==
-                omp::DataSharingClauseType::FirstPrivate ||
-            !privatizer.getDeallocRegion().empty()) {
+            omp::DataSharingClauseType::FirstPrivate) {
           opInst.emitError("Translation of omp.target from MLIR to LLVMIR "
-                           "failed because translation of firstprivate and "
-                           " private allocatables is not supported yet");
+                           "failed because translation of firstprivate is not "
+                           "supported yet");
           bodyGenStatus = failure();
         } else {
-          moduleTranslation.mapValue(privatizer.getAllocMoldArg(),
-                                     moduleTranslation.lookupValue(privVar));
           Region &allocRegion = privatizer.getAllocRegion();
+          BlockArgument allocRegionArg = allocRegion.getArgument(0);
+          moduleTranslation.mapValue(
+              allocRegionArg,
+              findHostAssociatedValue(privVar, targetOp, mappedPrivateVars,
+                                      builder, moduleTranslation));
           SmallVector<llvm::Value *, 1> yieldedValues;
           if (failed(inlineConvertOmpRegions(
                   allocRegion, "omp.targetop.privatizer", builder,
@@ -3404,7 +3506,12 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
             bodyGenStatus = failure();
           } else {
             assert(yieldedValues.size() == 1);
-            moduleTranslation.mapValue(privBlockArg, yieldedValues.front());
+            llvm::Value *llvmReplacementValue = yieldedValues.front();
+            moduleTranslation.mapValue(privBlockArg, llvmReplacementValue);
+            if (!privatizer.getDeallocRegion().empty()) {
+              llvmPrivateVars.push_back(llvmReplacementValue);
+              privateCleanupRegions.push_back(&privatizer.getDeallocRegion());
+            }
           }
           moduleTranslation.forgetMapping(allocRegion);
           builder.restoreIP(builder.saveIP());
@@ -3414,6 +3521,19 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
     llvm::BasicBlock *exitBlock = convertOmpOpRegions(
         targetRegion, "omp.target", builder, moduleTranslation, bodyGenStatus);
     builder.SetInsertPoint(exitBlock);
+    if (!llvmPrivateVars.empty()) {
+      assert(llvmPrivateVars.size() == privateCleanupRegions.size() &&
+             "Number of private variables needing cleanup not equal to number"
+             "of privatizers with dealloc regions");
+      if (failed(inlineOmpRegionCleanup(
+              privateCleanupRegions, llvmPrivateVars, moduleTranslation,
+              builder, "omp.targetop.private.cleanup",
+              /*shouldLoadCleanupRegionArg=*/false))) {
+        opInst.emitError("failed to inline `dealloc` region of `omp.private` "
+                         "op in the target region");
+        bodyGenStatus = failure();
+      }
+    }
     return builder.saveIP();
   };
 
diff --git a/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
new file mode 100644
index 00000000000000..2491747152895c
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
@@ -0,0 +1,116 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+llvm.func @free(!llvm.ptr)
+llvm.func @malloc(i64) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer0 : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %10 = llvm.getelementptr %arg0[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %1 = llvm.load %10 : !llvm.ptr -> i64
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+  %17 = llvm.call @malloc(%1) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  omp.yield
+}
+omp.private {type = private} @box.heap_privatizer1 : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %10 = llvm.getelementptr %arg0[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %1 = llvm.load %10 : !llvm.ptr -> i64
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+  %17 = llvm.call @malloc(%1) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  omp.yield
+}
+llvm.func @target_allocatable_(%arg0: !llvm.ptr {fir.bindc_name = "lb"}, %arg1: !llvm.ptr {fir.bindc_name = "ub"}, %arg2: !llvm.ptr {fir.bindc_name = "l"}) attributes {fir.internal_name = "_QPtarget_allocatable"} {
+  %6 = llvm.mlir.constant(1 : i64) : i64
+  %7 = llvm.alloca %6 x i32 {bindc_name = "mapped_var"} : (i64) -> !llvm.ptr
+  %13 = llvm.alloca %6 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var0"} : (i64) -> !llvm.ptr
+  %14 = llvm.alloca %6 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var1"} : (i64) -> !llvm.ptr
+  %53 = omp.map.info var_ptr(%7 : !llvm.ptr, i32) map_clauses(implicit, exit_release_or_enter_alloc) capture(ByCopy) -> !llvm.ptr {name = "mapped_var"}
+  %54 = omp.map.info var_ptr(%13 : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(to) capture(ByRef) -> !llvm.ptr
+  %55 = omp.map.info var_ptr(%14 : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(to) capture(ByRef) -> !llvm.ptr
+  omp.target map_entries(%53 -> %arg3, %54 -> %arg4, %55 ->%arg5 : !llvm.ptr, !llvm.ptr, !llvm.ptr) private(@box.heap_privatizer0 %13 -> %arg6, @box.heap_privatizer1 %14 -> %arg7 : !llvm.ptr, !llvm.ptr) {
+    %64 = llvm.mlir.constant(1 : i32) : i32
+    %65 = llvm.alloca %64 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+    %67 = llvm.alloca %64 x i32 : (i32) -> !llvm.ptr
+    %66 = llvm.mlir.constant(19 : i32) : i32
+    %68 = llvm.mlir.constant(18 : i32) : i32
+    %69 = llvm.mlir.constant(10 : i32) : i32
+    %70 = llvm.mlir.constant(5 : i32) : i32
+    llvm.store %70, %arg3 : i32, !llvm.ptr
+    llvm.store %69, %67 : i32, !llvm.ptr
+    %75 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+    %90 = llvm.insertvalue %67, %75[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+    llvm.store %90, %65 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+    %91 = llvm.mlir.zero : !llvm.ptr
+    %92 = llvm.call @_FortranAAssign(%arg6, %65, %91, %68) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
+    %93 = llvm.call @_FortranAAssign(%arg7, %65, %91, %66) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
+    omp.terminator
+  }
+  llvm.return
+}
+
+
+llvm.func @_FortranAAssign(!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()> attributes {fir.runtime, sym_visibility = "private"}
+
+// The first set of checks ensure that we are calling the offloaded function
+// with the right arguments, especially the second argument which needs to
+// be a memory reference to the descriptor for the privatized allocatable
+// CHECK: define void @target_allocatable_
+// CHECK-NOT: define internal void
+// CHECK: %[[DESC_ALLOC0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1
+// CHECK: %[[DESC_ALLOC1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1
+// CHECK: call void @__omp_offloading_[[OFFLOADED_FUNCTION:.*]](ptr {{[^,]+}},
+// CHECK-SAME: ptr %[[DESC_ALLOC0]], ptr %[[DESC_ALLOC1]])
+
+// CHECK: define internal void @__omp_offloading_[[OFFLOADED_FUNCTION]]
+// CHECK-SAME: (ptr {{[^,]+}}, ptr %[[DESCRIPTOR_ARG0:[^,]+]],
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG1:.*]]) {
+// CHECK: %[[I0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG0]], i32 0, i32 1
+// CHECK: %[[MALLOC_ARG0:.*]] = load i64, ptr %[[I0]]
+// CHECK: %[[PRIV_DESC0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK: %[[HEAP_PTR0:.*]] = call ptr @malloc(i64 %[[MALLOC_ARG0]])
+// CHECK:  %[[TMP0:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK-SAME: undef, ptr %[[HEAP_PTR0]], 0
+// CHECK: store { ptr, i64, i32, i8, i8, i8, i8 } %[[TMP0]], ptr %[[PRIV_DESC0]]
+
+// CHECK: %[[I1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG1]], i32 0, i32 1
+// CHECK: %[[MALLOC_ARG1:.*]] = load i64, ptr %[[I1]]
+// CHECK: %[[PRIV_DESC1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK: %[[HEAP_PTR1:.*]] = call ptr @malloc(i64 %[[MALLOC_ARG1]])
+// CHECK:  %[[TMP1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK-SAME: undef, ptr %[[HEAP_PTR1]], 0
+// CHECK: store { ptr, i64, i32, i8, i8, i8, i8 } %[[TMP1]], ptr %[[PRIV_DESC1]]
+
+// CHECK: call {} @_FortranAAssign(ptr %[[PRIV_DESC0]]
+// CHECK: call {} @_FortranAAssign(ptr %[[PRIV_DESC1]]
+
+// CHECK:  %[[PTR0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[PRIV_DESC0]], i32 0, i32 0
+// CHECK: %[[HEAP_MEMREF0:.*]] = load ptr, ptr %[[PTR0]]
+// CHECK:  call void @free(ptr %[[HEAP_MEMREF0]])
+// CHECK:  %[[PTR1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[PRIV_DESC1]], i32 0, i32 0
+// CHECK: %[[HEAP_MEMREF1:.*]] = load ptr, ptr %[[PTR1]]
+// CHECK:  call void @free(ptr %[[HEAP_MEMREF1]])
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
new file mode 100644
index 00000000000000..6d65b25a7fbc01
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
@@ -0,0 +1,207 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+llvm.func @free(!llvm.ptr)
+llvm.func @malloc(i64) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var", pinned} : (i32) -> !llvm.ptr
+  %8 = llvm.mlir.constant(0 : i64) : i64
+  %10 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %11 = llvm.load %10 : !llvm.ptr -> !llvm.ptr
+  %12 = llvm.ptrtoint %11 : !llvm.ptr to i64
+  %13 = llvm.icmp "ne" %12, %8 : i64
+  llvm.cond_br %13, ^bb1, ^bb2
+^bb1:  // pred: ^bb0
+  %14 = llvm.mlir.zero : !llvm.ptr
+  %16 = llvm.ptrtoint %14 : !llvm.ptr to i64
+  %17 = llvm.call @malloc(%16) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb3
+^bb2:  // pred: ^bb0
+  %39 = llvm.mlir.zero : !llvm.ptr
+  %44 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %59 = llvm.insertvalue %39, %44[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %59, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb3
+^bb3:  // 2 preds: ^bb1, ^bb2
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  %10 = llvm.ptrtoint %9 : !llvm.ptr to i64
+  %11 = llvm.icmp "ne" %10, %6 : i64
+  llvm.cond_br %11, ^bb1, ^bb2
+^bb1:  // pred: ^bb0
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  %15 = llvm.mlir.zero : !llvm.ptr
+  %16 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %17 = llvm.insertvalue %15, %16[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %17, %arg0 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb2
+^bb2:  // 2 preds: ^bb0, ^bb1
+  omp.yield
+}
+llvm.func @target_allocatable_(%arg0: !llvm.ptr {fir.bindc_name = "lb"}, %arg1: !llvm.ptr {fir.bindc_name = "ub"}, %arg2: !llvm.ptr {fir.bindc_name = "l"}) attributes {fir.internal_name = "_QPtarget_allocatable"} {
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1 : i32) : i32
+  %3 = llvm.alloca %2 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+  %4 = llvm.mlir.constant(1 : i64) : i64
+  %5 = llvm.alloca %4 x f32 {bindc_name = "real_var"}...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2024

@llvm/pr-subscribers-mlir-llvm

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support to translate the private clause from MLIR to LLVMIR when used on allocatables in the context of an omp.target op.


Patch is 34.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113208.diff

4 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+128-8)
  • (added) mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir (+116)
  • (added) mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir (+207)
  • (modified) mlir/test/Target/LLVMIR/openmp-target-private.mlir (+90)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 7c45e89cd8ac4b..2e013053a7d1c0 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -3316,7 +3316,51 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg,
 
   return builder.saveIP();
 }
+static bool privatizerNeedsMap(omp::PrivateClauseOp &privatizer) {
+  Region &allocRegion = privatizer.getAllocRegion();
+  Value blockArg0 = allocRegion.getArgument(0);
+  return !blockArg0.use_empty();
+}
+
+// Return the llvm::Value * corresponding to the privateVar that
+// is being privatized. It isn't always as simple as looking up
+// moduleTranslation with privateVar. For instance, in case of
+// an allocatable, the descriptor for the allocatable is privatized.
+// This descriptor is mapped using an MapInfoOp. So, this function
+// will return a pointer to the llvm::Value corresponding to the
+// block argument for the mapped descriptor.
+static llvm::Value *
+findHostAssociatedValue(Value privateVar, omp::TargetOp targetOp,
+                        llvm::DenseMap<Value, int> &mappedPrivateVars,
+                        llvm::IRBuilderBase &builder,
+                        LLVM::ModuleTranslation &moduleTranslation) {
+  if (mappedPrivateVars.contains(privateVar)) {
+    int blockArgIndex = mappedPrivateVars[privateVar];
+    Value blockArg = targetOp.getRegion().getArgument(blockArgIndex);
+    mlir::Type privVarType = privateVar.getType();
+    mlir::Type blockArgType = blockArg.getType();
+    assert(isa<LLVM::LLVMPointerType>(blockArgType) &&
+           "A block argument corresponding to a mapped var should have "
+           "!llvm.ptr type");
+
+    if (privVarType == blockArg.getType()) {
+      llvm::Value *v = moduleTranslation.lookupValue(blockArg);
+      return v;
+    }
 
+    if (!isa<LLVM::LLVMPointerType>(privVarType)) {
+      // This typically happens when the privatized type is lowered from
+      // boxchar<KIND> and gets lowered to !llvm.struct<(ptr, i64)>. That is the
+      // struct/pair is passed by value. But, mapped values are passed only as
+      // pointers, so before we privatize, we must load the pointer.
+      llvm::Value *load =
+          builder.CreateLoad(moduleTranslation.convertType(privVarType),
+                             moduleTranslation.lookupValue(blockArg));
+      return load;
+    }
+  }
+  return moduleTranslation.lookupValue(privateVar);
+}
 static LogicalResult
 convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                  LLVM::ModuleTranslation &moduleTranslation) {
@@ -3329,6 +3373,19 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
   auto parentFn = opInst.getParentOfType<LLVM::LLVMFuncOp>();
   auto targetOp = cast<omp::TargetOp>(opInst);
   auto &targetRegion = targetOp.getRegion();
+  // Holds the private vars that have been mapped along with
+  // the block argument that corresponds to the MapInfoOp
+  // corresponding to the private var in question.
+  // So, for instance
+  //
+  // %10 = omp.map.info var_ptr(%6#0 : !fir.ref<!fir.box<!fir.heap<i32>>>, ..)
+  // omp.target map_entries(%10 -> %arg0) private(@box.privatizer %6#0-> %arg1)
+  //
+  // Then, %10 has been created so that the descriptor can be used by the
+  // privatizer
+  // @box.privatizer on the device side. Here we'd record {%6#0, 0} in the
+  // mappedPrivateVars map.
+  llvm::DenseMap<Value, int> mappedPrivateVars;
   DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
   SmallVector<Value> mapVars = targetOp.getMapVars();
   ArrayRef<BlockArgument> mapBlockArgs =
@@ -3340,6 +3397,48 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
   bool isOffloadEntry =
       isTargetDevice || !ompBuilder->Config.TargetTriples.empty();
 
+  // For some private variables, the MapsForPrivatizedVariablesPass
+  // creates MapInfoOp instances. Go through the private variables and
+  // the mapped variables so that during codegeneration we are able
+  // to quickly look up the corresponding map variable, if any for each
+  // private variable.
+  if (!targetOp.getPrivateVars().empty() && !targetOp.getMapVars().empty()) {
+    auto argIface = llvm::cast<omp::BlockArgOpenMPOpInterface>(*targetOp);
+    unsigned lastMapBlockArgsIdx =
+        argIface.getMapBlockArgsStart() + argIface.numMapBlockArgs() - 1;
+    OperandRange privateVars = targetOp.getPrivateVars();
+    std::optional<ArrayAttr> privateSyms = targetOp.getPrivateSyms();
+    auto reverseIt = mapVars.rbegin();
+    for (auto [privVar, privSym] :
+         llvm::reverse(llvm::zip_equal(privateVars, *privateSyms))) {
+      SymbolRefAttr privatizerName = llvm::cast<SymbolRefAttr>(privSym);
+      omp::PrivateClauseOp privatizer =
+          findPrivatizer(targetOp, privatizerName);
+      if (!privatizerNeedsMap(privatizer))
+        continue;
+
+      // The MapInfoOp defining the map var isn't really needed later.
+      // So, we don't store it in any datastructure. Instead, we just
+      // do some sanity checks on it right now.
+      omp::MapInfoOp mapInfoOp =
+          llvm::cast<omp::MapInfoOp>((*reverseIt).getDefiningOp());
+      Type varType = mapInfoOp.getVarType();
+
+      // Check #1: Check that the type of the private variable matches
+      // the type of the variable being mapped.
+      if (!isa<LLVM::LLVMPointerType>(privVar.getType()))
+        assert(
+            varType == privVar.getType() &&
+            "Type of private var doesn't match the type of the mapped value");
+
+      // Ok, only 1 sanity check for now.
+      // Record the index of the block argument corresponding to this
+      // mapvar.
+      mappedPrivateVars.insert({privVar, lastMapBlockArgsIdx});
+      lastMapBlockArgsIdx--;
+      reverseIt++;
+    }
+  }
   LogicalResult bodyGenStatus = success();
   using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
   auto bodyCB = [&](InsertPointTy allocaIP,
@@ -3367,9 +3466,10 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
           moduleTranslation.lookupValue(mapInfoOp.getVarPtr());
       moduleTranslation.mapValue(arg, mapOpValue);
     }
-
     // Do privatization after moduleTranslation has already recorded
     // mapped values.
+    SmallVector<llvm::Value *> llvmPrivateVars;
+    SmallVector<Region *> privateCleanupRegions;
     if (!targetOp.getPrivateVars().empty()) {
       builder.restoreIP(allocaIP);
 
@@ -3384,16 +3484,18 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
         SymbolRefAttr privSym = cast<SymbolRefAttr>(privatizerNameAttr);
         omp::PrivateClauseOp privatizer = findPrivatizer(&opInst, privSym);
         if (privatizer.getDataSharingType() ==
-                omp::DataSharingClauseType::FirstPrivate ||
-            !privatizer.getDeallocRegion().empty()) {
+            omp::DataSharingClauseType::FirstPrivate) {
           opInst.emitError("Translation of omp.target from MLIR to LLVMIR "
-                           "failed because translation of firstprivate and "
-                           " private allocatables is not supported yet");
+                           "failed because translation of firstprivate is not "
+                           "supported yet");
           bodyGenStatus = failure();
         } else {
-          moduleTranslation.mapValue(privatizer.getAllocMoldArg(),
-                                     moduleTranslation.lookupValue(privVar));
           Region &allocRegion = privatizer.getAllocRegion();
+          BlockArgument allocRegionArg = allocRegion.getArgument(0);
+          moduleTranslation.mapValue(
+              allocRegionArg,
+              findHostAssociatedValue(privVar, targetOp, mappedPrivateVars,
+                                      builder, moduleTranslation));
           SmallVector<llvm::Value *, 1> yieldedValues;
           if (failed(inlineConvertOmpRegions(
                   allocRegion, "omp.targetop.privatizer", builder,
@@ -3404,7 +3506,12 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
             bodyGenStatus = failure();
           } else {
             assert(yieldedValues.size() == 1);
-            moduleTranslation.mapValue(privBlockArg, yieldedValues.front());
+            llvm::Value *llvmReplacementValue = yieldedValues.front();
+            moduleTranslation.mapValue(privBlockArg, llvmReplacementValue);
+            if (!privatizer.getDeallocRegion().empty()) {
+              llvmPrivateVars.push_back(llvmReplacementValue);
+              privateCleanupRegions.push_back(&privatizer.getDeallocRegion());
+            }
           }
           moduleTranslation.forgetMapping(allocRegion);
           builder.restoreIP(builder.saveIP());
@@ -3414,6 +3521,19 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
     llvm::BasicBlock *exitBlock = convertOmpOpRegions(
         targetRegion, "omp.target", builder, moduleTranslation, bodyGenStatus);
     builder.SetInsertPoint(exitBlock);
+    if (!llvmPrivateVars.empty()) {
+      assert(llvmPrivateVars.size() == privateCleanupRegions.size() &&
+             "Number of private variables needing cleanup not equal to number"
+             "of privatizers with dealloc regions");
+      if (failed(inlineOmpRegionCleanup(
+              privateCleanupRegions, llvmPrivateVars, moduleTranslation,
+              builder, "omp.targetop.private.cleanup",
+              /*shouldLoadCleanupRegionArg=*/false))) {
+        opInst.emitError("failed to inline `dealloc` region of `omp.private` "
+                         "op in the target region");
+        bodyGenStatus = failure();
+      }
+    }
     return builder.saveIP();
   };
 
diff --git a/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
new file mode 100644
index 00000000000000..2491747152895c
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
@@ -0,0 +1,116 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+llvm.func @free(!llvm.ptr)
+llvm.func @malloc(i64) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer0 : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %10 = llvm.getelementptr %arg0[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %1 = llvm.load %10 : !llvm.ptr -> i64
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+  %17 = llvm.call @malloc(%1) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  omp.yield
+}
+omp.private {type = private} @box.heap_privatizer1 : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %10 = llvm.getelementptr %arg0[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %1 = llvm.load %10 : !llvm.ptr -> i64
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+  %17 = llvm.call @malloc(%1) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  omp.yield
+}
+llvm.func @target_allocatable_(%arg0: !llvm.ptr {fir.bindc_name = "lb"}, %arg1: !llvm.ptr {fir.bindc_name = "ub"}, %arg2: !llvm.ptr {fir.bindc_name = "l"}) attributes {fir.internal_name = "_QPtarget_allocatable"} {
+  %6 = llvm.mlir.constant(1 : i64) : i64
+  %7 = llvm.alloca %6 x i32 {bindc_name = "mapped_var"} : (i64) -> !llvm.ptr
+  %13 = llvm.alloca %6 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var0"} : (i64) -> !llvm.ptr
+  %14 = llvm.alloca %6 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var1"} : (i64) -> !llvm.ptr
+  %53 = omp.map.info var_ptr(%7 : !llvm.ptr, i32) map_clauses(implicit, exit_release_or_enter_alloc) capture(ByCopy) -> !llvm.ptr {name = "mapped_var"}
+  %54 = omp.map.info var_ptr(%13 : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(to) capture(ByRef) -> !llvm.ptr
+  %55 = omp.map.info var_ptr(%14 : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(to) capture(ByRef) -> !llvm.ptr
+  omp.target map_entries(%53 -> %arg3, %54 -> %arg4, %55 ->%arg5 : !llvm.ptr, !llvm.ptr, !llvm.ptr) private(@box.heap_privatizer0 %13 -> %arg6, @box.heap_privatizer1 %14 -> %arg7 : !llvm.ptr, !llvm.ptr) {
+    %64 = llvm.mlir.constant(1 : i32) : i32
+    %65 = llvm.alloca %64 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+    %67 = llvm.alloca %64 x i32 : (i32) -> !llvm.ptr
+    %66 = llvm.mlir.constant(19 : i32) : i32
+    %68 = llvm.mlir.constant(18 : i32) : i32
+    %69 = llvm.mlir.constant(10 : i32) : i32
+    %70 = llvm.mlir.constant(5 : i32) : i32
+    llvm.store %70, %arg3 : i32, !llvm.ptr
+    llvm.store %69, %67 : i32, !llvm.ptr
+    %75 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+    %90 = llvm.insertvalue %67, %75[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+    llvm.store %90, %65 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+    %91 = llvm.mlir.zero : !llvm.ptr
+    %92 = llvm.call @_FortranAAssign(%arg6, %65, %91, %68) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
+    %93 = llvm.call @_FortranAAssign(%arg7, %65, %91, %66) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
+    omp.terminator
+  }
+  llvm.return
+}
+
+
+llvm.func @_FortranAAssign(!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()> attributes {fir.runtime, sym_visibility = "private"}
+
+// The first set of checks ensure that we are calling the offloaded function
+// with the right arguments, especially the second argument which needs to
+// be a memory reference to the descriptor for the privatized allocatable
+// CHECK: define void @target_allocatable_
+// CHECK-NOT: define internal void
+// CHECK: %[[DESC_ALLOC0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1
+// CHECK: %[[DESC_ALLOC1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1
+// CHECK: call void @__omp_offloading_[[OFFLOADED_FUNCTION:.*]](ptr {{[^,]+}},
+// CHECK-SAME: ptr %[[DESC_ALLOC0]], ptr %[[DESC_ALLOC1]])
+
+// CHECK: define internal void @__omp_offloading_[[OFFLOADED_FUNCTION]]
+// CHECK-SAME: (ptr {{[^,]+}}, ptr %[[DESCRIPTOR_ARG0:[^,]+]],
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG1:.*]]) {
+// CHECK: %[[I0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG0]], i32 0, i32 1
+// CHECK: %[[MALLOC_ARG0:.*]] = load i64, ptr %[[I0]]
+// CHECK: %[[PRIV_DESC0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK: %[[HEAP_PTR0:.*]] = call ptr @malloc(i64 %[[MALLOC_ARG0]])
+// CHECK:  %[[TMP0:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK-SAME: undef, ptr %[[HEAP_PTR0]], 0
+// CHECK: store { ptr, i64, i32, i8, i8, i8, i8 } %[[TMP0]], ptr %[[PRIV_DESC0]]
+
+// CHECK: %[[I1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG1]], i32 0, i32 1
+// CHECK: %[[MALLOC_ARG1:.*]] = load i64, ptr %[[I1]]
+// CHECK: %[[PRIV_DESC1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK: %[[HEAP_PTR1:.*]] = call ptr @malloc(i64 %[[MALLOC_ARG1]])
+// CHECK:  %[[TMP1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK-SAME: undef, ptr %[[HEAP_PTR1]], 0
+// CHECK: store { ptr, i64, i32, i8, i8, i8, i8 } %[[TMP1]], ptr %[[PRIV_DESC1]]
+
+// CHECK: call {} @_FortranAAssign(ptr %[[PRIV_DESC0]]
+// CHECK: call {} @_FortranAAssign(ptr %[[PRIV_DESC1]]
+
+// CHECK:  %[[PTR0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[PRIV_DESC0]], i32 0, i32 0
+// CHECK: %[[HEAP_MEMREF0:.*]] = load ptr, ptr %[[PTR0]]
+// CHECK:  call void @free(ptr %[[HEAP_MEMREF0]])
+// CHECK:  %[[PTR1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[PRIV_DESC1]], i32 0, i32 0
+// CHECK: %[[HEAP_MEMREF1:.*]] = load ptr, ptr %[[PTR1]]
+// CHECK:  call void @free(ptr %[[HEAP_MEMREF1]])
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
new file mode 100644
index 00000000000000..6d65b25a7fbc01
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
@@ -0,0 +1,207 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+llvm.func @free(!llvm.ptr)
+llvm.func @malloc(i64) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var", pinned} : (i32) -> !llvm.ptr
+  %8 = llvm.mlir.constant(0 : i64) : i64
+  %10 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %11 = llvm.load %10 : !llvm.ptr -> !llvm.ptr
+  %12 = llvm.ptrtoint %11 : !llvm.ptr to i64
+  %13 = llvm.icmp "ne" %12, %8 : i64
+  llvm.cond_br %13, ^bb1, ^bb2
+^bb1:  // pred: ^bb0
+  %14 = llvm.mlir.zero : !llvm.ptr
+  %16 = llvm.ptrtoint %14 : !llvm.ptr to i64
+  %17 = llvm.call @malloc(%16) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb3
+^bb2:  // pred: ^bb0
+  %39 = llvm.mlir.zero : !llvm.ptr
+  %44 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %59 = llvm.insertvalue %39, %44[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %59, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb3
+^bb3:  // 2 preds: ^bb1, ^bb2
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  %10 = llvm.ptrtoint %9 : !llvm.ptr to i64
+  %11 = llvm.icmp "ne" %10, %6 : i64
+  llvm.cond_br %11, ^bb1, ^bb2
+^bb1:  // pred: ^bb0
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  %15 = llvm.mlir.zero : !llvm.ptr
+  %16 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %17 = llvm.insertvalue %15, %16[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %17, %arg0 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb2
+^bb2:  // 2 preds: ^bb0, ^bb1
+  omp.yield
+}
+llvm.func @target_allocatable_(%arg0: !llvm.ptr {fir.bindc_name = "lb"}, %arg1: !llvm.ptr {fir.bindc_name = "ub"}, %arg2: !llvm.ptr {fir.bindc_name = "l"}) attributes {fir.internal_name = "_QPtarget_allocatable"} {
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1 : i32) : i32
+  %3 = llvm.alloca %2 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+  %4 = llvm.mlir.constant(1 : i64) : i64
+  %5 = llvm.alloca %4 x f32 {bindc_name = "real_var"}...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2024

@llvm/pr-subscribers-mlir-openmp

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support to translate the private clause from MLIR to LLVMIR when used on allocatables in the context of an omp.target op.


Patch is 34.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113208.diff

4 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+128-8)
  • (added) mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir (+116)
  • (added) mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir (+207)
  • (modified) mlir/test/Target/LLVMIR/openmp-target-private.mlir (+90)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 7c45e89cd8ac4b..2e013053a7d1c0 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -3316,7 +3316,51 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg,
 
   return builder.saveIP();
 }
+static bool privatizerNeedsMap(omp::PrivateClauseOp &privatizer) {
+  Region &allocRegion = privatizer.getAllocRegion();
+  Value blockArg0 = allocRegion.getArgument(0);
+  return !blockArg0.use_empty();
+}
+
+// Return the llvm::Value * corresponding to the privateVar that
+// is being privatized. It isn't always as simple as looking up
+// moduleTranslation with privateVar. For instance, in case of
+// an allocatable, the descriptor for the allocatable is privatized.
+// This descriptor is mapped using an MapInfoOp. So, this function
+// will return a pointer to the llvm::Value corresponding to the
+// block argument for the mapped descriptor.
+static llvm::Value *
+findHostAssociatedValue(Value privateVar, omp::TargetOp targetOp,
+                        llvm::DenseMap<Value, int> &mappedPrivateVars,
+                        llvm::IRBuilderBase &builder,
+                        LLVM::ModuleTranslation &moduleTranslation) {
+  if (mappedPrivateVars.contains(privateVar)) {
+    int blockArgIndex = mappedPrivateVars[privateVar];
+    Value blockArg = targetOp.getRegion().getArgument(blockArgIndex);
+    mlir::Type privVarType = privateVar.getType();
+    mlir::Type blockArgType = blockArg.getType();
+    assert(isa<LLVM::LLVMPointerType>(blockArgType) &&
+           "A block argument corresponding to a mapped var should have "
+           "!llvm.ptr type");
+
+    if (privVarType == blockArg.getType()) {
+      llvm::Value *v = moduleTranslation.lookupValue(blockArg);
+      return v;
+    }
 
+    if (!isa<LLVM::LLVMPointerType>(privVarType)) {
+      // This typically happens when the privatized type is lowered from
+      // boxchar<KIND> and gets lowered to !llvm.struct<(ptr, i64)>. That is the
+      // struct/pair is passed by value. But, mapped values are passed only as
+      // pointers, so before we privatize, we must load the pointer.
+      llvm::Value *load =
+          builder.CreateLoad(moduleTranslation.convertType(privVarType),
+                             moduleTranslation.lookupValue(blockArg));
+      return load;
+    }
+  }
+  return moduleTranslation.lookupValue(privateVar);
+}
 static LogicalResult
 convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
                  LLVM::ModuleTranslation &moduleTranslation) {
@@ -3329,6 +3373,19 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
   auto parentFn = opInst.getParentOfType<LLVM::LLVMFuncOp>();
   auto targetOp = cast<omp::TargetOp>(opInst);
   auto &targetRegion = targetOp.getRegion();
+  // Holds the private vars that have been mapped along with
+  // the block argument that corresponds to the MapInfoOp
+  // corresponding to the private var in question.
+  // So, for instance
+  //
+  // %10 = omp.map.info var_ptr(%6#0 : !fir.ref<!fir.box<!fir.heap<i32>>>, ..)
+  // omp.target map_entries(%10 -> %arg0) private(@box.privatizer %6#0-> %arg1)
+  //
+  // Then, %10 has been created so that the descriptor can be used by the
+  // privatizer
+  // @box.privatizer on the device side. Here we'd record {%6#0, 0} in the
+  // mappedPrivateVars map.
+  llvm::DenseMap<Value, int> mappedPrivateVars;
   DataLayout dl = DataLayout(opInst.getParentOfType<ModuleOp>());
   SmallVector<Value> mapVars = targetOp.getMapVars();
   ArrayRef<BlockArgument> mapBlockArgs =
@@ -3340,6 +3397,48 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
   bool isOffloadEntry =
       isTargetDevice || !ompBuilder->Config.TargetTriples.empty();
 
+  // For some private variables, the MapsForPrivatizedVariablesPass
+  // creates MapInfoOp instances. Go through the private variables and
+  // the mapped variables so that during codegeneration we are able
+  // to quickly look up the corresponding map variable, if any for each
+  // private variable.
+  if (!targetOp.getPrivateVars().empty() && !targetOp.getMapVars().empty()) {
+    auto argIface = llvm::cast<omp::BlockArgOpenMPOpInterface>(*targetOp);
+    unsigned lastMapBlockArgsIdx =
+        argIface.getMapBlockArgsStart() + argIface.numMapBlockArgs() - 1;
+    OperandRange privateVars = targetOp.getPrivateVars();
+    std::optional<ArrayAttr> privateSyms = targetOp.getPrivateSyms();
+    auto reverseIt = mapVars.rbegin();
+    for (auto [privVar, privSym] :
+         llvm::reverse(llvm::zip_equal(privateVars, *privateSyms))) {
+      SymbolRefAttr privatizerName = llvm::cast<SymbolRefAttr>(privSym);
+      omp::PrivateClauseOp privatizer =
+          findPrivatizer(targetOp, privatizerName);
+      if (!privatizerNeedsMap(privatizer))
+        continue;
+
+      // The MapInfoOp defining the map var isn't really needed later.
+      // So, we don't store it in any datastructure. Instead, we just
+      // do some sanity checks on it right now.
+      omp::MapInfoOp mapInfoOp =
+          llvm::cast<omp::MapInfoOp>((*reverseIt).getDefiningOp());
+      Type varType = mapInfoOp.getVarType();
+
+      // Check #1: Check that the type of the private variable matches
+      // the type of the variable being mapped.
+      if (!isa<LLVM::LLVMPointerType>(privVar.getType()))
+        assert(
+            varType == privVar.getType() &&
+            "Type of private var doesn't match the type of the mapped value");
+
+      // Ok, only 1 sanity check for now.
+      // Record the index of the block argument corresponding to this
+      // mapvar.
+      mappedPrivateVars.insert({privVar, lastMapBlockArgsIdx});
+      lastMapBlockArgsIdx--;
+      reverseIt++;
+    }
+  }
   LogicalResult bodyGenStatus = success();
   using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
   auto bodyCB = [&](InsertPointTy allocaIP,
@@ -3367,9 +3466,10 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
           moduleTranslation.lookupValue(mapInfoOp.getVarPtr());
       moduleTranslation.mapValue(arg, mapOpValue);
     }
-
     // Do privatization after moduleTranslation has already recorded
     // mapped values.
+    SmallVector<llvm::Value *> llvmPrivateVars;
+    SmallVector<Region *> privateCleanupRegions;
     if (!targetOp.getPrivateVars().empty()) {
       builder.restoreIP(allocaIP);
 
@@ -3384,16 +3484,18 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
         SymbolRefAttr privSym = cast<SymbolRefAttr>(privatizerNameAttr);
         omp::PrivateClauseOp privatizer = findPrivatizer(&opInst, privSym);
         if (privatizer.getDataSharingType() ==
-                omp::DataSharingClauseType::FirstPrivate ||
-            !privatizer.getDeallocRegion().empty()) {
+            omp::DataSharingClauseType::FirstPrivate) {
           opInst.emitError("Translation of omp.target from MLIR to LLVMIR "
-                           "failed because translation of firstprivate and "
-                           " private allocatables is not supported yet");
+                           "failed because translation of firstprivate is not "
+                           "supported yet");
           bodyGenStatus = failure();
         } else {
-          moduleTranslation.mapValue(privatizer.getAllocMoldArg(),
-                                     moduleTranslation.lookupValue(privVar));
           Region &allocRegion = privatizer.getAllocRegion();
+          BlockArgument allocRegionArg = allocRegion.getArgument(0);
+          moduleTranslation.mapValue(
+              allocRegionArg,
+              findHostAssociatedValue(privVar, targetOp, mappedPrivateVars,
+                                      builder, moduleTranslation));
           SmallVector<llvm::Value *, 1> yieldedValues;
           if (failed(inlineConvertOmpRegions(
                   allocRegion, "omp.targetop.privatizer", builder,
@@ -3404,7 +3506,12 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
             bodyGenStatus = failure();
           } else {
             assert(yieldedValues.size() == 1);
-            moduleTranslation.mapValue(privBlockArg, yieldedValues.front());
+            llvm::Value *llvmReplacementValue = yieldedValues.front();
+            moduleTranslation.mapValue(privBlockArg, llvmReplacementValue);
+            if (!privatizer.getDeallocRegion().empty()) {
+              llvmPrivateVars.push_back(llvmReplacementValue);
+              privateCleanupRegions.push_back(&privatizer.getDeallocRegion());
+            }
           }
           moduleTranslation.forgetMapping(allocRegion);
           builder.restoreIP(builder.saveIP());
@@ -3414,6 +3521,19 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
     llvm::BasicBlock *exitBlock = convertOmpOpRegions(
         targetRegion, "omp.target", builder, moduleTranslation, bodyGenStatus);
     builder.SetInsertPoint(exitBlock);
+    if (!llvmPrivateVars.empty()) {
+      assert(llvmPrivateVars.size() == privateCleanupRegions.size() &&
+             "Number of private variables needing cleanup not equal to number"
+             "of privatizers with dealloc regions");
+      if (failed(inlineOmpRegionCleanup(
+              privateCleanupRegions, llvmPrivateVars, moduleTranslation,
+              builder, "omp.targetop.private.cleanup",
+              /*shouldLoadCleanupRegionArg=*/false))) {
+        opInst.emitError("failed to inline `dealloc` region of `omp.private` "
+                         "op in the target region");
+        bodyGenStatus = failure();
+      }
+    }
     return builder.saveIP();
   };
 
diff --git a/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
new file mode 100644
index 00000000000000..2491747152895c
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-multiple-private.mlir
@@ -0,0 +1,116 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+llvm.func @free(!llvm.ptr)
+llvm.func @malloc(i64) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer0 : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %10 = llvm.getelementptr %arg0[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %1 = llvm.load %10 : !llvm.ptr -> i64
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+  %17 = llvm.call @malloc(%1) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  omp.yield
+}
+omp.private {type = private} @box.heap_privatizer1 : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %10 = llvm.getelementptr %arg0[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %1 = llvm.load %10 : !llvm.ptr -> i64
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>  : (i32) -> !llvm.ptr
+  %17 = llvm.call @malloc(%1) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  omp.yield
+}
+llvm.func @target_allocatable_(%arg0: !llvm.ptr {fir.bindc_name = "lb"}, %arg1: !llvm.ptr {fir.bindc_name = "ub"}, %arg2: !llvm.ptr {fir.bindc_name = "l"}) attributes {fir.internal_name = "_QPtarget_allocatable"} {
+  %6 = llvm.mlir.constant(1 : i64) : i64
+  %7 = llvm.alloca %6 x i32 {bindc_name = "mapped_var"} : (i64) -> !llvm.ptr
+  %13 = llvm.alloca %6 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var0"} : (i64) -> !llvm.ptr
+  %14 = llvm.alloca %6 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var1"} : (i64) -> !llvm.ptr
+  %53 = omp.map.info var_ptr(%7 : !llvm.ptr, i32) map_clauses(implicit, exit_release_or_enter_alloc) capture(ByCopy) -> !llvm.ptr {name = "mapped_var"}
+  %54 = omp.map.info var_ptr(%13 : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(to) capture(ByRef) -> !llvm.ptr
+  %55 = omp.map.info var_ptr(%14 : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(to) capture(ByRef) -> !llvm.ptr
+  omp.target map_entries(%53 -> %arg3, %54 -> %arg4, %55 ->%arg5 : !llvm.ptr, !llvm.ptr, !llvm.ptr) private(@box.heap_privatizer0 %13 -> %arg6, @box.heap_privatizer1 %14 -> %arg7 : !llvm.ptr, !llvm.ptr) {
+    %64 = llvm.mlir.constant(1 : i32) : i32
+    %65 = llvm.alloca %64 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+    %67 = llvm.alloca %64 x i32 : (i32) -> !llvm.ptr
+    %66 = llvm.mlir.constant(19 : i32) : i32
+    %68 = llvm.mlir.constant(18 : i32) : i32
+    %69 = llvm.mlir.constant(10 : i32) : i32
+    %70 = llvm.mlir.constant(5 : i32) : i32
+    llvm.store %70, %arg3 : i32, !llvm.ptr
+    llvm.store %69, %67 : i32, !llvm.ptr
+    %75 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+    %90 = llvm.insertvalue %67, %75[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+    llvm.store %90, %65 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+    %91 = llvm.mlir.zero : !llvm.ptr
+    %92 = llvm.call @_FortranAAssign(%arg6, %65, %91, %68) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
+    %93 = llvm.call @_FortranAAssign(%arg7, %65, %91, %66) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
+    omp.terminator
+  }
+  llvm.return
+}
+
+
+llvm.func @_FortranAAssign(!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()> attributes {fir.runtime, sym_visibility = "private"}
+
+// The first set of checks ensure that we are calling the offloaded function
+// with the right arguments, especially the second argument which needs to
+// be a memory reference to the descriptor for the privatized allocatable
+// CHECK: define void @target_allocatable_
+// CHECK-NOT: define internal void
+// CHECK: %[[DESC_ALLOC0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1
+// CHECK: %[[DESC_ALLOC1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1
+// CHECK: call void @__omp_offloading_[[OFFLOADED_FUNCTION:.*]](ptr {{[^,]+}},
+// CHECK-SAME: ptr %[[DESC_ALLOC0]], ptr %[[DESC_ALLOC1]])
+
+// CHECK: define internal void @__omp_offloading_[[OFFLOADED_FUNCTION]]
+// CHECK-SAME: (ptr {{[^,]+}}, ptr %[[DESCRIPTOR_ARG0:[^,]+]],
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG1:.*]]) {
+// CHECK: %[[I0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG0]], i32 0, i32 1
+// CHECK: %[[MALLOC_ARG0:.*]] = load i64, ptr %[[I0]]
+// CHECK: %[[PRIV_DESC0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK: %[[HEAP_PTR0:.*]] = call ptr @malloc(i64 %[[MALLOC_ARG0]])
+// CHECK:  %[[TMP0:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK-SAME: undef, ptr %[[HEAP_PTR0]], 0
+// CHECK: store { ptr, i64, i32, i8, i8, i8, i8 } %[[TMP0]], ptr %[[PRIV_DESC0]]
+
+// CHECK: %[[I1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[DESCRIPTOR_ARG1]], i32 0, i32 1
+// CHECK: %[[MALLOC_ARG1:.*]] = load i64, ptr %[[I1]]
+// CHECK: %[[PRIV_DESC1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK: %[[HEAP_PTR1:.*]] = call ptr @malloc(i64 %[[MALLOC_ARG1]])
+// CHECK:  %[[TMP1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8 }
+// CHECK-SAME: undef, ptr %[[HEAP_PTR1]], 0
+// CHECK: store { ptr, i64, i32, i8, i8, i8, i8 } %[[TMP1]], ptr %[[PRIV_DESC1]]
+
+// CHECK: call {} @_FortranAAssign(ptr %[[PRIV_DESC0]]
+// CHECK: call {} @_FortranAAssign(ptr %[[PRIV_DESC1]]
+
+// CHECK:  %[[PTR0:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[PRIV_DESC0]], i32 0, i32 0
+// CHECK: %[[HEAP_MEMREF0:.*]] = load ptr, ptr %[[PTR0]]
+// CHECK:  call void @free(ptr %[[HEAP_MEMREF0]])
+// CHECK:  %[[PTR1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 },
+// CHECK-SAME: ptr %[[PRIV_DESC1]], i32 0, i32 0
+// CHECK: %[[HEAP_MEMREF1:.*]] = load ptr, ptr %[[PTR1]]
+// CHECK:  call void @free(ptr %[[HEAP_MEMREF1]])
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
new file mode 100644
index 00000000000000..6d65b25a7fbc01
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir
@@ -0,0 +1,207 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+llvm.func @free(!llvm.ptr)
+llvm.func @malloc(i64) -> !llvm.ptr
+omp.private {type = private} @box.heap_privatizer : !llvm.ptr alloc {
+^bb0(%arg0: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %7 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "alloc_var", pinned} : (i32) -> !llvm.ptr
+  %8 = llvm.mlir.constant(0 : i64) : i64
+  %10 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %11 = llvm.load %10 : !llvm.ptr -> !llvm.ptr
+  %12 = llvm.ptrtoint %11 : !llvm.ptr to i64
+  %13 = llvm.icmp "ne" %12, %8 : i64
+  llvm.cond_br %13, ^bb1, ^bb2
+^bb1:  // pred: ^bb0
+  %14 = llvm.mlir.zero : !llvm.ptr
+  %16 = llvm.ptrtoint %14 : !llvm.ptr to i64
+  %17 = llvm.call @malloc(%16) {fir.must_be_heap = true, in_type = i32} : (i64) -> !llvm.ptr
+  %22 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %37 = llvm.insertvalue %17, %22[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %37, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb3
+^bb2:  // pred: ^bb0
+  %39 = llvm.mlir.zero : !llvm.ptr
+  %44 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %59 = llvm.insertvalue %39, %44[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %59, %7 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb3
+^bb3:  // 2 preds: ^bb1, ^bb2
+  omp.yield(%7 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  %6 = llvm.mlir.constant(0 : i64) : i64
+  %8 = llvm.getelementptr %arg0[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %9 = llvm.load %8 : !llvm.ptr -> !llvm.ptr
+  %10 = llvm.ptrtoint %9 : !llvm.ptr to i64
+  %11 = llvm.icmp "ne" %10, %6 : i64
+  llvm.cond_br %11, ^bb1, ^bb2
+^bb1:  // pred: ^bb0
+  llvm.call @free(%9) : (!llvm.ptr) -> ()
+  %15 = llvm.mlir.zero : !llvm.ptr
+  %16 = llvm.mlir.undef : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>
+  %17 = llvm.insertvalue %15, %16[0] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> 
+  llvm.store %17, %arg0 : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>, !llvm.ptr
+  llvm.br ^bb2
+^bb2:  // 2 preds: ^bb0, ^bb1
+  omp.yield
+}
+llvm.func @target_allocatable_(%arg0: !llvm.ptr {fir.bindc_name = "lb"}, %arg1: !llvm.ptr {fir.bindc_name = "ub"}, %arg2: !llvm.ptr {fir.bindc_name = "l"}) attributes {fir.internal_name = "_QPtarget_allocatable"} {
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1 : i32) : i32
+  %3 = llvm.alloca %2 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+  %4 = llvm.mlir.constant(1 : i64) : i64
+  %5 = llvm.alloca %4 x f32 {bindc_name = "real_var"}...
[truncated]

Copy link
Contributor

@Dinistro Dinistro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped a few drive by comments.

Regarding the testing: Generating inputs with flang or clang is undesired, as this results in an unfocused test that is hard to maintain. These tests should contain as little unrelated stuff as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not generate test inputs from actual input programs. These tests are bloated with way too much unrelated stuff that make them a maintenance burden when changing anything that is not related to this.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some drive-by comments. It would be good if somebody more familiar with target stuff and mapping could take a look


// Check #1: Check that the type of the private variable matches
// the type of the variable being mapped.
if (!isa<LLVM::LLVMPointerType>(privVar.getType()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we assert the types match when the private variable is a pointer?

llvm::Value *llvmReplacementValue = yieldedValues.front();
moduleTranslation.mapValue(privBlockArg, llvmReplacementValue);
if (!privatizer.getDeallocRegion().empty()) {
llvmPrivateVars.push_back(llvmReplacementValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe this should be called something like llvmPrivateVarsNeedingCleanup so it isn't confused with similarly named vectors in other directives that do contain all of the private variables

Copy link
Member

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Pranav, one question about the approach and a few small nits.

Comment on lines +3337 to +3362
if (mappedPrivateVars.contains(privateVar)) {
int blockArgIndex = mappedPrivateVars[privateVar];
Value blockArg = targetOp.getRegion().getArgument(blockArgIndex);
mlir::Type privVarType = privateVar.getType();
mlir::Type blockArgType = blockArg.getType();
assert(isa<LLVM::LLVMPointerType>(blockArgType) &&
"A block argument corresponding to a mapped var should have "
"!llvm.ptr type");

if (privVarType == blockArg.getType()) {
llvm::Value *v = moduleTranslation.lookupValue(blockArg);
return v;
}

if (!isa<LLVM::LLVMPointerType>(privVarType)) {
// This typically happens when the privatized type is lowered from
// boxchar<KIND> and gets lowered to !llvm.struct<(ptr, i64)>. That is the
// struct/pair is passed by value. But, mapped values are passed only as
// pointers, so before we privatize, we must load the pointer.
llvm::Value *load =
builder.CreateLoad(moduleTranslation.convertType(privVarType),
moduleTranslation.lookupValue(blockArg));
return load;
}
}
return moduleTranslation.lookupValue(privateVar);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Early return, just to follow the code style.

Suggested change
if (mappedPrivateVars.contains(privateVar)) {
int blockArgIndex = mappedPrivateVars[privateVar];
Value blockArg = targetOp.getRegion().getArgument(blockArgIndex);
mlir::Type privVarType = privateVar.getType();
mlir::Type blockArgType = blockArg.getType();
assert(isa<LLVM::LLVMPointerType>(blockArgType) &&
"A block argument corresponding to a mapped var should have "
"!llvm.ptr type");
if (privVarType == blockArg.getType()) {
llvm::Value *v = moduleTranslation.lookupValue(blockArg);
return v;
}
if (!isa<LLVM::LLVMPointerType>(privVarType)) {
// This typically happens when the privatized type is lowered from
// boxchar<KIND> and gets lowered to !llvm.struct<(ptr, i64)>. That is the
// struct/pair is passed by value. But, mapped values are passed only as
// pointers, so before we privatize, we must load the pointer.
llvm::Value *load =
builder.CreateLoad(moduleTranslation.convertType(privVarType),
moduleTranslation.lookupValue(blockArg));
return load;
}
}
return moduleTranslation.lookupValue(privateVar);
if (!mappedPrivateVars.contains(privateVar))
return moduleTranslation.lookupValue(privateVar);
int blockArgIndex = mappedPrivateVars[privateVar];
Value blockArg = targetOp.getRegion().getArgument(blockArgIndex);
mlir::Type privVarType = privateVar.getType();
mlir::Type blockArgType = blockArg.getType();
assert(isa<LLVM::LLVMPointerType>(blockArgType) &&
"A block argument corresponding to a mapped var should have "
"!llvm.ptr type");
if (privVarType == blockArg.getType()) {
llvm::Value *v = moduleTranslation.lookupValue(blockArg);
return v;
}
if (!isa<LLVM::LLVMPointerType>(privVarType)) {
// This typically happens when the privatized type is lowered from
// boxchar<KIND> and gets lowered to !llvm.struct<(ptr, i64)>. That is the
// struct/pair is passed by value. But, mapped values are passed only as
// pointers, so before we privatize, we must load the pointer.
llvm::Value *load =
builder.CreateLoad(moduleTranslation.convertType(privVarType),
moduleTranslation.lookupValue(blockArg));
return load;
}

OperandRange privateVars = targetOp.getPrivateVars();
std::optional<ArrayAttr> privateSyms = targetOp.getPrivateSyms();
auto reverseIt = mapVars.rbegin();
for (auto [privVar, privSym] :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the following point should have been raised in an earlier PR probably, but I think this algorithms is a bit brittle since it always assumes the last mappings are the ones corresponding the privates vars. Can we instead record that info on the private op level? So each private op holds an optional MLIR value that corresponds to its map if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I agonized over this scheme as I wasnt too happy about this. I like your suggestion and I'll work on it.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Oct 25, 2024
Comment on lines +3319 to +3329
/// privatizer is a PrivateClauseOp that privatizes an MLIR value.
/// privatizerNeedsMap returns true if the value being privatized in an
/// omp.target p should additionally be mapped to the target region
/// using a MapInfoOp. This is most common when an allocatable is privatized.
/// In such cases, the descriptor is use in privatization and needs to be
/// mapped on to the device.
// static bool privatizerNeedsMap(omp::PrivateClauseOp &privatizer) {
// Region &allocRegion = privatizer.getAllocRegion();
// Value blockArg0 = allocRegion.getArgument(0);
// return !blockArg0.use_empty();
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, I'll take care of this as part of my next update.

@bhandarkar-pranav
Copy link
Contributor Author

Hey all, I was out for a few days but back in the office now until the 11th of Nov. I hope to address all comments on this PR before I leave for an extended break.

@ergawy
Copy link
Member

ergawy commented Nov 18, 2024

This PR will be replaced by #116576 since Pranav is out of office for some time. So I am taking this over. I will handle the current open comments on the new PR.

ergawy added a commit that referenced this pull request Dec 12, 2024
…tization of allocatables in `omp.target` ops (#116576)

This PR adds support to translate the `private` clause from MLIR to
LLVMIR when used on allocatables in the context of an `omp.target` op.

This replaces #113208.

Parent PR: #116770. Only the
latest commit is relevant to the PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants