-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[NFC][flang][OpenMP] Extract target region utils to map or clone outside values #155754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesFollowing up on #154483, this PR introduces further refactoring to extract some shared utils between OpenMP lowering and Later Full diff: https://github.com/llvm/llvm-project/pull/155754.diff 3 Files Affected:
diff --git a/flang/include/flang/Utils/OpenMP.h b/flang/include/flang/Utils/OpenMP.h
index 28189ee6f4493..58e2f22fe9b61 100644
--- a/flang/include/flang/Utils/OpenMP.h
+++ b/flang/include/flang/Utils/OpenMP.h
@@ -11,6 +11,10 @@
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+namespace fir {
+class FirOpBuilder;
+} // namespace fir
+
namespace Fortran::utils::openmp {
// TODO We can probably move the stuff inside `Support/OpenMP-utils.h/.cpp` here
// as well.
@@ -28,6 +32,31 @@ mlir::omp::MapInfoOp createMapInfoOp(mlir::OpBuilder &builder,
uint64_t mapType, mlir::omp::VariableCaptureKind mapCaptureType,
mlir::Type retTy, bool partialMap = false,
mlir::FlatSymbolRefAttr mapperId = mlir::FlatSymbolRefAttr());
+
+/// For an mlir value that does not have storage, allocate temporary storage
+/// (outside the target region), store the value in that storage, and map the
+/// storage to the target region.
+///
+/// \param firOpBuilder - Operation builder.
+/// \param targetOp - Target op to which the temporary value is mapped.
+/// \param val - Temp value that should be mapped to the target region.
+/// \param name - A string used to identify the created `omp.map.info`
+/// op.
+///
+/// \returns The loaded mapped value inside the target region.
+mlir::Value mapTemporaryValue(fir::FirOpBuilder &firOpBuilder,
+ mlir::omp::TargetOp targetOp, mlir::Value val, llvm::StringRef name);
+
+/// For values used inside a target region but defined outside, either clone
+/// these value inside the target region or map them to the region. This
+/// function tries firts to clone values (if they are defined by
+/// memory-effect-free ops, otherwise, the values are mapped.
+///
+/// \param firOpBuilder - Operation builder.
+/// \param targetOp - The target that needs to be extended by clones and/or
+/// maps.
+void cloneOrMapRegionOutsiders(
+ fir::FirOpBuilder &firOpBuilder, mlir::omp::TargetOp targetOp);
} // namespace Fortran::utils::openmp
#endif // FORTRAN_UTILS_OPENMP_H_
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 9df2f1d79aa67..e43f31e5af1f2 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1431,104 +1431,7 @@ static void genBodyOfTargetOp(
// If so, then either clone them as well if they are MemoryEffectFree, or else
// copy them to a new temporary and add them to the map and block_argument
// lists and replace their uses with the new temporary.
- llvm::SetVector<mlir::Value> valuesDefinedAbove;
- mlir::getUsedValuesDefinedAbove(region, valuesDefinedAbove);
- while (!valuesDefinedAbove.empty()) {
- for (mlir::Value val : valuesDefinedAbove) {
- mlir::Operation *valOp = val.getDefiningOp();
-
- // NOTE: We skip BoxDimsOp's as the lesser of two evils is to map the
- // indices separately, as the alternative is to eventually map the Box,
- // which comes with a fairly large overhead comparatively. We could be
- // more robust about this and check using a BackwardsSlice to see if we
- // run the risk of mapping a box.
- if (valOp && mlir::isMemoryEffectFree(valOp) &&
- !mlir::isa<fir::BoxDimsOp>(valOp)) {
- mlir::Operation *clonedOp = valOp->clone();
- entryBlock->push_front(clonedOp);
-
- auto replace = [entryBlock](mlir::OpOperand &use) {
- return use.getOwner()->getBlock() == entryBlock;
- };
-
- valOp->getResults().replaceUsesWithIf(clonedOp->getResults(), replace);
- valOp->replaceUsesWithIf(clonedOp, replace);
- } else {
- auto savedIP = firOpBuilder.getInsertionPoint();
-
- if (valOp)
- firOpBuilder.setInsertionPointAfter(valOp);
- else
- // This means val is a block argument
- firOpBuilder.setInsertionPoint(targetOp);
-
- auto copyVal =
- firOpBuilder.createTemporary(val.getLoc(), val.getType());
- firOpBuilder.createStoreWithConvert(copyVal.getLoc(), val, copyVal);
-
- fir::factory::AddrAndBoundsInfo info =
- fir::factory::getDataOperandBaseAddr(
- firOpBuilder, val, /*isOptional=*/false, val.getLoc());
- llvm::SmallVector<mlir::Value> bounds =
- fir::factory::genImplicitBoundsOps<mlir::omp::MapBoundsOp,
- mlir::omp::MapBoundsType>(
- firOpBuilder, info,
- hlfir::translateToExtendedValue(val.getLoc(), firOpBuilder,
- hlfir::Entity{val})
- .first,
- /*dataExvIsAssumedSize=*/false, val.getLoc());
-
- std::stringstream name;
- firOpBuilder.setInsertionPoint(targetOp);
-
- llvm::omp::OpenMPOffloadMappingFlags mapFlag =
- llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
- mlir::omp::VariableCaptureKind captureKind =
- mlir::omp::VariableCaptureKind::ByRef;
-
- mlir::Type eleType = copyVal.getType();
- if (auto refType =
- mlir::dyn_cast<fir::ReferenceType>(copyVal.getType()))
- eleType = refType.getElementType();
-
- if (fir::isa_trivial(eleType) || fir::isa_char(eleType)) {
- captureKind = mlir::omp::VariableCaptureKind::ByCopy;
- } else if (!fir::isa_builtin_cptr_type(eleType)) {
- mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO;
- }
-
- mlir::Value mapOp = createMapInfoOp(
- firOpBuilder, copyVal.getLoc(), copyVal,
- /*varPtrPtr=*/mlir::Value{}, name.str(), bounds,
- /*members=*/llvm::SmallVector<mlir::Value>{},
- /*membersIndex=*/mlir::ArrayAttr{},
- static_cast<
- std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
- mapFlag),
- captureKind, copyVal.getType());
-
- // Get the index of the first non-map argument before modifying mapVars,
- // then append an element to mapVars and an associated entry block
- // argument at that index.
- unsigned insertIndex =
- argIface.getMapBlockArgsStart() + argIface.numMapBlockArgs();
- targetOp.getMapVarsMutable().append(mapOp);
- mlir::Value clonedValArg = region.insertArgument(
- insertIndex, copyVal.getType(), copyVal.getLoc());
-
- firOpBuilder.setInsertionPointToStart(entryBlock);
- auto loadOp = fir::LoadOp::create(firOpBuilder, clonedValArg.getLoc(),
- clonedValArg);
- val.replaceUsesWithIf(loadOp->getResult(0),
- [entryBlock](mlir::OpOperand &use) {
- return use.getOwner()->getBlock() == entryBlock;
- });
- firOpBuilder.setInsertionPoint(entryBlock, savedIP);
- }
- }
- valuesDefinedAbove.clear();
- mlir::getUsedValuesDefinedAbove(region, valuesDefinedAbove);
- }
+ cloneOrMapRegionOutsiders(firOpBuilder, targetOp);
// Insert dummy instruction to remember the insertion position. The
// marker will be deleted since there are not uses.
diff --git a/flang/lib/Utils/OpenMP.cpp b/flang/lib/Utils/OpenMP.cpp
index e1681e9c34872..2261912fec22f 100644
--- a/flang/lib/Utils/OpenMP.cpp
+++ b/flang/lib/Utils/OpenMP.cpp
@@ -8,10 +8,14 @@
#include "flang/Utils/OpenMP.h"
+#include "flang/Lower/ConvertExprToHLFIR.h"
+#include "flang/Optimizer/Builder/DirectivesCommon.h"
+#include "flang/Optimizer/Builder/FIRBuilder.h"
#include "flang/Optimizer/Dialect/FIROps.h"
#include "flang/Optimizer/Dialect/FIRType.h"
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/Transforms/RegionUtils.h"
namespace Fortran::utils::openmp {
mlir::omp::MapInfoOp createMapInfoOp(mlir::OpBuilder &builder,
@@ -44,4 +48,113 @@ mlir::omp::MapInfoOp createMapInfoOp(mlir::OpBuilder &builder,
builder.getStringAttr(name), builder.getBoolAttr(partialMap));
return op;
}
+
+mlir::Value mapTemporaryValue(fir::FirOpBuilder &firOpBuilder,
+ mlir::omp::TargetOp targetOp, mlir::Value val, llvm::StringRef name) {
+ mlir::OpBuilder::InsertionGuard guard(firOpBuilder);
+ mlir::Operation *valOp = val.getDefiningOp();
+
+ if (valOp)
+ firOpBuilder.setInsertionPointAfter(valOp);
+ else
+ // This means val is a block argument
+ firOpBuilder.setInsertionPoint(targetOp);
+
+ auto copyVal = firOpBuilder.createTemporary(val.getLoc(), val.getType());
+ firOpBuilder.createStoreWithConvert(copyVal.getLoc(), val, copyVal);
+
+ fir::factory::AddrAndBoundsInfo info = fir::factory::getDataOperandBaseAddr(
+ firOpBuilder, val, /*isOptional=*/false, val.getLoc());
+ llvm::SmallVector<mlir::Value> bounds =
+ fir::factory::genImplicitBoundsOps<mlir::omp::MapBoundsOp,
+ mlir::omp::MapBoundsType>(firOpBuilder, info,
+ hlfir::translateToExtendedValue(
+ val.getLoc(), firOpBuilder, hlfir::Entity{val})
+ .first,
+ /*dataExvIsAssumedSize=*/false, val.getLoc());
+
+ firOpBuilder.setInsertionPoint(targetOp);
+
+ llvm::omp::OpenMPOffloadMappingFlags mapFlag =
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
+ mlir::omp::VariableCaptureKind captureKind =
+ mlir::omp::VariableCaptureKind::ByRef;
+
+ mlir::Type eleType = copyVal.getType();
+ if (auto refType = mlir::dyn_cast<fir::ReferenceType>(copyVal.getType())) {
+ eleType = refType.getElementType();
+ }
+
+ if (fir::isa_trivial(eleType) || fir::isa_char(eleType)) {
+ captureKind = mlir::omp::VariableCaptureKind::ByCopy;
+ } else if (!fir::isa_builtin_cptr_type(eleType)) {
+ mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO;
+ }
+
+ mlir::Value mapOp = createMapInfoOp(firOpBuilder, copyVal.getLoc(), copyVal,
+ /*varPtrPtr=*/mlir::Value{}, name.str(), bounds,
+ /*members=*/llvm::SmallVector<mlir::Value>{},
+ /*membersIndex=*/mlir::ArrayAttr{},
+ static_cast<std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
+ mapFlag),
+ captureKind, copyVal.getType());
+
+ auto argIface = llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(*targetOp);
+ mlir::Region ®ion = targetOp.getRegion();
+
+ // Get the index of the first non-map argument before modifying mapVars,
+ // then append an element to mapVars and an associated entry block
+ // argument at that index.
+ unsigned insertIndex =
+ argIface.getMapBlockArgsStart() + argIface.numMapBlockArgs();
+ targetOp.getMapVarsMutable().append(mapOp);
+ mlir::Value clonedValArg =
+ region.insertArgument(insertIndex, copyVal.getType(), copyVal.getLoc());
+
+ mlir::Block *entryBlock = ®ion.getBlocks().front();
+ firOpBuilder.setInsertionPointToStart(entryBlock);
+ auto loadOp =
+ firOpBuilder.create<fir::LoadOp>(clonedValArg.getLoc(), clonedValArg);
+ return loadOp.getResult();
+}
+
+void cloneOrMapRegionOutsiders(
+ fir::FirOpBuilder &firOpBuilder, mlir::omp::TargetOp targetOp) {
+ mlir::Region ®ion = targetOp.getRegion();
+ mlir::Block *entryBlock = ®ion.getBlocks().front();
+
+ llvm::SetVector<mlir::Value> valuesDefinedAbove;
+ mlir::getUsedValuesDefinedAbove(region, valuesDefinedAbove);
+ while (!valuesDefinedAbove.empty()) {
+ for (mlir::Value val : valuesDefinedAbove) {
+ mlir::Operation *valOp = val.getDefiningOp();
+
+ // NOTE: We skip BoxDimsOp's as the lesser of two evils is to map the
+ // indices separately, as the alternative is to eventually map the Box,
+ // which comes with a fairly large overhead comparatively. We could be
+ // more robust about this and check using a BackwardsSlice to see if we
+ // run the risk of mapping a box.
+ if (valOp && mlir::isMemoryEffectFree(valOp) &&
+ !mlir::isa<fir::BoxDimsOp>(valOp)) {
+ mlir::Operation *clonedOp = valOp->clone();
+ entryBlock->push_front(clonedOp);
+
+ auto replace = [entryBlock](mlir::OpOperand &use) {
+ return use.getOwner()->getBlock() == entryBlock;
+ };
+
+ valOp->getResults().replaceUsesWithIf(clonedOp->getResults(), replace);
+ valOp->replaceUsesWithIf(clonedOp, replace);
+ } else {
+ mlir::Value mappedTemp = mapTemporaryValue(firOpBuilder, targetOp, val,
+ /*name=*/{});
+ val.replaceUsesWithIf(mappedTemp, [entryBlock](mlir::OpOperand &use) {
+ return use.getOwner()->getBlock() == entryBlock;
+ });
+ }
+ }
+ valuesDefinedAbove.clear();
+ mlir::getUsedValuesDefinedAbove(region, valuesDefinedAbove);
+ }
+}
} // namespace Fortran::utils::openmp
|
61d6ffa to
576a68b
Compare
1576d6f to
4f3924f
Compare
|
Ping! Please have a look when you have time. |
4f3924f to
c9be071
Compare
bhandarkar-pranav
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits.
flang/include/flang/Utils/OpenMP.h
Outdated
|
|
||
| /// For values used inside a target region but defined outside, either clone | ||
| /// these value inside the target region or map them to the region. This | ||
| /// function tries firts to clone values (if they are defined by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: typo and ordering of words - "first tries to".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
flang/include/flang/Utils/OpenMP.h
Outdated
| /// | ||
| /// \returns The loaded mapped value inside the target region. | ||
| mlir::Value mapTemporaryValue(fir::FirOpBuilder &firOpBuilder, | ||
| mlir::omp::TargetOp targetOp, mlir::Value val, llvm::StringRef name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: Can name be a default parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
…ide values Following up on #154483, this PR introduces further refactoring to extract some shared utils between OpenMP lowering and `do concurrent` conversion pass. In particular, this PR extracts 2 utils that handle mapping or cloning values used inside target regions but defined outside. Later `do concurrent` PR(s) will use these utils.
c9be071 to
ead5548
Compare
… clone outside values (#155754) Following up on #154483, this PR introduces further refactoring to extract some shared utils between OpenMP lowering and `do concurrent` conversion pass. In particular, this PR extracts 2 utils that handle mapping or cloning values used inside target regions but defined outside. Later `do concurrent` PR(s) will also use these utils. PR stack: - llvm/llvm-project#155754◀️ - llvm/llvm-project#155987 - llvm/llvm-project#155992 - llvm/llvm-project#155993 - llvm/llvm-project#156589 - llvm/llvm-project#156610 - llvm/llvm-project#156837
…#155987) Upstreams further parts of `do concurrent` to OpenMP conversion pass from AMD's fork. This PR extends the pass by adding support for mapping to the device. PR stack: - llvm/llvm-project#155754 - llvm/llvm-project#155987◀️ - llvm/llvm-project#155992 - llvm/llvm-project#155993 - llvm/llvm-project#157638 - llvm/llvm-project#156610 - llvm/llvm-project#156837
… tests (#155992) Adds more lit tests for `do concurrent` device mapping. PR stack: - llvm/llvm-project#155754 - llvm/llvm-project#155987 - llvm/llvm-project#155992◀️ - llvm/llvm-project#155993 - llvm/llvm-project#157638 - llvm/llvm-project#156610 - llvm/llvm-project#156837
…nMP mapping (#155993) Adds end-to-end tests for `do concurrent` offloading to the device. PR stack: - llvm/llvm-project#155754 - llvm/llvm-project#155987 - llvm/llvm-project#155992 - llvm/llvm-project#155993◀️ - llvm/llvm-project#157638 - llvm/llvm-project#156610 - llvm/llvm-project#156837
Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and uses the mapped value as the `private` clause operand in the nested `omp.parallel` op. - #155754 - #155987 - #155992 - #155993 - #157638◀️ - #156610 - #156837
… (#157638) Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and uses the mapped value as the `private` clause operand in the nested `omp.parallel` op. - llvm/llvm-project#155754 - llvm/llvm-project#155987 - llvm/llvm-project#155992 - llvm/llvm-project#155993 - llvm/llvm-project#157638◀️ - llvm/llvm-project#156610 - llvm/llvm-project#156837
Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wloop` part. - #155754 - #155987 - #155992 - #155993 - #157638 - #156610◀️ - #156837
…e (#156610) Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wloop` part. - llvm/llvm-project#155754 - llvm/llvm-project#155987 - llvm/llvm-project#155992 - llvm/llvm-project#155993 - llvm/llvm-project#157638 - llvm/llvm-project#156610◀️ - llvm/llvm-project#156837
…ions on the GPU (#156837) Fixes a bug related to insertion points when inlining multi-block combiner reduction regions. The IP at the end of the inlined region was not used resulting in emitting BBs with multiple terminators. PR stack: - llvm/llvm-project#155754 - llvm/llvm-project#155987 - llvm/llvm-project#155992 - llvm/llvm-project#155993 - llvm/llvm-project#157638 - llvm/llvm-project#156610 - llvm/llvm-project#156837◀️
Following up on #154483, this PR introduces further refactoring to extract some shared utils between OpenMP lowering and
do concurrentconversion pass. In particular, this PR extracts 2 utils that handle mapping or cloning values used inside target regions but defined outside.Later
do concurrentPR(s) will also use these utils.PR stack:
do concurrentmapping to device #155987do concurrentto device mapping lit tests #155992do concurrent: supportlocalon device #157638do concurrent: supportreduceon device #156610