-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[flang][MLIR][OpenMP] Extend delayed privatization for scalar allocatables and pointers #84740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR replaces the previous overly complicated solution proposed in #84033. |
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesOne more step in extending support for delayed privatization. This diff adds support for scalar allocatables. Full diff: https://github.com/llvm/llvm-project/pull/84740.diff 5 Files Affected:
diff --git a/flang/include/flang/Optimizer/Builder/BoxValue.h b/flang/include/flang/Optimizer/Builder/BoxValue.h
index 2fed2d48a7a080..698a5afa17b23d 100644
--- a/flang/include/flang/Optimizer/Builder/BoxValue.h
+++ b/flang/include/flang/Optimizer/Builder/BoxValue.h
@@ -535,6 +535,16 @@ class ExtendedValue : public details::matcher<ExtendedValue> {
const VT &matchee() const { return box; }
+ /// Clone an ExtendedValue to a new instance changing the base address.
+ ///
+ /// TODO So far, this is only a shallow clone; only the base address is
+ /// replaced. This will probably be extended to implement deep cloning to
+ /// support scenarios such as delayed privatization for ArrayBoxValue's.
+ ExtendedValue clone(mlir::Value newBase) const {
+ return match([&](const UnboxedValue &box) -> ExtendedValue { return newBase; },
+ [&](const auto &box) -> ExtendedValue { return box.clone(newBase); });
+ }
+
private:
VT box;
};
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index a668ba4116faab..5fa914d087f020 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -748,7 +748,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
void copyVar(mlir::Location loc, mlir::Value dst,
mlir::Value src) override final {
- copyVarHLFIR(loc, dst, src);
+ copyVarHLFIR(loc, Fortran::lower::SymbolBox::Intrinsic{dst},
+ Fortran::lower::SymbolBox::Intrinsic{src});
}
void copyHostAssociateVar(
@@ -1009,10 +1010,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
// `omp.private`'s `alloc` block. If this is the case, we return this
// `SymbolBox::Intrinsic` value.
if (Fortran::lower::SymbolBox v = symMap->lookupSymbol(sym))
- return v.match(
- [&](const Fortran::lower::SymbolBox::Intrinsic &)
- -> Fortran::lower::SymbolBox { return v; },
- [](const auto &) -> Fortran::lower::SymbolBox { return {}; });
+ return v;
return {};
}
@@ -1060,15 +1058,16 @@ class FirConverter : public Fortran::lower::AbstractConverter {
const Fortran::lower::SymbolBox &rhs_sb) {
mlir::Location loc = genLocation(sym.name());
if (lowerToHighLevelFIR())
- copyVarHLFIR(loc, lhs_sb.getAddr(), rhs_sb.getAddr());
+ copyVarHLFIR(loc, lhs_sb, rhs_sb);
else
copyVarFIR(loc, sym, lhs_sb, rhs_sb);
}
- void copyVarHLFIR(mlir::Location loc, mlir::Value dst, mlir::Value src) {
+ void copyVarHLFIR(mlir::Location loc, Fortran::lower::SymbolBox dst,
+ Fortran::lower::SymbolBox src) {
assert(lowerToHighLevelFIR());
- hlfir::Entity lhs{dst};
- hlfir::Entity rhs{src};
+ hlfir::Entity lhs{dst.getAddr()};
+ hlfir::Entity rhs{src.getAddr()};
// Temporary_lhs is set to true in hlfir.assign below to avoid user
// assignment to be used and finalization to be called on the LHS.
// This may or may not be correct but mimics the current behaviour
@@ -1082,7 +1081,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
/*keepLhsLengthInAllocatableAssignment=*/false,
/*temporary_lhs=*/true);
};
- if (lhs.isAllocatable()) {
+
+ bool isBoxAllocatable = dst.match(
+ [](const fir::MutableBoxValue &box) { return box.isAllocatable(); },
+ [](const auto &box) { return false; });
+
+ if (isBoxAllocatable) {
// Deep copy allocatable if it is allocated.
// Note that when allocated, the RHS is already allocated with the LHS
// shape for copy on entry in createHostAssociateVarClone.
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 717b8cc0276a30..bee533949c6564 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -50,6 +50,7 @@ void DataSharingProcessor::processStep2(mlir::Operation *op, bool isLoop) {
}
void DataSharingProcessor::insertDeallocs() {
+ // TODO Extend delayed privatization to include a `dealloc` region?
for (const Fortran::semantics::Symbol *sym : privatizedSymbols)
if (Fortran::semantics::IsAllocatable(sym->GetUltimate())) {
converter.createHostAssociateVarCloneDealloc(*sym);
@@ -376,6 +377,7 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
symLoc, uniquePrivatizerName, symType,
isFirstPrivate ? mlir::omp::DataSharingClauseType::FirstPrivate
: mlir::omp::DataSharingClauseType::Private);
+ fir::ExtendedValue symExV = converter.getSymbolExtendedValue(*sym);
symTable->pushScope();
@@ -386,7 +388,7 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
&allocRegion, /*insertPt=*/{}, symType, symLoc);
firOpBuilder.setInsertionPointToEnd(allocEntryBlock);
- symTable->addSymbol(*sym, allocRegion.getArgument(0));
+ symTable->addSymbol(*sym, symExV.clone(allocRegion.getArgument(0)));
symTable->pushScope();
cloneSymbol(sym);
firOpBuilder.create<mlir::omp::YieldOp>(
@@ -403,10 +405,10 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
mlir::Block *copyEntryBlock = firOpBuilder.createBlock(
©Region, /*insertPt=*/{}, {symType, symType}, {symLoc, symLoc});
firOpBuilder.setInsertionPointToEnd(copyEntryBlock);
- symTable->addSymbol(*sym, copyRegion.getArgument(0),
+ symTable->addSymbol(*sym, symExV.clone(copyRegion.getArgument(0)),
/*force=*/true);
symTable->pushScope();
- symTable->addSymbol(*sym, copyRegion.getArgument(1));
+ symTable->addSymbol(*sym, symExV.clone(copyRegion.getArgument(1)));
auto ip = firOpBuilder.saveInsertionPoint();
copyFirstPrivateSymbol(sym, &ip);
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
new file mode 100644
index 00000000000000..545221b4c1abfe
--- /dev/null
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
@@ -0,0 +1,33 @@
+! Test delayed privatization for allocatables: `firstprivate`.
+
+! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization -o - %s 2>&1 | FileCheck %s
+
+subroutine delayed_privatization_allocatable
+ implicit none
+ integer, allocatable :: var1
+
+!$omp parallel firstprivate(var1)
+ var1 = 10
+!$omp end parallel
+end subroutine
+
+! CHECK-LABEL: omp.private {type = firstprivate}
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+
+! CHECK: } copy {
+! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
+
+! CHECK-NEXT: %[[PRIV_BASE_VAL:.*]] = fir.load %[[PRIV_PRIV_ARG]]
+! CHECK-NEXT: %[[PRIV_BASE_BOX:.*]] = fir.box_addr %[[PRIV_BASE_VAL]]
+! CHECK-NEXT: %[[PRIV_BASE_ADDR:.*]] = fir.convert %[[PRIV_BASE_BOX]]
+! CHECK-NEXT: %[[C0:.*]] = arith.constant 0 : i64
+! CHECK-NEXT: %[[COPY_COND:.*]] = arith.cmpi ne, %[[PRIV_BASE_ADDR]], %[[C0]] : i64
+
+! CHECK-NEXT: fir.if %[[COPY_COND]] {
+! CHECK-NEXT: %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
+! CHECK-NEXT: %[[ORIG_BASE_ADDR:.*]] = fir.box_addr %[[ORIG_BASE_VAL]]
+! CHECK-NEXT: %[[ORIG_BASE_LD:.*]] = fir.load %[[ORIG_BASE_ADDR]]
+! CHECK-NEXT: hlfir.assign %[[ORIG_BASE_LD]] to %[[PRIV_BASE_BOX]] temporary_lhs
+! CHECK-NEXT: }
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
new file mode 100644
index 00000000000000..83fb516879ff20
--- /dev/null
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
@@ -0,0 +1,40 @@
+! Test delayed privatization for allocatables: `private`.
+
+! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization -o - %s 2>&1 | FileCheck %s
+
+subroutine delayed_privatization_allocatable
+ implicit none
+ integer, allocatable :: var1
+
+!$omp parallel private(var1)
+ var1 = 10
+!$omp end parallel
+end subroutine
+
+! CHECK-LABEL: omp.private {type = private}
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+
+! CHECK-NEXT: %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_allocatableEvar1"}
+
+! CHECK-NEXT: %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT: %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
+! CHECK-NEXT: %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> i64
+! CHECK-NEXT: %[[C0:.*]] = arith.constant 0 : i64
+! CHECK-NEXT: %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+
+! CHECK-NEXT: fir.if %[[ALLOC_COND]] {
+! CHECK-NEXT: %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32 {fir.must_be_heap = true, uniq_name = "_QFdelayed_privatization_allocatableEvar1.alloc"}
+! CHECK-NEXT: %[[PRIV_ALLOCMEM_BOX:.*]] = fir.embox %[[PRIV_ALLOCMEM]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT: fir.store %[[PRIV_ALLOCMEM_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT: } else {
+! CHECK-NEXT: %[[ZERO_BITS:.*]] = fir.zero_bits !fir.heap<i32>
+! CHECK-NEXT: %[[ZERO_BOX:.*]] = fir.embox %[[ZERO_BITS]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT: fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT: }
+
+! CHECK-NEXT: %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
+! CHECK-NEXT: omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+
+! CHECK-NEXT: }
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
a6180f0
to
2a1cb94
Compare
Thanks @ergawy for the change. And sorry for the changes you had to make due to the previous usage of both base and original base. I was expecting the deallocation of the allocatable variable to be inside a separate region in the PrivateClause Op.
|
No problem at all. I learned quite a bit trying to figure out the solution so the time spent did not go to waste for sure.
That should be the case indeed. I added a |
! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc { | ||
|
||
! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is allocated is not visible in the test here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that deferred to the private test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, instead of just copying the check lines from the private test again, I skipped most of it here. Let me know if you prefer to repeat the checks here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create follow up PR(s) for this if you don't mind.
That is fine.
Would pointers come in a separate patch? |
else | ||
copyVarFIR(loc, sym, lhs_sb, rhs_sb); | ||
} | ||
|
||
void copyVarHLFIR(mlir::Location loc, mlir::Value dst, mlir::Value src) { | ||
void copyVarHLFIR(mlir::Location loc, Fortran::lower::SymbolBox dst, | ||
Fortran::lower::SymbolBox src) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you need to change the argument here and the way allocatable were tested for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With delayed privatization, the address that the dst
box is not anymore a FortranVariableOpInterface
value. It is rather a block argument to the copy
region of the delayed privatizer:
omp.private {type = firstprivate} @_QF... : !fir.ref<!fir.box<!fir.heap<i32>>> alloc {
^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>):
...
} copy {
^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>, %arg1: !fir.ref<!fir.box<!fir.heap<i32>>>):
...
}
%arg1
in the above snippet is the dst
value. So we don't have access to the fortran_attrs
attributes attached to the original value anymore. However, the SymbolBox
still has that information since it directly checks the type of the address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I just realized that for eager privatization, we also have to check for the FortranVariableOpInterface
of the SymbolBox
variant. I didn't catch the issue since all tests were green so I will add more tests for eager privatization for both allocatables and pointers in a follow-up PR. Seems like this was an oversight, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in latest commit.
2a1cb94
to
e6a7bd8
Compare
Edit: never mind, found the fix and it is pretty simple so I will include it in this PR after adding some tests.
@kiranchandramohan, I just tried delayed privatization with pointers and there is a difference between the copy logic in eager privatization vs. the delayed case: Eager privatization:
Delayed privatization:
As far as I understand, So this will need proper handling which I think makes sense to defer to a follow-up PR. |
e6a7bd8
to
db72b10
Compare
Done in latest commit. |
I think this is now ready for another round of review :). I am working on supporting delayed privatization for array at the moment 👨🏭 . |
Ping 🔔! Can you please have another look? I have #85023 lined up for array support and also working on supporting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG.
543022f
to
6c557cf
Compare
b0976ad
to
7816049
Compare
…ables and pointers One more step in extending support for delayed privatization. This diff adds support for scalar allocatables and pointers.
7816049
to
d50d39b
Compare
One more step in extending support for delayed privatization. This diff adds support for scalar allocatables and pointers.