2727## Usage
2828
2929In order to enable ` do concurrent ` to OpenMP mapping, ` flang ` adds a new
30- compiler flag: ` -fdo-concurrent-to-openmp ` . This flags has 3 possible values:
30+ compiler flag: ` -fdo-concurrent-to-openmp ` . This flag has 3 possible values:
31311 . ` host ` : this maps ` do concurrent ` loops to run in parallel on the host CPU.
3232 This maps such loops to the equivalent of ` omp parallel do ` .
33- 2 . ` device ` : this maps ` do concurent ` loops to run in parallel on a device
34- (GPU). This maps such loops to the equivalent of `omp target teams
35- distribute parallel do`.
36- 3 . ` none ` : this disables ` do concurrent ` mapping altogether. In such case, such
33+ 2 . ` device ` : this maps ` do concurrent ` loops to run in parallel on a target device.
34+ This maps such loops to the equivalent of
35+ ` omp target teams distribute parallel do` .
36+ 3 . ` none ` : this disables ` do concurrent ` mapping altogether. In that case, such
3737 loops are emitted as sequential loops.
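
As a rough illustration of the ` host ` value described in the list above, the sketch below places a ` do concurrent ` loop next to a hand-written ` omp parallel do ` loop. The program name, array names, and sizes are made up for this example, and the second loop is only an approximation of the mapping, not actual compiler output.

```fortran
! Illustrative sketch only: names and sizes are hypothetical, and the
! OpenMP loop is a hand-written approximation of the `host` mapping.
program host_mapping_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real :: a(n), b(n)

  b = 1.0

  ! Loop as the user writes it.
  do concurrent (i = 1:n)
    a(i) = 2.0 * b(i)
  end do

  ! Roughly the form the loop is treated as when mapped to the host.
  ! Without OpenMP enabled, the directives are plain comments and the
  ! loop runs sequentially.
  !$omp parallel do
  do i = 1, n
    a(i) = 2.0 * b(i)
  end do
  !$omp end parallel do

  print *, a
end program host_mapping_sketch
```

Both loops compute the same values in ` a `; the difference is only in how the iterations may be scheduled.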
3838
39- The above compiler switch is currently avaialble only when OpenMP is also
39+ The above compiler switch is currently available only when OpenMP is also
4040enabled. So you need to provide the following options to flang in order to
4141enable it:
4242```
@@ -54,13 +54,13 @@ that:
5454To describe the current status in more detail, the following is a description of how
5555the pass currently behaves for single-range loops and then for multi-range
5656loops. The following sub-sections describe the status of the downstream
57- implementation on the AMD's ROCm fork( * ) . We are working on upstreaming the
57+ implementation on AMD's ROCm fork[ ^ 1 ] . We are working on upstreaming the
5858downstream implementation gradually and this document will be updated to reflect
5959such upstreaming process. Example LIT tests referenced below might also only
6060be available in the ROCm fork and will be upstreamed with the relevant parts of the
6161code.
6262
63- ( * ) https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
63+ [ ^ 1 ] : https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
6464
6565### Single-range loops
6666
@@ -211,8 +211,8 @@ loops and map them as "collapsed" loops in OpenMP.
211211
212212Loop-nest detection is currently limited to the scenario described in the previous
213213section. However, this is quite limited and can be extended in the future to cover
214- more cases. For example, for the following loop nest, even thought , both loops are
215- perfectly nested; at the moment, only the outer loop is parallized :
214+ more cases. For example, for the following loop nest, even though both loops are
215+ perfectly nested, at the moment only the outer loop is parallelized:
216216``` fortran
217217do concurrent(i=1:n)
218218 do concurrent(j=1:m)
@@ -221,9 +221,9 @@ do concurrent(i=1:n)
221221end do
222222```
223223
224- Similary for the following loop nest, even though the intervening statement ` x = 41 `
225- does not have any memory effects that would affect parallization , this nest is
226- not parallized as well (only the outer loop is).
224+ Similarly, for the following loop nest, even though the intervening statement ` x = 41 `
225+ does not have any memory effects that would affect parallelization, this nest is
226+ not parallelized either (only the outer loop is).
227227
228228``` fortran
229229do concurrent(i=1:n)
@@ -244,7 +244,7 @@ of what is and is not detected as a perfect loop nest.
244244
245245### Data environment
246246
247- By default, variables that are used inside a ` do concurernt ` loop nest are
247+ By default, variables that are used inside a ` do concurrent ` loop nest are
248248either treated as ` shared ` in case of mapping to ` host ` , or mapped into the
249249` target ` region using a ` map ` clause in case of mapping to ` device ` . The only
250250exceptions to this are:
@@ -253,20 +253,20 @@ exceptions to this are:
253253 examples above.
254254 1 . any values that are from allocations outside the loop nest and used
255255 exclusively inside of it. In such cases, a local privatized
256- value is created in the OpenMP region to prevent multiple teams of threads
257- from accessing and destroying the same memory block which causes runtime
256+ copy is created in the OpenMP region to prevent multiple teams of threads
257+ from accessing and destroying the same memory block, which causes runtime
258258 issues. For an example of such cases, see
259259 ` flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90 ` .
260260
261- Implicit mapping detection (for mapping to the GPU ) is still quite limited and
262- work to make it smarter is underway for both OpenMP in general and ` do concurrent `
263- mapping.
261+ Implicit mapping detection (for mapping to the target device ) is still quite
262+ limited and work to make it smarter is underway for both OpenMP in general
263+ and ` do concurrent ` mapping.
264264
265265#### Non-perfectly-nested loops' IVs
266266
267267For non-perfectly-nested loops, the IVs are still treated as ` shared ` or
268268` map ` entries as pointed out above. This ** might not** be consistent with what
269- the Fortran specficiation tells us. In particular, taking the following
269+ the Fortran specification tells us. In particular, taking the following
270270snippets from the spec (version 2023) into account:
271271
272272> § 3.35
@@ -277,9 +277,9 @@ snippets from the spec (version 2023) into account:
277277> § 19.4
278278> ------
279279> A variable that appears as an index-name in a FORALL or DO CONCURRENT
280- > construct, or ... is a construct entity. A variable that has LOCAL or
280+ > construct [ ...] is a construct entity. A variable that has LOCAL or
281281> LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
282- > ...
282+ > [ ...]
283283> The name of a variable that appears as an index-name in a DO CONCURRENT
284284> construct, FORALL statement, or FORALL construct has a scope of the statement
285285> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
@@ -288,7 +288,7 @@ snippets from the spec (version 2023) into account:
288288From the above quotes, it seems there is an equivalence between the IV of a `do
289289concurrent` loop and a variable with a ` LOCAL` locality specifier (equivalent
290290to OpenMP's ` private ` clause). This means that we should probably
291- localize/privatize a ` do concurernt ` loop's IV even if it is not perfectly
291+ localize/privatize a ` do concurrent ` loop's IV even if it is not perfectly
292292nested in the nest we are parallelizing. For now, however, we ** do not** do
293293that as pointed out previously. In the near future, we propose a middle-ground
294294solution (see the Next steps section for more details).
@@ -327,8 +327,8 @@ At the moment, the FIR dialect does not have a way to model locality specifiers
327327on the IR level. Instead, something similar to early/eager privatization in OpenMP
328328is done for the locality specifiers in ` fir.do_loop ` ops. Having locality specifiers
329329modelled in a way similar to delayed privatization (i.e. the ` omp.private ` op) and
330- reductions (i.e. the ` omp.delcare_reduction ` op) can make mapping ` do concurrent `
331- to OpenMP (and other parallization models) much easier.
330+ reductions (i.e. the ` omp.declare_reduction ` op) can make mapping ` do concurrent `
331+ to OpenMP (and other parallel programming models) much easier.
332332
333333Therefore, one way to approach this problem is to extract the TableGen records
334334for relevant OpenMP clauses in a shared dialect for "data environment management"
@@ -345,7 +345,7 @@ logic of loop nests needs to be implemented.
345345### Data-dependence analysis
346346
347347Right now, we map loop nests without analysing whether such mapping is safe to
348- do or not. We probalby need to at least warn the use of unsafe loop nests due
348+ do or not. We probably need to at least warn about the use of unsafe loop nests due
349349to loop-carried dependencies.
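
To make the notion of a loop-carried dependency concrete, here is a minimal sketch (the program and variable names are hypothetical). The loop is written as an ordinary sequential ` do ` so that the example is well defined; the same body written as ` do concurrent ` would be the kind of unsafe loop nest a warning could target.

```fortran
! Illustrative sketch only: names are hypothetical.
program loop_carried_dependence_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real :: a(n)

  a = 1.0

  ! Iteration i reads a(i-1), which is written by iteration i-1.
  ! Executed in order, this builds a running sum. If the iterations ran
  ! in parallel (for example, if this were written as `do concurrent`
  ! and mapped to OpenMP), the reads and writes could race.
  do i = 2, n
    a(i) = a(i) + a(i-1)
  end do

  print *, a
end program loop_carried_dependence_sketch
```

A dependence analysis would need to detect that the value read through ` a(i-1) ` is produced by a different iteration before deciding whether the mapping is safe.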
350350
351351### Non-rectangular loop nests
@@ -362,7 +362,7 @@ end do
362362We defer this to the (hopefully) near future when we get the conversion in a
363363good shape for the samples/projects at hand.
364364
365- ### Generalizing the pass to other parallization models
365+ ### Generalizing the pass to other parallel programming models
366366
367367Once we have a stable and capable ` do concurrent ` to OpenMP mapping, we can take
368368this in a more generalized direction and allow the pass to target other models;