Skip to content

Conversation

@skc7
Copy link

@skc7 skc7 commented Oct 31, 2025

This PR introduces a new pass "lower-workdistribute" Fortran array statements are lowered to fir as fir.do_loop unordered. "lower-workdistribute" pass works mainly on identifying "fir.do_loop unordered" that is nested in target{teams{workdistribute{fir.do_loop unordered}}} and lowers it to
target{teams{parallel{wsloop{loop_nest}}}}. It hoists all the other ops outside target region. Relaces heap allocation on target with omp.target_allocmem and deallocation with omp.target_freemem from host. Also replaces runtime function "Assign" with omp.target_memcpy from host.

This pass implements following rewrites and optimisations:

  • FissionWorkdistribute: finds the parallelizable ops within teams {workdistribute} region and moves them to their own teams{workdistribute} region.
  • WorkdistributeRuntimeCallLower: finds the FortranAAssign calls nested in teams {workdistribute{}} and lowers it to unordered do loop if src is scalar and dest is array. Other runtime calls are not handled currently.
  • WorkdistributeDoLower: finds the fir.do_loop unoredered nested in teams {workdistribute{fir.do_loop unoredered}} and lowers it to teams {parallel { distribute {wsloop {loop_nest}}}}.
  • TeamsWorkdistributeToSingle: hoists all the ops inside teams {workdistribute{}} before teams op.

The work in this PR is C-P and updated from @ivanradanov commits from coexecute implementation:

flang_workdistribute_iwomp_2024

Paper related to this work by @ivanradanov "Automatic Parallelization and OpenMP Offloadingof Fortran Array
Notation"

This PR introduces a new pass "lower-workdistribute"
Fortran array statements are lowered to fir as fir.do_loop unordered.
"lower-workdistribute" pass works mainly on identifying "fir.do_loop
unordered" that is nested in target{teams{workdistribute{fir.do_loop
unordered}}} and lowers it to
target{teams{parallel{wsloop{loop_nest}}}}. It hoists all the other ops
outside target region. Relaces heap allocation on target with
omp.target_allocmem and deallocation with omp.target_freemem from host.
Also replaces runtime function "Assign" with omp.target_memcpy from
host.

This pass implements following rewrites and optimisations:

- **FissionWorkdistribute**: finds the parallelizable ops within teams
{workdistribute} region and moves them to their own
teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls
nested in teams {workdistribute{}} and lowers it to unordered do loop if
src is scalar and dest is array. Other runtime calls are not handled
currently.
- **WorkdistributeDoLower**: finds the fir.do_loop unoredered nested in
teams {workdistribute{fir.do_loop unoredered}} and lowers it to teams
{parallel { distribute {wsloop {loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside teams
{workdistribute{}} before teams op.

The work in this PR is C-P and updated from @ivanradanov commits from
coexecute implementation:

[flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Paper related to this work by @ivanradanov ["Automatic Parallelization
and OpenMP Offloadingof Fortran Array
Notation"](https://www.osti.gov/servlets/purl/[2449728](https://www.osti.gov/servlets/purl/2449728))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants