@skc7 skc7 commented Oct 31, 2025

This PR includes all the commits required for lowering the flang workdistribute feature.
https://ontrack-internal.amd.com/browse/SWDEV-531975

PR1 #450
PR2 #451
PR3 #452
PR4 #453
PR5 #454

skc7 added 5 commits October 31, 2025 15:29
…llvm#145464)

This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp_target_free call in llvm.

Example:
  %1 = omp.target_allocmem %device : i32, i64
  omp.target_freemem %device, %1 : i32, i64
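
As a rough illustration of the lowering described above, the sketch below models each op as a tuple naming the runtime call it would become. The helper names, the `SIZEOF` table, and the way the allocation size is derived from the element type are assumptions made for this example, not the actual conversion code; only the argument order of `omp_target_alloc(size, device_num)` and `omp_target_free(device_ptr, device_num)` comes from the OpenMP runtime API.

```python
# Hypothetical byte widths for the element types used in this sketch.
SIZEOF = {"i32": 4, "i64": 8, "f32": 4, "f64": 8}


def lower_target_allocmem(device, elem_type, count=1):
    """Model omp.target_allocmem as a call to omp_target_alloc(size, device_num)."""
    return ("omp_target_alloc", SIZEOF[elem_type] * count, device)


def lower_target_freemem(device, ptr):
    """Model omp.target_freemem as a call to omp_target_free(device_ptr, device_num)."""
    return ("omp_target_free", ptr, device)


# Mirrors the example above: allocate one i64 on device 0, then free it.
alloc_call = lower_target_allocmem(0, "i64")
free_call = lower_target_freemem(0, "%1")
```

Note that the op takes the device first, while both runtime calls take the device number last.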

The work in this PR is copied and adapted from @ivanradanov's commits in the
coexecute implementation:
[Add fir omp target alloc and free
ops](ivanradanov@be860ac)
[Lower omp_target_{alloc,free} to
llvm](ivanradanov@6e2d584)
…lvm#154376)

This PR adds the workdistribute MLIR op to the omp dialect and the
corresponding support to the LLVM frontend.

The work in this PR is copied and updated from @ivanradanov's commits in the coexecute implementation:
flang_workdistribute_iwomp_2024
This PR adds workdistribute parser and semantic support in flang.

The work in this PR is copied and updated from @ivanradanov's commits in the coexecute implementation:
flang_workdistribute_iwomp_2024
This PR adds lowering of workdistribute construct in flang to omp mlir dialect workdistribute op.

The work in this PR is copied and updated from @ivanradanov's commits in the coexecute implementation:
flang_workdistribute_iwomp_2024
This PR introduces a new pass, "lower-workdistribute".
Fortran array statements are lowered to fir as fir.do_loop unordered.
The "lower-workdistribute" pass mainly identifies a "fir.do_loop
unordered" that is nested in target{teams{workdistribute{fir.do_loop
unordered}}} and lowers it to
target{teams{parallel{wsloop{loop_nest}}}}. It hoists all the other ops
outside the target region. It replaces heap allocation on the target with
omp.target_allocmem and deallocation with omp.target_freemem from the host.
It also replaces the runtime function "Assign" with omp.target_memcpy from
the host.
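
The nest rewrite described above can be pictured with a toy model. The `Op` class, `nest`, `show`, and `lower_nest` below are hypothetical helpers written for this illustration; they do not use MLIR and only mimic the shape of the transformation, carrying the loop body over unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Toy stand-in for an operation with a single region (not real MLIR)."""
    name: str
    body: list = field(default_factory=list)


def nest(*names, innermost_body=None):
    """Build a chain of single-child ops, outermost name first."""
    op = Op(names[-1], list(innermost_body or []))
    for name in reversed(names[:-1]):
        op = Op(name, [op])
    return op


def show(op):
    """Render an op in the brace notation used in the description."""
    if not op.body:
        return op.name
    return op.name + "{" + " ".join(show(child) for child in op.body) + "}"


PATTERN = ("omp.target", "omp.teams", "omp.workdistribute", "fir.do_loop unordered")


def lower_nest(op):
    """Rewrite the matched nest; return the op unchanged if it does not match."""
    cur = op
    for depth, name in enumerate(PATTERN):
        if cur.name != name:
            return op
        if depth < len(PATTERN) - 1:
            if len(cur.body) != 1:
                return op
            cur = cur.body[0]
    # cur is now the fir.do_loop; its body becomes the loop_nest body.
    return nest("omp.target", "omp.teams", "omp.parallel", "omp.wsloop",
                "omp.loop_nest", innermost_body=cur.body)


before = nest(*PATTERN, innermost_body=[Op("fir.store")])
after = lower_nest(before)
print(show(before))
print(show(after))
```

The real pass additionally hoists the remaining ops and rewrites allocations and the "Assign" call, which this sketch leaves out.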

This pass implements the following rewrites and optimisations:

- **FissionWorkdistribute**: finds the parallelizable ops within teams
{workdistribute} region and moves them to their own
teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls
nested in teams {workdistribute{}} and lowers each to an unordered do loop if
src is a scalar and dest is an array. Other runtime calls are currently not
handled.
- **WorkdistributeDoLower**: finds the fir.do_loop unordered nested in
teams {workdistribute{fir.do_loop unordered}} and lowers it to teams
{parallel { distribute {wsloop {loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside teams
{workdistribute{}} before teams op.
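
To make the interaction between FissionWorkdistribute and TeamsWorkdistributeToSingle more concrete, here is a minimal sketch over a flat list of op names. The `is_parallelizable` predicate, the region labels, and the grouping of non-parallelizable ops into one hoisted run are simplifying assumptions for illustration; the actual pass operates on MLIR regions and has its own criteria.

```python
from itertools import groupby


def is_parallelizable(op_name):
    """Assumption for this sketch: only unordered do loops are parallelizable."""
    return op_name == "fir.do_loop unordered"


def fission(ops):
    """Split one teams{workdistribute} body into separate regions.

    Each parallelizable op gets its own teams{workdistribute} region;
    each run of other ops is marked as hoisted before the teams op.
    """
    regions = []
    for parallel, group in groupby(ops, key=is_parallelizable):
        group = list(group)
        if parallel:
            regions.extend(("teams{workdistribute}", [op]) for op in group)
        else:
            regions.append(("hoisted", group))
    return regions


body = ["fir.load", "fir.do_loop unordered", "fir.do_loop unordered", "fir.store"]
regions = fission(body)
```

After this split, each `teams{workdistribute}` region holds exactly one loop, which is the shape WorkdistributeDoLower expects in the description above.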

The work in this PR is copied and updated from @ivanradanov's commits in the
coexecute implementation:

[flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Paper related to this work by @ivanradanov: ["Automatic Parallelization
and OpenMP Offloading of Fortran Array
Notation"](https://www.osti.gov/servlets/purl/2449728)
@skc7 skc7 marked this pull request as ready for review November 3, 2025 04:57