Device op cuda #46

Open: wants to merge 76 commits into base: main

@devreal devreal commented Nov 6, 2023

Trial PR for feedback from @bosilca. Probably needs some more cleanup but feedback on the design is appreciated. Commits will be squashed later.

Currently, offloading is implemented only for the allreduce algorithms; rooted reduce is still missing.

related to spack/spack#40725

Signed-off-by: Howard Pritchard <howardp@lanl.gov>

github-actions bot commented Nov 6, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

ef8b526: ROCM: add missing FUNC_FUNC_FN macro

  • check_signed_off: does not contain a valid Signed-off-by line

6fd216f: accelerator/rocm: regular memory behaves like unif...

  • check_signed_off: does not contain a valid Signed-off-by line

955849b: Device op: pass device to lower-level op to avoid ...

  • check_signed_off: does not contain a valid Signed-off-by line

3afec6b: Draft of ompi_op_select_device

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@devreal devreal requested a review from bosilca November 6, 2023 00:39

…0_dlopen

spack:fix for dlopen missing symbol problem

github-actions bot commented Nov 7, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

ef8b526: ROCM: add missing FUNC_FUNC_FN macro

  • check_signed_off: does not contain a valid Signed-off-by line

6fd216f: accelerator/rocm: regular memory behaves like unif...

  • check_signed_off: does not contain a valid Signed-off-by line

955849b: Device op: pass device to lower-level op to avoid ...

  • check_signed_off: does not contain a valid Signed-off-by line

3afec6b: Draft of ompi_op_select_device

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

Joseph Schuchart and others added 21 commits November 7, 2023 18:09
Signed-off-by: Joseph Schuchart <jschuchart@leconte.icl.utk.edu>
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
If the target process is unable to execute an RDMA operation, it
instructs the origin to change the communication protocol. When this
happens, the origin must be informed to cancel all pending RDMA operations
and release the rdma_frag.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
…or allreduce recursive doubling

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
The accelerator component may report the availability of a single accelerator
whose ID is not zero.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
…_SUPPORT

These macros are defined to either 1 or 0

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
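
As a side note on the commit above: a macro that is always defined, to either 1 or 0, has to be tested by value with `#if` rather than by existence with `#ifdef`. The sketch below illustrates the difference. The macro names `DEMO_CUDA_SUPPORT` and `DEMO_ROCM_SUPPORT` are hypothetical stand-ins, not the (truncated) macro names from the commit title.

```c
/* Hypothetical feature macros (assumed names): always defined, to 1 or 0. */
#define DEMO_CUDA_SUPPORT 1
#define DEMO_ROCM_SUPPORT 0

/* Correct: test the macro's value. */
static int demo_cuda_enabled(void)
{
#if DEMO_CUDA_SUPPORT
    return 1;
#else
    return 0;
#endif
}

/* Correct: the value is 0, so the feature is reported as disabled. */
static int demo_rocm_enabled(void)
{
#if DEMO_ROCM_SUPPORT
    return 1;
#else
    return 0;
#endif
}

/* Pitfall: #ifdef only checks that the macro exists, so this branch is
 * taken even though DEMO_ROCM_SUPPORT is defined to 0. */
static int demo_rocm_enabled_ifdef(void)
{
#ifdef DEMO_ROCM_SUPPORT
    return 1;
#else
    return 0;
#endif
}
```

With these definitions, `demo_rocm_enabled()` reports the feature as disabled while `demo_rocm_enabled_ifdef()` wrongly reports it as enabled.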
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
…evices

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
We know where source and target buffers are located, so pass the right
transfer direction to the accelerator memcpy call.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
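
The idea in the commit above can be sketched roughly as follows: when the caller already knows whether each buffer lives on the host or on the device, it can choose an explicit copy direction instead of asking the runtime to work it out. The enum `demo_xfer_t` and the function `demo_pick_direction` are hypothetical names for illustration only; they are not the accelerator framework's actual API.

```c
#include <stdbool.h>

/* Hypothetical transfer directions (assumed names, for illustration). */
typedef enum {
    DEMO_XFER_HTOH, /* host   -> host   */
    DEMO_XFER_HTOD, /* host   -> device */
    DEMO_XFER_DTOH, /* device -> host   */
    DEMO_XFER_DTOD  /* device -> device */
} demo_xfer_t;

/* Derive the copy direction from the known buffer locations. */
static demo_xfer_t demo_pick_direction(bool src_on_device, bool dst_on_device)
{
    if (src_on_device) {
        return dst_on_device ? DEMO_XFER_DTOD : DEMO_XFER_DTOH;
    }
    return dst_on_device ? DEMO_XFER_HTOD : DEMO_XFER_HTOH;
}
```

A caller would compute the direction once from the two location flags and pass it along with the source and destination pointers to whatever memcpy routine the accelerator layer provides.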
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>