
Add MultiReducer #1665

Merged
merged 128 commits into develop from feature/burmark1/multireduce on Jul 12, 2024

Conversation

@MrBurmark (Member) commented Jun 10, 2024

Add runtime-sized reducer

Add a runtime-sized reducer based on the design mentioned in #1648.

  • This PR is a feature
  • It does the following:
    • Adds MultiReducer at the request of myself and others (a usage sketch follows this list)
  • TODO
    • testing for random bin per iterate
    • testing for all bins per iterate
    • testing for some bins per iterate
    • testing for cuda/hip that would go over available shmem
    • testing with forall
    • testing with kernel
    • testing with launch
    • cuda/hip tuning parameters
    • cuda/hip tuning
    • fallback for cuda/hip if shmem unavailable
    • Figure out nvcc compile issue
    • Figure out intel correctness problem with omp::Auto
    • omp_target implementation
    • sycl implementation
    • new reducer interface
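
A minimal sketch of how the new MultiReducer might be used from a forall loop, based on the design discussed in #1648. Treat this as an assumption about the interface rather than the final API: the seq_multi_reduce policy name, the (num_bins, init) constructor, and the get(bin) accessor reflect my reading of the design, and the bin data is purely illustrative.

#include "RAJA/RAJA.hpp"
#include <vector>
#include <cstdio>

int main()
{
  constexpr int N = 1000;
  const int num_bins = 4;  // runtime-sized: the bin count need not be a compile-time constant

  // Illustrative data: assign each iterate to a bin.
  std::vector<int> bins(N);
  for (int i = 0; i < N; ++i) { bins[i] = i % num_bins; }
  int* bins_ptr = bins.data();

  // One reducer object holds num_bins independent sums.
  RAJA::MultiReduceSum<RAJA::seq_multi_reduce, int> counts(num_bins, 0);

  RAJA::forall<RAJA::seq_exec>(RAJA::TypedRangeSegment<int>(0, N), [=](int i) {
    counts[bins_ptr[i]] += 1;  // each iterate accumulates into its own bin
  });

  for (int bin = 0; bin < num_bins; ++bin) {
    std::printf("bin %d: %d\n", bin, counts.get(bin));
  }
  return 0;
}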

@MrBurmark marked this pull request as a draft on June 10, 2024 23:24
@artv3 (Member) commented Jun 11, 2024

Adding an example could be really nice too! You can probably just take one from the unit test.

@MrBurmark (Member, Author) commented:
Here is the nvcc error output.

.../RAJA/test/functional/forall/multi-reduce-basic/tests/test-forall-basic-MultiReduce.hpp:94:394: error: '__T6' was not declared in this scope
     RAJA::forall<EXEC_POLICY>(seg, [=] RAJA_HOST_DEVICE(IDX_TYPE idx) {

@rhornung67 (Member) left a comment


Overall, this looks really good. I had questions about a few things and a number of suggestions to clarify the documentation.

MrBurmark and others added 2 commits July 10, 2024 14:52
Co-authored-by: Rich Hornung <hornung1@llnl.gov>
Co-authored-by: Rich Hornung <hornung1@llnl.gov>
@rhornung67 (Member) commented:
@MrBurmark third time is the charm? 😄

@MrBurmark (Member, Author) commented Jul 11, 2024

@MrBurmark third time is the charm? 😄

*crosses fingers*

// using multi_reduce_policy = RAJA::cuda_multi_reduce_atomic;
// using multi_reduce_policy = RAJA::hip_multi_reduce_atomic;

Here a simple sum multi-reduction is performed using RAJA::
@rhornung67 (Member) commented Jul 11, 2024

I don't think we want that change unless you change: "Here a" to "Here is a", which is more verbose.
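
For reference, a rough sketch of the simple sum multi-reduction that documentation section describes, showing how the multi_reduce_policy alias from the quoted snippet might be used. The exec_policy, num_bins, bin_of, and values names are illustrative placeholders, and the omp_multi_reduce policy name is an assumption; only the cuda/hip policy names appear in the snippet above.

// Pick the multi-reduce back-end to match the execution back-end:
using multi_reduce_policy = RAJA::seq_multi_reduce;
// using multi_reduce_policy = RAJA::omp_multi_reduce;
// using multi_reduce_policy = RAJA::cuda_multi_reduce_atomic;
// using multi_reduce_policy = RAJA::hip_multi_reduce_atomic;

RAJA::MultiReduceSum<multi_reduce_policy, double> sums(num_bins, 0.0);

RAJA::forall<exec_policy>(RAJA::TypedRangeSegment<int>(0, N),
  [=] RAJA_HOST_DEVICE (int i) {
    sums[bin_of[i]] += values[i];  // accumulate into this iterate's bin
  });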

Co-authored-by: Robert Chen <chen59@llnl.gov>
@MrBurmark merged commit c1cffa9 into develop on Jul 12, 2024
18 checks passed
@MrBurmark deleted the feature/burmark1/multireduce branch on July 12, 2024 19:38