Add RAJA view performance test to benchmark #1728

Merged 7 commits on Sep 27, 2024

artv3
Member

@artv3 artv3 commented Sep 2, 2024

After resolving issue #1718, this PR now adds the performance test to the benchmark folder.

//------------

This PR adds the code provided in issue #1718 in an effort to reproduce the slowdown.
I don't have access to pascal, but on lassen I see comparable performance:

Elapsed time with RAJA view : 0.0951086
Elapsed time with NO RAJA view : 0.0952884

To avoid measuring stream initialization, I added a basic forall at the start of the program.
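For readers who want to see the warm-up idea in isolation, here is a minimal CPU-only sketch. It does not use RAJA: `View2D`, `sum_with_view`, `sum_raw`, and `time_after_warmup` are hypothetical names made up for illustration, and the untimed first call only stands in for the warm-up forall described above. The actual benchmark in this PR may be structured differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for a 2D view: maps (row, col) to a flat offset.
// This is NOT RAJA::View; it only illustrates the indexing abstraction.
struct View2D {
  double* data;
  std::size_t cols;
  double& operator()(std::size_t r, std::size_t c) const {
    return data[r * cols + c];
  }
};

// Sum every entry by indexing through the view.
double sum_with_view(const View2D& v, std::size_t rows) {
  assert(v.data != nullptr);
  double total = 0.0;
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < v.cols; ++c) {
      total += v(r, c);
    }
  }
  return total;
}

// Sum every entry with hand-written flat indexing (no view).
double sum_raw(const double* data, std::size_t rows, std::size_t cols) {
  assert(data != nullptr);
  double total = 0.0;
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      total += data[r * cols + c];
    }
  }
  return total;
}

// Run the callable once untimed (warm-up), then time a second run.
// Returns the elapsed wall-clock time of the second run in seconds.
template <typename F>
double time_after_warmup(F&& f) {
  f();  // warm-up call: one-time setup costs land here, not in the timing
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Passing a lambda that calls `sum_with_view` and another that calls `sum_raw` to `time_after_warmup` gives two numbers comparable in spirit to the "RAJA view" and "NO RAJA view" timings, with the caveat that on a GPU backend the warm-up would be a trivial kernel launch rather than a plain function call.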

Compiler setup:
nvcc: nvcc11.2.0, cuda_arch=70, gcc8.3.1


I also tried: nvcc11.8.0, cuda_arch=70, gcc8.3.1
Elapsed time with RAJA view : 0.0949394
Elapsed time with NO RAJA view : 0.0949237

@artv3
Member Author

artv3 commented Sep 3, 2024

@artv3 move to benchmark folder.

@rahulb1218

I ran the code on Pascal and got the following results:

Elapsed time with RAJA view : 16.6409
Elapsed time with NO RAJA view : 2.26529

@MrBurmark
Member

What GPUs are on pascal? It seems strange that they are ~24x slower than the V100s on lassen.

@rahulb1218

Tesla P100-PCIE-16GB I believe.

@MrBurmark
Member

Are we running the same code in the same way on both of these platforms? How did you build?

@rahulb1218

We found that the issue was that my code was built in debug mode, which was causing the slowdown.
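A quick way to catch this kind of mismatch is to have the benchmark report how it was compiled. The sketch below is an assumption-laden illustration, not part of this PR: `build_summary` is a hypothetical helper, `NDEBUG` is the standard macro that disables `assert` (typically defined by CMake's Release-style build types), and `__OPTIMIZE__` is a GCC/Clang predefined macro that may be absent on other compilers. It reflects host compiler flags only and says nothing about device-side debug flags.

```cpp
#include <string>

// Hypothetical helper: summarize how this translation unit was compiled.
// NDEBUG       -> standard macro; when defined, assert() is disabled.
// __OPTIMIZE__ -> GCC/Clang predefined macro set when optimizing
//                 (assumption: other host compilers may not define it).
std::string build_summary() {
  std::string s;
#ifdef NDEBUG
  s += "NDEBUG=on";
#else
  s += "NDEBUG=off";
#endif
#ifdef __OPTIMIZE__
  s += " optimized=yes";
#else
  s += " optimized=no";
#endif
  return s;
}
```

Printing this string next to the elapsed times makes it obvious when two sets of numbers come from differently configured builds.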

@artv3 artv3 changed the title from "Reproducer for issue: 1718" to "Add RAJA view performance test to benchmark" Sep 17, 2024
@artv3 artv3 marked this pull request as ready for review September 17, 2024 20:25
Contributor

@johnbowen42 johnbowen42 left a comment

LGTM, just some small nits about making sure the benchmark makes sense for all backends

benchmark/raja_view_blur.cpp (outdated, resolved)
benchmark/CMakeLists.txt (outdated, resolved)
@artv3
Member Author

artv3 commented Sep 27, 2024

@johnbowen42 @rhornung67 can I get another review? I just pushed up the changes I thought I had pushed up.

@artv3 artv3 merged commit 1ddae3d into develop Sep 27, 2024
16 of 26 checks passed
6 participants