Add RAJA view performance test to benchmark #1728
Merged
After resolving issue #1718, this PR now adds the performance test to the benchmark folder.
//------------
This PR adds the code provided in issue #1718 in an effort to reproduce the slowdown.
I don't have access to a Pascal GPU, but on Lassen I see comparable performance:
Elapsed time with RAJA view : 0.0951086
Elapsed time with NO RAJA view : 0.0952884
To avoid measuring stream initialization, I added a basic forall at the start of the program.
Compiler setup:
nvcc: nvcc11.2.0, cuda_arch=70, gcc8.3.1
I also tried: nvcc11.8.0, cuda_arch=70, gcc8.3.1
Elapsed time with RAJA view : 0.0949394
Elapsed time with NO RAJA view : 0.0949237