
Use same GPU stream for all kernels #296

Merged 34 commits into develop on Jul 24, 2023
Conversation

MrBurmark (Member)

Use the same GPU stream for all kernels

Use a specific GPU stream for all CUDA/HIP kernels. This is done by using the same resource for all kernels. By default this is the RAJA default stream, but it can be changed to stream 0 with the --gpu_stream_0 command line argument.
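The approach described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `getGpuResource` and `runKernel` are hypothetical names, and the camp calls (`Cuda::CudaFromStream`, `Cuda::get_default`) are assumed to be available in the camp version in use.

```cpp
// Illustrative sketch (not the PR's code): pick one camp resource for the
// whole run, so every kernel launches on the same GPU stream.
#include "RAJA/RAJA.hpp"
#include "camp/resource.hpp"

// Hypothetical helper: --gpu_stream_0 selects the legacy stream 0,
// otherwise the RAJA (camp) default stream is used.
camp::resources::Cuda getGpuResource(bool use_stream_0)
{
  if (use_stream_0) {
    return camp::resources::Cuda::CudaFromStream(0);
  }
  return camp::resources::Cuda::get_default();
}

void runKernel(camp::resources::Cuda& res, double* x, int N)
{
  // Passing the resource routes the launch onto res's stream,
  // rather than whatever stream the policy would pick by default.
  RAJA::forall<RAJA::cuda_exec_async<256>>(res,
      RAJA::RangeSegment(0, N),
      [=] __device__ (int i) { x[i] += 1.0; });
}
```

Because every kernel receives the same resource, all launches serialize onto one stream regardless of which backend policy each kernel uses.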

MrBurmark (Member, Author)

One thing I didn't think too deeply about while making this PR is what to do with calls to (non-async) cudaMemcpy. I believe those API calls run on stream 0. That isn't a correctness problem, since the RAJA default stream synchronizes with stream 0, but it isn't really in the spirit of this PR either. I'm thinking about adding a resource argument to some of our memory helper functions to work around this.
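One possible shape for such a resource-aware helper (a sketch under the assumption that the suite would add a hypothetical `copyData` taking a camp resource; not the suite's actual API):

```cpp
// Hypothetical helper: do the copy on the resource's stream rather than
// letting a blocking cudaMemcpy run on stream 0.
#include <cuda_runtime.h>
#include "camp/resource.hpp"

void copyData(camp::resources::Cuda& res,
              void* dst, const void* src, size_t nbytes)
{
  cudaMemcpyAsync(dst, src, nbytes, cudaMemcpyDefault, res.get_stream());
  res.wait();  // block only on this stream, preserving blocking-copy semantics
}
```

Synchronizing only the resource's stream keeps the helper's blocking behavior while avoiding the implicit cross-stream ordering that plain cudaMemcpy imposes.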

rhornung67 (Member)

@MrBurmark the memcpy calls are outside the kernel timing regions, so does it matter?

MrBurmark force-pushed the feature/burmark1/gpu_stream branch from 4f84161 to 120c5f2 on January 20, 2023 17:44
MrBurmark (Member, Author) commented Jan 20, 2023

Most of them don't matter, but some are used in the timed loop of reduction kernels. I've been trying it out, and it looks like there is a performance penalty for using implicitly synchronized streams compared to a single stream. I'm going to rewrite those memory calls to explicitly use cuda/hipMemcpyAsync and streamSynchronize.

MrBurmark (Member, Author)

I made the change for the kernels that use memcpy and synchronize calls in the timed portion. They now use memcpyAsync + streamSynchronize, or streamSynchronize alone, so that they run on their stream.
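For a reduction kernel, the timed portion now looks roughly like this (a sketch of the pattern described above; `res`, `hsum`, and `dsum` are hypothetical names for the kernel's resource and its host/device result storage):

```cpp
// Copy the reduction result back asynchronously on the kernel's own stream,
// then synchronize that stream only. This avoids the implicit cross-stream
// synchronization that a plain (blocking) cudaMemcpy on stream 0 would cause.
cudaStream_t stream = res.get_stream();
cudaMemcpyAsync(&hsum, dsum, sizeof(hsum),
                cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);
```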

rhornung67 (Member) left a comment

A lot of changes for something we should have thought about earlier, huh? 😄 Thank you for working through this.

MrBurmark requested review from artv3 and rhornung67 on July 7, 2023 20:10
MrBurmark force-pushed the feature/burmark1/gpu_stream branch from 46fd529 to 5517fc0 on July 10, 2023 18:12
rhornung67 mentioned this pull request on Jul 10, 2023 (24 tasks)
MrBurmark force-pushed the feature/burmark1/gpu_stream branch from 89fb9f5 to 39b748f on July 10, 2023 22:00
MrBurmark force-pushed the feature/burmark1/gpu_stream branch from fc2ce08 to 962bac4 on July 21, 2023 16:59
artv3 (Member) left a comment

Wonderful!

rhornung67 (Member) commented Jul 24, 2023

@MrBurmark let's merge this one next when it gets through tioga CI.

Also, we need to make sure that all new kernels are following this pattern.

MrBurmark enabled auto-merge July 24, 2023 20:19
MrBurmark merged commit f713c27 into develop Jul 24, 2023
MrBurmark deleted the feature/burmark1/gpu_stream branch July 24, 2023 23:18