
[CUDA] use advise ACCESSED_BY for all devs in ur_context_handle_t for managed memory #1717

Closed

JackAKirk wants to merge 8 commits into oneapi-src:main from JackAKirk:cuda-malloc-managed-optimize

Conversation

@JackAKirk
Contributor

This change improves the performance of managed memory in the CUDA backend for all devices tested, by producing better default on-device caching of the managed memory. We have tested this using a range of use cases. The performance benefit increases with the number of devices, but can still be considerable when using only a single device if the managed memory is accessed frequently.
There appear to be no drawbacks to doing this by default (no use case was found where it led to a drop in performance), and the observed behaviour with respect to copies to and from the host now matches what we observe in the CUDA runtime API. If a device in the context never accesses the shared memory, this use case also does not lead to a drop in performance, even though we mark all devices in the context as ACCESSED_BY for this managed memory.
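
For reference, a minimal sketch of the allocation path with this change applied (identifier names and the exact driver calls here are assumptions for illustration, not the verbatim adapter code):

    // Allocate managed (unified) memory visible to the host and all devices.
    CUdeviceptr Ptr;
    UR_CHECK_ERROR(cuMemAllocManaged(&Ptr, Size, CU_MEM_ATTACH_GLOBAL));
    // Advise the driver that every device in the UR context will access this
    // allocation, so pages can stay resident per device rather than being
    // migrated on each access. Dev->get() is assumed to return the underlying
    // CUdevice.
    for (const auto &Dev : Context->getDevices()) {
      UR_CHECK_ERROR(cuMemAdvise(Ptr, Size, CU_MEM_ADVISE_SET_ACCESSED_BY,
                                 Dev->get()));
    }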

When allocating new managed memory.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk requested a review from a team as a code owner June 5, 2024 17:51
@JackAKirk JackAKirk requested a review from npmiller June 5, 2024 17:51
@github-actions github-actions bot added the cuda CUDA adapter specific issues label Jun 5, 2024
@JackAKirk JackAKirk changed the title from "[CUDA] use advise ACCESSED_BY for all devs in picontext for managed memory" to "[CUDA] use advise ACCESSED_BY for all devs in URcontext for managed memory" Jun 5, 2024
@JackAKirk JackAKirk changed the title from "[CUDA] use advise ACCESSED_BY for all devs in URcontext for managed memory" to "[CUDA] use advise ACCESSED_BY for all devs in ur_context_handle_t for managed memory" Jun 5, 2024
ScopedContext Active(Device);
UR_CHECK_ERROR(cuMemAllocManaged((CUdeviceptr *)ResultPtr, Size,
CU_MEM_ATTACH_GLOBAL));
for (const auto &device : Context->getDevices()) {
Contributor

nit: I don't claim to know anything about the coding style used here, but we have Context, Device, Size, and Err in this function, and we're introducing lower-case device here. Is it not inconsistent, and isn't it confusing to have two uses of "device" in the same function?

Contributor

Agreed. I think in general good to go for hDevice for the device arg and then Dev for the local device. But I don't think the device arg is actually needed? Maybe better to do:

ScopedContext Active(Context->getDevices()[0]);

And just remove the Device arg altogether

Contributor Author

Agreed. I think in general good to go for hDevice for the device arg and then Dev for the local device. But I don't think the device arg is actually needed? Maybe better to do:

ScopedContext Active(Context->getDevices()[0]);

And just remove the Device arg altogether

OK, I'll switch the naming. But I don't think the device param can be removed.

What is happening here is that a stream within the CUcontext of the Device associated with a runtime (SYCL) queue is allocating managed memory. In many cases we could probably ignore this prescription from the programmer and just use Context->getDevices()[0], and it wouldn't matter to the user. However, I think there are cases where it could.
Say, for example, an application from user A is on a node and sees all devices in the platform on that node (on Perlmutter there are 4 GPUs). If the Context corresponds to a SYCL default context, then all four GPUs will be in the context. The user provides device 1 as the device whose CUcontext they prescribe for allocating the managed memory, and user A only wants to use device 1. We mark ACCESSED_BY for all devices, but this isn't used for the others - not a big deal, it is cheap.
User B is using device 0 for a different application on the same node. I imagine that if user B is saturating that GPU, it might be a bad idea for us to break the prescription from user A and use device 0 instead.
Probably it doesn't make a big difference in practice, but hopefully the above shows there is a potential difference. If the interface provides a specific device, then I think it should be used as the "command device".

Contributor

@hdelan hdelan Jun 6, 2024

In that case we should also be calling setAccessedBy for the hDevice param, or asserting that hDevice is within the context
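
For illustration, the assert option might look roughly like this (the helper names and error code here are assumed, not taken from the adapter):

    // Hypothetical sketch: require the command device to be in the UR context.
    bool InContext = false;
    for (const auto &Dev : Context->getDevices())
      if (Dev == hDevice)
        InContext = true;
    UR_ASSERT(InContext, UR_RESULT_ERROR_INVALID_DEVICE);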

Contributor Author

OK I'll add an assert.

Contributor Author

Actually, I still think it would be better not to add asserts etc. and just leave it as it is, since I might have to loop over 8 GPUs, and in the future this number will probably increase. I think that if people don't use the default context with all devices, then it can be their responsibility to check that they are actually using one of the devices they said they wanted in the context. If they don't, the behaviour will just be how it was before, and it is unlikely that many users will encounter this situation.

However, I'll do whatever you prefer. I just find it hard to pick between the options, and if it is me picking, I'd keep it as it is now.

Contributor Author

I've just ended up calling the advise also via the commandDevice. This will lead to a duplication in normal circumstances, but I've checked you can advise twice without an error.
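
Roughly (names assumed), the allocation path now advises for the command device as well as each device in the context; the driver accepts the repeated advise for the same device:

    // Advise for the command device explicitly, covering the case where it is
    // not part of the UR context ...
    UR_CHECK_ERROR(cuMemAdvise(Ptr, Size, CU_MEM_ADVISE_SET_ACCESSED_BY,
                               Device->get()));
    // ... and for every device in the context; re-advising the command device
    // here is harmless (a duplicate advise is not an error).
    for (const auto &Dev : Context->getDevices())
      UR_CHECK_ERROR(cuMemAdvise(Ptr, Size, CU_MEM_ADVISE_SET_ACCESSED_BY,
                                 Dev->get()));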

Contributor

Yeah that sounds good to me.

Covers the case where the command device is outside the UR context.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
JackAKirk added 2 commits June 7, 2024 07:54
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk added the ready to merge Added to PR's which are ready to merge label Jun 10, 2024
@JackAKirk
Contributor Author

Closing in favour of #1774

@JackAKirk JackAKirk closed this Jun 19, 2024
