-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[OpenMP][Offload] Update the Logic for Configuring Auto Zero-Copy #143638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently the Auto Zero-Copy is enabled by checking every initialized device to ensure that no dGPU is attached to an APU. However, an APU is designed to comprise a homogeneous set of GPUs, therefore, it should be sufficient to check any device for configuring Auto Zero-Copy. In this PR, it checks the first initialized device in the list. The changes in this PR are to clearly reflect the design and logic of enabling the feature for improving the readibility.
@llvm/pr-subscribers-offload Author: None (Kewen12) ChangesSummary: Currently the Auto Zero-Copy is enabled by checking every initialized device to ensure that no dGPU is attached to an APU. However, an APU is designed to comprise a homogeneous set of GPUs, therefore, it should be sufficient to check any device for configuring Auto Zero-Copy. In this PR, it checks the first initialized device in the list. The changes in this PR are to clearly reflect the design and logic of enabling the feature for further improving the readibility. Full diff: https://github.com/llvm/llvm-project/pull/143638.diff 1 Files Affected:
diff --git a/offload/libomptarget/PluginManager.cpp b/offload/libomptarget/PluginManager.cpp
index 93589960a426d..c4d99dfa9f10c 100644
--- a/offload/libomptarget/PluginManager.cpp
+++ b/offload/libomptarget/PluginManager.cpp
@@ -286,16 +286,16 @@ void PluginManager::registerLib(__tgt_bin_desc *Desc) {
}
PM->RTLsMtx.unlock();
- bool UseAutoZeroCopy = Plugins.size() > 0;
+ bool UseAutoZeroCopy = false;
auto ExclusiveDevicesAccessor = getExclusiveDevicesAccessor();
- for (const auto &Device : *ExclusiveDevicesAccessor)
- UseAutoZeroCopy &= Device->useAutoZeroCopy();
+ // APUs are homogeneous set of GPUs. Check the first device for
+ // configuring Auto Zero-Copy.
+ if (ExclusiveDevicesAccessor->size() > 0) {
+ auto &Device = *(*ExclusiveDevicesAccessor)[0];
+ UseAutoZeroCopy = Device.useAutoZeroCopy();
+ }
- // Auto Zero-Copy can only be currently triggered when the system is an
- // homogeneous APU architecture without attached discrete GPUs.
- // If all devices suggest to use it, change requirement flags to trigger
- // zero-copy behavior when mapping memory.
if (UseAutoZeroCopy)
addRequirements(OMPX_REQ_AUTO_ZERO_COPY);
|
UseAutoZeroCopy &= Device->useAutoZeroCopy(); | ||
// APUs are homogeneous set of GPUs. Check the first device for | ||
// configuring Auto Zero-Copy. | ||
if (ExclusiveDevicesAccessor->size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems weird, this is all devices right? What if someone had another GPU on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since an APU is a homogeneous set of GPUs, so another GPU should also be same kind by design. so we can enable zero-copy if the first device suggest to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this is all devices. It could be an NVIDIA GPU on some external PCI-e thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with Kewen in that we do not have a case where one of the devices is an APU and the others could be something else. You could build it yourself perhaps, but it doesn't exist today as a product. If time comes when there will be such a case, we can revisit this. Reminder: MI300A is a very special case where each socket GPU devices appear as a single GPU and all sockets are identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate this scenario? IIUC, GPU on PCI-e is dGPU?
// configuring Auto Zero-Copy. | ||
if (ExclusiveDevicesAccessor->size() > 0) { | ||
auto &Device = *(*ExclusiveDevicesAccessor)[0]; | ||
UseAutoZeroCopy = Device.useAutoZeroCopy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to check for APU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plug-in determines whether this is a case of auto zero copy and in case you f the amdgpu plugin it does so by verifying that the node is an APU. We do not abstract the concept of APU, which is vendor specific, to the top level libomptarget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think useAutoZeroCopy
is sufficient to return the correct info we need here (even if it is not an apu).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this work.
// configuring Auto Zero-Copy. | ||
if (ExclusiveDevicesAccessor->size() > 0) { | ||
auto &Device = *(*ExclusiveDevicesAccessor)[0]; | ||
UseAutoZeroCopy = Device.useAutoZeroCopy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you have a system that has both AMD and GPU cards on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what mentioned in Carlo's reply. We'd be happy to revisit in the future if needed. Also, this change would work better for our future changes on device initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd be happy to revisit in the future if needed.
This doesn't sound like a good idea TBH but it is probably fine, since this piece of code will only affect AMDGPU's plugin if I understand correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not have a case where one of the devices is an APU and the others could be something else. You could build it yourself perhaps, but it doesn't exist today as a product. If time comes when there will be such a case, we can revisit this.
We could have a system with mix of GPUs in theory, but we didn't see that in product in real life. Perhaps we will see a produce of that design in the future :) We could make update accordingly when it happens. Thanks for the comments!
…vm#143638) Summary: Currently the Auto Zero-Copy is enabled by checking every initialized device to ensure that no dGPU is attached to an APU. However, an APU is designed to comprise a homogeneous set of GPUs, therefore, it should be sufficient to check any device for configuring Auto Zero-Copy. In this PR, it checks the first initialized device in the list. The changes in this PR are to clearly reflect the design and logic of enabling the feature for further improving the readibility.
Summary:
Currently the Auto Zero-Copy is enabled by checking every initialized device to ensure that no dGPU is attached to an APU. However, an APU is designed to comprise a homogeneous set of GPUs, therefore, it should be sufficient to check any device for configuring Auto Zero-Copy. In this PR, it checks the first initialized device in the list.
The changes in this PR are to clearly reflect the design and logic of enabling the feature for further improving the readibility.