Skip to content

[OpenMP][Offload] Update the Logic for Configuring Auto Zero-Copy #143638

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 11, 2025

Conversation

Kewen12
Copy link
Contributor

@Kewen12 Kewen12 commented Jun 11, 2025

Summary:

Currently the Auto Zero-Copy is enabled by checking every initialized device to ensure that no dGPU is attached to an APU. However, an APU is designed to comprise a homogeneous set of GPUs, therefore, it should be sufficient to check any device for configuring Auto Zero-Copy. In this PR, it checks the first initialized device in the list.

The changes in this PR are to clearly reflect the design and logic of enabling the feature for further improving the readibility.

Currently the Auto Zero-Copy is enabled by checking every initialized device to ensure that no dGPU is attached to an APU. However, an APU is designed to comprise a homogeneous set of GPUs, therefore, it should be sufficient to check any device for configuring Auto Zero-Copy. In this PR, it checks the first initialized device in the list.

The changes in this PR are to clearly reflect the design and logic of enabling the feature for improving the readibility.
@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-offload

Author: None (Kewen12)

Changes

Summary:

Currently the Auto Zero-Copy is enabled by checking every initialized device to ensure that no dGPU is attached to an APU. However, an APU is designed to comprise a homogeneous set of GPUs, therefore, it should be sufficient to check any device for configuring Auto Zero-Copy. In this PR, it checks the first initialized device in the list.

The changes in this PR are to clearly reflect the design and logic of enabling the feature for further improving the readibility.


Full diff: https://github.com/llvm/llvm-project/pull/143638.diff

1 Files Affected:

  • (modified) offload/libomptarget/PluginManager.cpp (+7-7)
diff --git a/offload/libomptarget/PluginManager.cpp b/offload/libomptarget/PluginManager.cpp
index 93589960a426d..c4d99dfa9f10c 100644
--- a/offload/libomptarget/PluginManager.cpp
+++ b/offload/libomptarget/PluginManager.cpp
@@ -286,16 +286,16 @@ void PluginManager::registerLib(__tgt_bin_desc *Desc) {
   }
   PM->RTLsMtx.unlock();
 
-  bool UseAutoZeroCopy = Plugins.size() > 0;
+  bool UseAutoZeroCopy = false;
 
   auto ExclusiveDevicesAccessor = getExclusiveDevicesAccessor();
-  for (const auto &Device : *ExclusiveDevicesAccessor)
-    UseAutoZeroCopy &= Device->useAutoZeroCopy();
+  // APUs are homogeneous set of GPUs. Check the first device for
+  // configuring Auto Zero-Copy.
+  if (ExclusiveDevicesAccessor->size() > 0) {
+    auto &Device = *(*ExclusiveDevicesAccessor)[0];
+    UseAutoZeroCopy = Device.useAutoZeroCopy();
+  }
 
-  // Auto Zero-Copy can only be currently triggered when the system is an
-  // homogeneous APU architecture without attached discrete GPUs.
-  // If all devices suggest to use it, change requirement flags to trigger
-  // zero-copy behavior when mapping memory.
   if (UseAutoZeroCopy)
     addRequirements(OMPX_REQ_AUTO_ZERO_COPY);
 

UseAutoZeroCopy &= Device->useAutoZeroCopy();
// APUs are homogeneous set of GPUs. Check the first device for
// configuring Auto Zero-Copy.
if (ExclusiveDevicesAccessor->size() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird, this is all devices right? What if someone had another GPU on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since an APU is a homogeneous set of GPUs, so another GPU should also be same kind by design. so we can enable zero-copy if the first device suggest to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is all devices. It could be an NVIDIA GPU on some external PCI-e thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with Kewen in that we do not have a case where one of the devices is an APU and the others could be something else. You could build it yourself perhaps, but it doesn't exist today as a product. If time comes when there will be such a case, we can revisit this. Reminder: MI300A is a very special case where each socket GPU devices appear as a single GPU and all sockets are identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate this scenario? IIUC, GPU on PCI-e is dGPU?

// configuring Auto Zero-Copy.
if (ExclusiveDevicesAccessor->size() > 0) {
auto &Device = *(*ExclusiveDevicesAccessor)[0];
UseAutoZeroCopy = Device.useAutoZeroCopy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to check for APU?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plug-in determines whether this is a case of auto zero copy and in case you f the amdgpu plugin it does so by verifying that the node is an APU. We do not abstract the concept of APU, which is vendor specific, to the top level libomptarget.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think useAutoZeroCopy is sufficient to return the correct info we need here (even if it is not an apu).

Copy link
Member

@carlobertolli carlobertolli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this work.

// configuring Auto Zero-Copy.
if (ExclusiveDevicesAccessor->size() > 0) {
auto &Device = *(*ExclusiveDevicesAccessor)[0];
UseAutoZeroCopy = Device.useAutoZeroCopy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you have a system that has both AMD and GPU cards on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what mentioned in Carlo's reply. We'd be happy to revisit in the future if needed. Also, this change would work better for our future changes on device initialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd be happy to revisit in the future if needed.

This doesn't sound like a good idea TBH but it is probably fine, since this piece of code will only affect AMDGPU's plugin if I understand correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not have a case where one of the devices is an APU and the others could be something else. You could build it yourself perhaps, but it doesn't exist today as a product. If time comes when there will be such a case, we can revisit this.

We could have a system with mix of GPUs in theory, but we didn't see that in product in real life. Perhaps we will see a produce of that design in the future :) We could make update accordingly when it happens. Thanks for the comments!

@ronlieb ronlieb merged commit bbe59e1 into llvm:main Jun 11, 2025
9 checks passed
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…vm#143638)

Summary:

Currently the Auto Zero-Copy is enabled by checking every initialized
device to ensure that no dGPU is attached to an APU. However, an APU is
designed to comprise a homogeneous set of GPUs, therefore, it should be
sufficient to check any device for configuring Auto Zero-Copy. In this
PR, it checks the first initialized device in the list.

The changes in this PR are to clearly reflect the design and logic of
enabling the feature for further improving the readibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants