-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Vulkan][Refactor] Move ownership of per-CPU-thread objects to VulkanDeviceAPI #8196
Conversation
Potential reviewers: @masahi @tmoreau89 |
Thanks @Lunderberg , it would still be useful to think about the overhead. While thread local is a different way of organizing things, the direct TLS lookup have a lower overhead than a lock based cache lookup. Assuming we only have the singleton case it might still be desirable to use TLS when possible(i understand that might restrict us to singleton usecases) Another minor thing is we usually keep the non-public facing apis in src/ so they are not visible to the downstream deps. This would help reduce the possible future API compact problem |
Thank you @tqchen , and that makes sense with the potential overhead issues. I had been assuming that the main performance cost would be in the I'll see if I can measure the performance difference, and will also see whether I can refactor to maintain the ownership model of everything deriving from the Regarding the non-public apis, that makes sense. It wasn't vulkan-specific, and so I didn't want it all the way in |
That sounds good. Thanks @Lunderberg , i agree likely the hit could be negligible considering the launching overhead |
…y to VulkanDevice - Implemented ThreadMap, a container for per-thread objects. Unlike dmlc::ThreadLocalStore, ThreadMap is intended for use as a non-static thread-specific lookup. - Added ThreadMap<VulkanStream> as a member to VulkanDevice, updated all uses.
…onstructor/destructor. - VulkanBuffer owns the VkBuffer and VkDeviceMemory that it allocates, and deallocates on destruction. - VulkanHostVisibleBuffer owns a VulkanBuffer, and additional calls vkUnmapMemory on destruction.
…lkanDevice - Previously, was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class.
…kanDevice - Previously, VulkanUniformBuffer was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class.
…lkanDeviceAPI - Previously, the WorkspacePool was owned by VulkanThreadEntry, and required a lookup from VulkanDeviceAPI::AllocWorkspace. As a result, non-global VulkanDeviceAPI would interact with each other.
…VulkanDeviceAPI - Previously, the active device was owned by VulkanThreadEntry, so lookups to multiple global variables were required. Now, everything goes from the VulkanDeviceAPI. - Removed VulkanThreadEntry, as all functionality has been moved to either VulkanDevice or VulkanDeviceAPI.
551edf4
to
5b224b5
Compare
Running some performance tests, it looks like the refactor has very little impact on the overall runtime. The plots below show the Q1/median/Q3 runtimes for different low-level tasks that would need to access the thread-specific resources. The only significant difference is for the copying data to the device, which is slightly higher for very small buffer copies. Benchmarking details: Used |
@Lunderberg @tqchen Is this good to go? |
The performance tests are matched pre/post change for cases that access the thread-specific resources, so I think it's good to go. @tqchen Any follow-up concerns? |
Thanks @Lunderberg, surprisingly after this change, the segfault on my APU that was happening on process exit is somehow gone! |
Ooh, that's a nice side-effect to get! I guess that confirms that the segfault was caused by out-of-order destruction, though I'm still not quite sure how that happened, given that we checked the order of construction. |
…DeviceAPI (apache#8196) * [Vulkan][Refactor] Moved VulkanStream ownership from VulkanThreadEntry to VulkanDevice - Implemented ThreadMap, a container for per-thread objects. Unlike dmlc::ThreadLocalStore, ThreadMap is intended for use as a non-static thread-specific lookup. - Added ThreadMap<VulkanStream> as a member to VulkanDevice, updated all uses. * [Vulkan][Refactor] Pulled VulkanBuffer allocation/deallocation into constructor/destructor. - VulkanBuffer owns the VkBuffer and VkDeviceMemory that it allocates, and deallocates on destruction. - VulkanHostVisibleBuffer owns a VulkanBuffer, and additional calls vkUnmapMemory on destruction. * [Vulkan][Refactor] Move the VulkanStagingBuffer to be owned by the VulkanDevice - Previously, was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class. * [Vulkan][Refactor] Move ownership of per-thread uniform buffer to VulkanDevice - Previously, VulkanUniformBuffer was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class. * [Vulkan][Refactor] Moved ownership of per-thread workspace pool to VulkanDeviceAPI - Previously, the WorkspacePool was owned by VulkanThreadEntry, and required a lookup from VulkanDeviceAPI::AllocWorkspace. As a result, non-global VulkanDeviceAPI would interact with each other. * [Vulkan][Refactor] Moved ownership of per-thread active device id to VulkanDeviceAPI - Previously, the active device was owned by VulkanThreadEntry, so lookups to multiple global variables were required. Now, everything goes from the VulkanDeviceAPI. - Removed VulkanThreadEntry, as all functionality has been moved to either VulkanDevice or VulkanDeviceAPI. Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
…DeviceAPI (apache#8196) * [Vulkan][Refactor] Moved VulkanStream ownership from VulkanThreadEntry to VulkanDevice - Implemented ThreadMap, a container for per-thread objects. Unlike dmlc::ThreadLocalStore, ThreadMap is intended for use as a non-static thread-specific lookup. - Added ThreadMap<VulkanStream> as a member to VulkanDevice, updated all uses. * [Vulkan][Refactor] Pulled VulkanBuffer allocation/deallocation into constructor/destructor. - VulkanBuffer owns the VkBuffer and VkDeviceMemory that it allocates, and deallocates on destruction. - VulkanHostVisibleBuffer owns a VulkanBuffer, and additional calls vkUnmapMemory on destruction. * [Vulkan][Refactor] Move the VulkanStagingBuffer to be owned by the VulkanDevice - Previously, was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class. * [Vulkan][Refactor] Move ownership of per-thread uniform buffer to VulkanDevice - Previously, VulkanUniformBuffer was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class. * [Vulkan][Refactor] Moved ownership of per-thread workspace pool to VulkanDeviceAPI - Previously, the WorkspacePool was owned by VulkanThreadEntry, and required a lookup from VulkanDeviceAPI::AllocWorkspace. As a result, non-global VulkanDeviceAPI would interact with each other. * [Vulkan][Refactor] Moved ownership of per-thread active device id to VulkanDeviceAPI - Previously, the active device was owned by VulkanThreadEntry, so lookups to multiple global variables were required. Now, everything goes from the VulkanDeviceAPI. - Removed VulkanThreadEntry, as all functionality has been moved to either VulkanDevice or VulkanDeviceAPI. Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
Continuing refactoring from #8188. The goal of this PR is to establish clear ownership of vulkan resources. Previously, resources were owned both by VulkanDeviceAPI and by VulkanThreadEntry, depending on whether it is a per-CPU-thread resource. This PR introduces a
ThreadMap<T>
class that behaves similarly tothread_local
, but can be used for non-static data members, then usesThreadMap<T>
to move all ownership of vulkan resources to be relative to theVulkanDeviceAPI
. Individual steps are in separate commits for readability.Moved ownership of per-thread active device id to VulkanDeviceAPI
Previously, the active device was owned by VulkanThreadEntry, so lookups to multiple global variables were required. Now, everything goes from the VulkanDeviceAPI.
Removed VulkanThreadEntry, as all functionality has been moved to either VulkanDevice or VulkanDeviceAPI.
Moved ownership of per-thread workspace pool to VulkanDeviceAPI
Move ownership of per-thread uniform buffer to VulkanDevice
Move the VulkanStagingBuffer to be owned by the VulkanDevice
Pulled VulkanBuffer allocation/deallocation into constructor/destructor.
VulkanBuffer owns the VkBuffer and VkDeviceMemory that it allocates, and deallocates on destruction.
VulkanHostVisibleBuffer owns a VulkanBuffer, and additional calls vkUnmapMemory on destruction.
Moved VulkanStream ownership from VulkanThreadEntry to VulkanDevice
Implemented ThreadMap, a container for per-thread objects. Unlike dmlc::ThreadLocalStore, ThreadMap is intended for use as a non-static thread-specific lookup.
Added ThreadMap as a member to VulkanDevice, updated all uses.