[RFE] Support linking multiple Thrust versions: Add hooks that wrap the thrust:: namespace in a custom namespace
#1401
Comments
Updated the title to mention the work that needs to be done. I think we can solve this nicely with these macro sets:
@nv-dlasalle Does this sound reasonable for your needs?
@allisonvacanti This sounds like it would work perfectly for us. Thanks!
PyTorch is experiencing the same issue (pytorch/pytorch#54245). CUB should probably avoid using static variables for caching inside template functions.
Some context: Those caches were added a while back to avoid overhead from some expensive CUDA API calls. Users were seeing a significant impact from these calls under certain workloads, and the caches were necessary for good performance in some critical applications. I agree that using statics in a header is a fragile solution and, well, generally not a good idea. But we don't really have a lot of other options: Thrust/CUB are header-only, so we can't place the cache in a library component. C++17 inline variables may provide a nicer workaround eventually, but we can't rely on them yet. For now, the namespace workaround will be the preferred solution, but I just wanted to share that we're aware of the issue and want to move to a more robust solution when one becomes available.
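For readers unfamiliar with the pattern, here is a minimal sketch of a header-only cache built on a function-local static inside a template. It is not CUB's actual code: expensive_device_query, cached_device_count, and query_calls are hypothetical names, and the counter stands in for an expensive CUDA API call so the sketch runs without a GPU.

```cpp
#include <cassert>

// Bookkeeping for this example only: counts how often the "expensive" query runs.
static int query_calls = 0;

// Hypothetical stand-in for an expensive CUDA API call (e.g. a device query).
static int expensive_device_query() {
    ++query_calls;
    return 4;  // pretend the system has 4 devices
}

// Header-only cache in the style discussed above: the function-local static is
// initialized on first use (thread-safe since C++11) and reused afterwards.
// Because it lives in a function template, GCC on Linux typically emits it as a
// STB_GNU_UNIQUE symbol ("u" in nm output), which is what interacts badly with
// RTLD_LOCAL when several shared libraries embed the same template.
template <typename Tag = void>
int cached_device_count() {
    static const int count = expensive_device_query();
    return count;
}
```

Calling cached_device_count() repeatedly runs the underlying query only once, which is the performance benefit the caches were added for.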
Thanks! I found that it should be fine to use a static variable inside a non-template, non-inline function. In those cases, gcc won't compile these symbols as UNIQUE symbols. However, a UNIQUE symbol breaks the RTLD_LOCAL setting: a library loaded later won't instantiate its own copy of the static variable. This causes the conflict.
If each library indeed saw the same static variable, that would be fine: the cached values are supposed to be the same for all libraries. But in the case of pytorch/pytorch#52663 and pytorch/pytorch#54245, a new static is allocated but its constructor is not called, so there are 0 devices instead of the correctly cached number of devices (the same would happen for other cached attributes).
A separate namespace could solve this issue; it is also the solution the DGL team adopted in dmlc/dgl#2758. However, I'd still like to share my investigation here so that people can understand the root cause and avoid similar issues in the future. It took us about a week to figure out the root cause. I believe the cause is the UNIQUE symbol, but I found only limited resources explaining this. I'm not a C++ expert, so my statement here could be wrong. Some solutions I found:
Using a static variable inside a template function can result in some unusual behavior. I hope my investigation here helps people find a better solution. Reference:
A simple gist for the UNIQUE symbol: dmlc/dgl#2758
#1464 and NVIDIA/cub#326 provide the new namespace customization hooks. With those applied, defining … Alternatively, if … I'm wrapping up testing and reviews, but these are passing initial tests. If anyone gets a chance to try them out and see whether they fix their dynamic linking problems, please let me know.
The fix for this has landed. Define … For more info: …
(Note that the anonymous namespace macros have been removed. These were interacting badly with nvcc and will not be implemented in the foreseeable future.)
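As a rough illustration of how a namespace customization hook of this kind can work: when a single wrapper macro is defined, the library's BEGIN/END macros open an extra outer namespace, and a qualifier macro yields the fully qualified name. All MYLIB_* names and my_project below are hypothetical; they are not the macro names introduced by #1464 or NVIDIA/cub#326.

```cpp
#include <cassert>

// Normally supplied on the command line, e.g. -DMYLIB_WRAPPED_NAMESPACE=my_project.
// Defined here so the sketch is self-contained. All MYLIB_* names are hypothetical.
#define MYLIB_WRAPPED_NAMESPACE my_project

#ifdef MYLIB_WRAPPED_NAMESPACE
#  define MYLIB_NAMESPACE_BEGIN namespace MYLIB_WRAPPED_NAMESPACE { namespace mylib {
#  define MYLIB_NAMESPACE_END   } }
#  define MYLIB_NS_QUALIFIER    ::MYLIB_WRAPPED_NAMESPACE::mylib
#else
#  define MYLIB_NAMESPACE_BEGIN namespace mylib {
#  define MYLIB_NAMESPACE_END   }
#  define MYLIB_NS_QUALIFIER    ::mylib
#endif

// Library code is written once against the macros and lands in
// ::my_project::mylib when the wrapper is defined, ::mylib otherwise.
MYLIB_NAMESPACE_BEGIN
inline int version() { return 2; }
MYLIB_NAMESPACE_END
```

Two shared libraries built with different wrapper names get distinct mangled symbols, including distinct function-local statics, so their caches can no longer collide.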
Problem
CUB allows itself to be placed into a namespace via CUB_NS_PREFIX and CUB_NS_POSTFIX, so that multiple shared libraries can each use their own copy of it (and thus different versions can safely coexist). Static variables used for caching could otherwise cause problems (e.g., https://github.com/NVIDIA/cub/blob/main/cub/util_device.cuh#L212). Thrust, however, depends on CUB and requires it not to be in another namespace, so users cannot have CUB_NS_PREFIX defined. This means that if two libraries use two different versions of Thrust (or CUB), issues with the caching variables inside CUB can occur.
Possible solutions
One solution would be to add THRUST_NS_PREFIX and THRUST_NS_POSTFIX to allow each library to place the version of Thrust it is compiled against within its own namespace, and then either use the version of CUB in the global namespace or use the version of CUB within the same namespace by defining CUB_NS_PREFIX as well.
Another solution would be to allow users to define something like THRUST_CUB_NS to tell Thrust which namespace to look for CUB in.
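The second proposal could look roughly like this: Thrust stays in the global namespace, and a single macro tells it where CUB lives. THRUST_CUB_NS is only the name suggested in this issue; the ::cub default, the mylib namespace, and the mock functions are assumptions for illustration.

```cpp
#include <cassert>

// A project bundling its own CUB inside mylib:: would define this (hypothetical).
#define THRUST_CUB_NS ::mylib::cub

// Assumed default when the user does not override it: the global ::cub.
#ifndef THRUST_CUB_NS
#  define THRUST_CUB_NS ::cub
#endif

// Mock CUB wrapped in the project's namespace.
namespace mylib {
namespace cub {
inline int scan_impl(int x) { return x * 2; }
}  // namespace cub
}  // namespace mylib

// Mock Thrust in the global namespace, reaching CUB only through the macro.
namespace thrust {
inline int scan(int x) { return THRUST_CUB_NS::scan_impl(x); }
}  // namespace thrust
```

Compared with the prefix/postfix approach, this keeps thrust:: itself unwrapped, so it does not by itself separate Thrust's own statics between libraries.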