-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hide visibility of non-public symbols #1644
Hide visibility of non-public symbols #1644
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the best of my knowledge, this is good! If you removed the RMM_EXPORT from a header, would the rmm Python builds or tests fail?
Good question! I just tried removing the Then tried removing ALL of the Maybe this is because the Python package only calls into the DSOs through public symbols generated by Cython, and those are not affected by this CMake change? I'm going to try pushing a commit with those removals here, let's see what other CI jobs say. |
8935ec3
to
e9482bf
Compare
My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke. |
This reverts commit e9482bf.
Right, totally understand and I agree. Unfortunately it looks like even removing those, everything passed: https://github.com/rapidsai/rmm/actions/runs/10326031645?pr=1644. So I just pushed 1c39756 reverting this PR back to its original state. Something about CI will have to change here if we want feedback on the exports working correctly within rmm's CI. This is way at the edge of my understanding, so the only thing I can think of is to dump the symbol table for every DSO and pass it through some text processing (e.g. we shouldn't see any WEAK / GLOBAL I'm certain someone else will have a better idea than that for how to express the behavior we want in CI here. cc @robertmaynard @KyleFromNVIDIA |
Can you please open an RMM issue for this? Thanks. |
As for testing, could add a separate test that links spdlog and RMM and make sure it compiles without symbol conflicts? |
That works, so instead of changing the default behavior, we'd add an option
We definitely need to be able to hand around e.g. rmm device buffers across library boundaries. I'm not sure how that's reflected in terms of structs (classes) at the DSO level, though. Does visibility have any impact on whether they're safe to hand around? Structs don't actually have any corresponding symbols, so there's nothing to check there. As long as they are ABI-stable between the different libraries, isn't that sufficient? Basic library inspection tools won't even give us enough information to study the stability of the objects since we can't check on things like member ordering/alignment or vtable contents. |
I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of
So without any change changes to Like this... foreach(_cython_target IN LISTS RAPIDS_CYTHON_CREATED_TARGETS)
set_target_properties(
${_cython_target}
PROPERTIES
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
endforeach() ... after each call to Doing it that way here (and in other similar projects) would allow us to test this behavior without expanding |
Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv 's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report. |
If they are structs and not objects with vtables you are safe. The RTTI for hidden symbols can have the DSO embedded into it and therefore isn't safe across boundaries as you won't be able to |
I just tested the ref: rapidsai/cudf#15483 (comment) I've also proposed a new testing script implementing the idea from discussion above to try to add a CI-time check on symbol visibility. It shows up like this in wheel build logs:
Tested that locally and found that it did successfully fail when I omitted the visibility attributes from some targets. I've hard-coded I think this is ready for another review. |
@jameslamb Could you open a rapids-cmake issue to track this feature request? I happy with using the current |
Sure! here you go: rapidsai/rapids-cmake#672
Thanks! I agree. |
When we created build-planning we (including you and @jarmak-nv) specifically discussed this use case as one where the fragmentation associated with creating per-repo issues wasn't worth the cost. Without a centralized point of discussion (e.g. a build-planning issue) we end up with a comment on a cugraph issue pointing to a solution discussed in a combination of posts on a cuml issue and a raft PR (not an exaggeration, this kind of cross-repo discussion is fairly common without a centralized place for discussion). If an rmm issue is a must-have for you, could we compromise on keeping those issues as near-empty issues that just link to a build-planning issue? I realize that creating the issues on other repos isn't hard, especially with rapids-reviser, but that doesn't address the IMO much larger concern of fragmentation. |
I'm fine with doing it this way for now, but I think that in the long run we should upstream the functionality. The purpose of
Hmm yeah I thought briefly about the vtables but didn't get far enough as to consider how they would be affected by crossing ABI boundaries. How is this made safe in general? Are vtables always embedded as unique symbols so that they can be deduplicated across DSOs? What about class methods? If rmm symbols corresponding to its classes are made public, and if one of those classes has some virtual function, could you end up with the same function defined in multiple DSOs (say libcuml and libcuvs) such that the two end up causing runtime lookup conflicts due to the symbols being identical? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small suggestions, but nothing blocking. Address as you see fit. I discussed some things with Robert offline and am happy with the current state, recognizing that we may have to make more changes in the future.
@harrism could you please review one more time? I significantly changed the approach after your first approval, don't want you to feel like I'm sneaking something in here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better. I like that it's not touching the C++ code.
Ok thanks for the reviews! I'm going to merge this so we can continue with rapidsai/cudf#15483.
|
/merge |
Description
Fixes #1645
Contributes to rapidsai/build-planning#33
Similar to rapidsai/cudf#15982
Proposes more tightly controlling the visibility of symbols in the shared libraries produces for the
rmm
Python library, via the following:-fvisibility=hidden
by defaultrmm
(everything in thermm::
namespace) with__attribute__((visibility("default")))
Benefits of this change
Reduces the risk of symbol conflicts when
rmm
is used alongside other libraries. For example, see this case incudf
where thespdlog::
symbols inrmm
are conflicting with thespdlog::
symbols innvcomp
: rapidsai/cudf#15483 (comment)Reduces library size by a bit (around 0.3 MB uncompressed), by reducing the size of symbol tables in DSOs.
Notes for Reviewers
This is at the very edge of my C++ knowledge, apologies in advance if I've missed something obvious 😬
Checklist
How I tested this
Re-used the wheels produced from this PR over in
cudf
and sawcudf
's tests succeed where previously the had failed because of symbol conflicts: rapidsai/cudf#15483 (comment)Also tested this locally, read on...
On an x86_64 machine with CUDA 12.2, checked out the latest
24.10
ofrmm
and the branch ofcudf
from rapidsai/cudf#15483 (which is up to date with the latest24.10
ofcudf
).Built
librmm
,rmm
,libcudf
, andcudf
wheels from source, in the same CUDA 12, Python 3.11 image used for wheel building CI.how I did that (click me)
Created a script,
build-all.sh
, with the following.Ran it like this:
Installed the wheels produced by that script, using one of the testing images used across RAPIDS CI.
how I did that (click me)
Saw that fail with exactly the same symbol-conflict issue discussed in rapidsai/cudf#15483 (comment)..
Repeated that process with the changes you see in this PR, and observed:
rmm
's Python tests passingcudf
😁Also inspected the symbol tables with
readelf
. For example, here are the counts of symbols by visibility inmemory_resource.cpython-311-x86_64-linux-gnu.so
:It also helped reduce the sizes a bit
how I did that (click me)
Unzipped the
rmm
wheel.rm -rf ./delete-me mkdir -p ./delete-me unzip -d ./delete-me ./dist/rmm_cu12*.whl
Checked the symbol tables with
readelf
Checked the sizes with
pydistcheck
Where did
RMM_EXPORT
come from?It's been in
rmm
for a few years:rmm/include/rmm/detail/export.hpp
Lines 19 to 26 in 975c911
But only used in one place, from #833.
References
I found these helpful: