-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shared libraries with transitions aren't always available where expected in runtime #13819
Comments
Here's a draft: quval@2e63f5a. Doesn't create the default |
@oquenchil: sorry about the ping – sort of a general question, though: is it be better to just send my patch as a PR and iterate on it, or to discuss the general approach on the ticket before sending a PR? |
Hi @quval , let's discuss first what you plan to do. |
@oquenchil: Thank you! The gist of the problem is that sometimes libraries will be looked for under the default solib dir even if the libraries themselves have transitions (this is the case e.g. in remote execution), but unless there's some non-transitioned library linked in, the default solib dir isn't added to the search path. I'm proposing to add an rpath entry to default solib dir last, as a fallback, if it hasn't been added before. When running under the execroot, all the libraries will be found in earlier rpath entries; otherwise, all the libraries will be found in that last rpath entry. This in itself solves the issue, but while thinking about this, I've noticed a potential problem: say our binary is linking libA.so with no transitions and libB.so with transitions; the default solib will be added first to the rpath so that libA.so can be found, but if libB.so is also present there, say because it was compiled without transitions for some other target, it will be erroneously picked up from the default solib dir. Essentially, the problem is that libraries built with configuration transitions still have the same name. To fix this, my patch also adds the transition mnemonic to the mangled name of the output library. This can only happen if running under the execroot; in remote execution, only the correct version of the library will be present. One last thing is that the default solib directory should always exist for rpath lookups to work. On a clean execroot, when building a binary that only depends on libraries with transitions, all rpath lookups will fail: rpaths are relative to the default solib directory and can't work if it doesn't exist. This will only rarely be an actual problem. For completeness, I'd be happy to send a patch for this too, only I haven't been able to figure out how to do this. I'd appreciate any insights. In the meantime, I've put together a draft implementation (for the first two parts) at quval@2e63f5a. |
@oquenchil: friendly ping :) |
I wonder if the fix should actually be to do the same equality comparison here: https://source.corp.google.com/piper///depot/google3/third_party/bazel/src/main/java/com/google/devtools/build/lib/rules/cpp/LibrariesToLinkCollector.java;rcl=325437765;l=270 but extracting the configuration fragment from both values before comparing, which would then trigger line 222. I didn't understand why you added the root last while line 222 does it at the beginning. |
@oquenchil: thanks! I've actually been debating about the same, but if I remember correctly, I ultimately decided on this approach because of rpath search order – if we first look in the root, we may either pick up the wrong library (unless we mangle in the configuration fragment, which I think I added later) or waste time looking up the SO in a directory that's potentially large (I wanted to preserve the current behaviour given the comment on line 220, though as far as I understand, this isn't an issue when the code is deployed, only when running under the execroot). If you think this is the right way to do this, I'll revise the patch and send as a PR. Do you know of a way to ensure the root solib dir exists even if all output libraries are in configured solib dirs? My guess is that the root almost always exists, but rpath lookup won't work. I'd be happy to get this fixed along the way, but wasn't able to figure out how. |
So I don't know the answer to your last question. This is an area of the code that I haven't touched that much. I will try to find some time next week to run it under the debugger and gain a better idea. |
Thank you! In the meantime, I've just opened a PR for the first part (#14011). |
When an executable has dynamically-linked dependencies that have transitions, shared libraries are looked up under rpaths like this: ``` $ORIGIN/(../)*_solib_k8/../../../k8-fastbuild-ST-[0-9a-f]+/bin/_solib_k8 ``` Unless running under the execroot, the transitioned solib directory may not be available at all; the right files will be present under the default solib directory, so it should be on the rpath. Adding the default solib directory to the rpath may cause the wrong version of a shared library to be loaded, e.g. if it has been compiled in the default configuration. To prevent this, we also add the transition mnemonic to the mangled name of the library (adding the default solib last to the rpath won't really help, because it can legitimately be added first). #13819 Closes #14011. PiperOrigin-RevId: 401521899
PR bazelbuild#14011 took care of cc_libraries, this fixes the same issue for cc_imports. Work towards bazelbuild#13819.
PR bazelbuild#14011 took care of cc_libraries, this fixes the same issue for cc_imports. Work towards bazelbuild#13819.
This issue seems to affect |
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by "../". Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths. This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists. Work towards bazelbuild#13819.
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by "../". Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths. This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists. Work towards bazelbuild#13819.
@quval I also got hit by this issue and came up with the following attempt at a simple fix: #14660. Could you perhaps take a look? I don't think I would be able to spot the more subtle issues this change could cause. |
@fmeum: seems like a great way to work around the issue - thanks! I'm far from an expert myself, but normalising paths seems safe enough to me. |
PR bazelbuild#14011 took care of cc_libraries. This fixes the same issue for cc_imports. Work towards bazelbuild#13819. Closes bazelbuild#14272. PiperOrigin-RevId: 427137675 (cherry picked from commit 8734ccf)
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by "../". Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths. This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists. Work towards bazelbuild#13819.
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`. Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths. This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists. Work towards #13819. Closes #14660. PiperOrigin-RevId: 431671888
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`. Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths. This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists. Work towards bazelbuild#13819. Closes bazelbuild#14660. PiperOrigin-RevId: 431671888
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`. Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths. This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists. Work towards bazelbuild#13819. Closes bazelbuild#14660. PiperOrigin-RevId: 431671888
…14929) When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`. Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths. This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists. Work towards #13819. Closes #14660. PiperOrigin-RevId: 431671888
@quval Just trying to wrap my head around what works and what doesn't: Did our fixes, taken together, fully fix this issue or are you aware of any remaining problems? |
@fmeum I may be missing something (it has been a while), but I think the issue is pretty much resolved. |
I don't see anything else either. Given that a number of issues link to this one, it's probably better to mark it closed. @sgowroji Could you close this issue? |
Thank you for the above updates. Closing this issue now. Feel free to reach us if you still have any specific queries. |
Description of the problem:
When an executable has dynamically-linked dependencies that have transitions, shared libraries aren't always placed where the binary expects them in runtime. This means, for example, that such tests cannot be run in remote execution.
The rpath where shared libraries with transitions are looked up is
$ORIGIN/_solib_k8/../../../k8-fastbuild-ST-<...>/bin/_solib_k8
, but this path is not always available:bazel-out/k8-fastbuild/bin/_solib_k8/
may not always exist (e.g. if all shared libraries have transitions). ld doesn't seem to follow../
from nonexistent paths.k8-fastbuild-ST-<...>
isn't available at all. The right files exist under_solib_k8
, but because of the rpath, ld doesn't look for them there.Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Run
bazel test :b
on this: https://gist.github.com/quval/5077b2edc902e9dc0bcad2311a245ffa(Or change
cc_test
tocc_binary
withlinkstatic = 0
and usebazel run
).What operating system are you running Bazel on?
Linux
What's the output of
bazel info release
?4.1.0 (also happens at master)
Have you found anything relevant by searching the web?
Issue #9357 seems to be related (though Bazel doesn't crash here).
Any other information, logs, or outputs that you want to share?
After a90c408 the same library may not be linked more than once, so a solution may be to always add the default
solib
directory to the rpath. (When running locally, the wrong version may then be loaded for libraries that have been built for more than one configuration – this is already a potential issue; the defaultsolib
may have to go last if added as a fallback – or the configuration could be encoded in the mangled name of the library to prevent this altogether.) I'll be happy to send a PR if any of this makes sense.The text was updated successfully, but these errors were encountered: