-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed macOS system libcxxabi? #148
Changed macOS system libcxxabi? #148
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
…nda-forge-pinning 2024.05.25.13.27.35
@conda-forge/libcxx @conda-forge/core, this is an interesting situation. It seems that the macOS system libcxxabi changed ABI somewhere between 10.9 and 10.13, which is manifesting as segfaults in the boost test. In more detail: #144 is not ready (at all), but made me look at libcxx-testing due to failures. As it turns out, a mere rerender there (conda-forge/libcxx-testing-feedstock#8) causes the feedstock to fail, where the only real change is moving the deployment target from 10.9 to 10.13. So the immediate question becomes: how could #131 have passed then (which was obviously already using 10.13 at the time, and runs libcxx-testing as a downstream test)? I don't yet have an answer. Where it gets interesting is that - on a wild hunch - I undid the patch here where @isuruf brought some struct in alignment with the old system libcxxabi (AFAIU), and lo and behold, the segfaults are gone (see last and second-to-last runs here). I think I've mostly understood why we need to rely on the system libcxxabi, but it seems strange to me that the ABI there seems to have changed? I mean, not that strange, since AFAIU macOS eventually mirrors what LLVM does, but still: it would appear that we cannot actually build against something that would work for both 10.9 & 10.13. Presumably, something about this must already have been broken (and it's just surfacing now), and I have no idea how often similar breaks might have happened in the libcxxabi's shipped by newer deployment targets. Would appreciate your thoughts. |
It's very odd we haven't seen reports of segfaults from this in the wild. Also, given the name of the library, an ABI break here is ironic to say the least. I still don't fully understand why the packages I build need this in the first place. It could be time to push on that more. |
libcxx depends on symbols from libcxxabi, but AFAICT we cannot ship our own libcxxabi, because some packages (Python?) always prefer the system one. Here's the intro from pybind11_exception_rtti_test:
However, we might be stretching what can be done with the system ABI to the breaking point. Maybe we need to go back to shipping libcxxabi, and patching python to ensure we prefer our libcxxabi...? |
Yeah I wrote that readme file IIRC. 😀 Yes, ideally we'd bake in to our binaries to load our libcxxabi first. It would need to be any binary, not just python. A compile-time equivalent to ld preload if you will. I know of no such option. :/ |
I think it's possible that it matters which libcxxabi other relevant packages in that test (python, pybind11, etc.) have been built against. Currently, python and pybind11 have not been rebuilt against 10.13 yet, so if this hypothesis is true, it has to be something else.
Could we patch out the default lookup being done in |
Idk but that would have to be done case by case. The issue iirc is that when projects link against the few libs we do not ship on osx, it is those libs that pull in the system libcxxabi. Idk how, eg Python itself, would link directly. |
Isn't that just a question of what's on the (default or project-specific) library search path that the linker uses to go look for symbols? |
Btw this is why i think it is all binaries. Nearly everything will link against the system libs. If it is the system libs that pull in libcxxabi, then there is not a lot we can do about it besides some sort of ld preload type adjustment to the linking. |
I dont follow the question. IIRC, the system libcxxabi is pulled in first despite the presence of other rpath entries in the binary. We can't exclude the system linking paths since we need the system libs for some things so we're stuck. The only fix we found was to ld preload the conda libcxxabi. Thus we stopped using the conda one on osx since the system one appeared to be pulled in first by the system libs no matter what. The two are apparently not always interchangeable and we can't adjust the system libs, so we're stuck. |
The tests in the testing feedstock used to fail if we used the conda libcxxabi. You should be able to reproduce by adding that back locally and running the builds. Maybe you can find a way to fix things that is better. |
Otoh this is a bit of a red herring. Even if we used the conda one all the time, apparently there is an Abi break upstream and that would apply to the conda libs too, right? |
yup, it's the same problem of (apparently) no long-term stable ABI that we have with the system libcxxabi. When Isuru introduced that patch, upstream said (more context; separated by mailing-list-to-discourse transition):
Isuru had said:
That has worked well for libcxx, but not sure if this is feasible w.r.t. relying on system libcxxabi.
I tried going back to shipping our own libcxxabi in #149, and the downstream tests pass without any further changes (in contrast to this PR). However, they fail when we disable patch 0002 (which passes here). So it really seems like an either-or choice. Again, there maybe other packages currently compiled against the "wrong" libcxxabi already in place, but it seems we can't have our cake and eat it too. |
Like we discussed in the core call, publishing a rebased version of this PR to (separate to keep context/discussion of this PR) |
This reverts commit ee7bbb8.
So I had one hypothesis that this could have had something to do with the ABI that python itself was built against. But 3.12.4 was built against 10.13 and the tests here (with the patch to match the old ABI) still fail. Could you share your thoughts @isuruf please - as you authored that patch, I assume you understand much better what's going on. |
You are running into the same issue as #77 (comment). |
Thank you, that clearly points as the macOS 12 image bump as the culprit. The longer explanation there then goes into detail of how pybind11 might end up using a cached system libcxx, but AFAIU, enabling that was exactly the use-case for you adding that as a test to conda-forge/libcxx-testing-feedstock#1, and patching (d84a6e1) the ABI here. If we wanted to match the system ABI today, a "fix" would be to just undo that ABI-patch, but I was asking for your input (same at the second-to-last core meeting) whether there's a way to fix this more sustainably. AFAIU we're a bit between a rock and a hard place, as in: the system ABI will keep changing subtly, so we can either keep building for one ABI (the one in the CI images...? since that's what we test against here) and accept the failing corner cases, or figure out a way how to force loading our own libcxx over the system one. I was hoping that there's a better way that I'm not seeing. |
We can build two variants. One with (__osx <12) and one without the patch (__osx >=12) |
I agree that this is an obvious solution for this case. I was worried how many such ABI breaks are still lurking in newer macOS versions that we're blind to (and whether we'd then end up building for each major macOS version). But I guess we can consider these cases as rare and treat them on a case-by-case basis... Would you be alright with not testing the |
Hi! This is the friendly automated conda-forge-webservice.
I've started rerendering the recipe as instructed in #147.
If I find any needed changes to the recipe, I'll push them to this PR shortly. Thank you for waiting!
Here's a checklist to do before merging.
Fixes #147