Error relocating dotnet - missing symbol _ZNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEC1Ev #1723
Comments
Seems like a mismatch between the C runtime we build against vs what we run against? |
We are building Linux musl arm64 on Alpine 3.9, while this test was running on 3.8. It used to work fine when I was testing it in the past, but maybe we've updated the testing docker images since then, and a standard C++ library update in 3.9 that we've pulled into the new image has broken the backwards compatibility. |
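For reference, a quick way to see what the missing symbol is and whether a given libstdc++ exports it; the library path below is the usual Alpine location and may differ elsewhere:

```bash
# Demangle the missing symbol; it is the out-of-line default constructor of
# std::__cxx11::basic_stringstream<char>.
echo '_ZNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEC1Ev' | c++filt

# Check whether the libstdc++ on the target exports that symbol; an older
# libstdc++ (such as .so.6.0.25) may not define it, which produces the
# relocation error above.
nm -D /usr/lib/libstdc++.so.6 | grep basic_stringstream

# And check which symbols the dotnet host binary needs resolved at load time.
nm -D ./dotnet | grep ' U ' | grep basic_stringstream
```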
There was a problem in building the rootfs for Alpine. Specifying the edge/main and edge/testing repos for all the packages led to an unintended installation of libstdc++.so.6.0.27. The libstdc++.so.6.0.25 from 3.9/main should be installed instead, otherwise the binaries we build end up with an unresolved dependency. I've created a PR in arcade to fix that (dotnet/arcade#4657). Once that gets merged and propagated to https://github.com/dotnet/dotnet-buildtools-prereqs-docker, we can regenerate the docker images there and update our lab scripts to use the new image name for building Alpine stuff. That will fix the problem. |
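The fix itself lives in dotnet/arcade#4657; purely as a sketch of the idea, with ROOTFS_DIR, the repository URLs, and the package name being illustrative assumptions rather than the actual script contents:

```bash
# Keep edge/main and edge/testing out of the rootfs' default repository list so
# that libstdc++ resolves to 3.9/main (libstdc++.so.6.0.25) rather than edge
# (libstdc++.so.6.0.27).
cat > "$ROOTFS_DIR/etc/apk/repositories" <<EOF
http://dl-cdn.alpinelinux.org/alpine/v3.9/main
http://dl-cdn.alpinelinux.org/alpine/v3.9/community
EOF

# Anything that genuinely needs edge can name that repository explicitly for a
# single install instead of enabling edge globally (package name hypothetical).
apk add --root "$ROOTFS_DIR" --no-cache \
    --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing \
    some-edge-only-package

# Verify which libstdc++ actually landed in the rootfs.
ls "$ROOTFS_DIR"/usr/lib/libstdc++.so.6*
```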
@janvorli it seems like, with the way we build the rootfs in dotnet-buildtools-prereqs-docker, the arcade change doesn't need to flow there first.

Since we commented out the helix-queues here, and the way libraries testing works is not resilient to skipping the send-to-helix job when the queues are empty, all CI jobs are failing with an error due to the queues being empty. Can this be fixed already, or should I put up a PR to disable the test run for musl arm64? |
I've realized that too. There was a recent change in the https://github.com/dotnet/dotnet-buildtools-prereqs-docker repo (4 days ago), so new images should be generated due to that and we can just use them. |
@janvorli - the builds automatically run when a PR is merged. The entire set of images does not get rebuilt; only the Dockerfiles affected by the PR are built. Unfortunately there was an infrastructure issue that impacted updating the versions repo with the latest images. If you tell me which image you are looking for I can look up the latest version and let you know. |
@MichaelSimons the image we need to get is the new version of mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20191023143847. We need the rootfs in it to be rebuilt based on a recent change in the Arcade repo. As there was no change in the related Dockerfile, based on what you've said above I am not sure if it got built. |
@janvorli - the most recent image is |
@MichaelSimons I've checked the image you've mentioned and it doesn't have the necessary change in the rootfs. Could you please trigger a build of a newer one? |
The build completed. The newest tag is now |
@MichaelSimons awesome, I have verified that the crossfs now contains the necessary change. |
Yes, I'll work on that and spin a new CI build to see if we still hit those build errors. |
I have a build going with the change here: https://dev.azure.com/dnceng/public/_build/results?buildId=498569 |
It is strange, the failure is still there but I can see that the coreclr build has used the right docker image now. I've downloaded the

I can also see there is another problem. The installer build has failed at packaging time and I can see that it is due to a bug in our scripts. The problem is caused by the fact that the native components are built into a directory that doesn't contain "musl" in its subpath (the subfolder is linux-arm64 and not linux-musl-arm64). See this from the native build in the log:
This location is expected; the subpath has never been linux-musl-arm64.Release. As you can see, the packager tries to get it from
|
We get the host here: I just put up a PR to update the version to use the latest produced host from dotnet/runtime. I will update that in the branch where I update the container and queue another build. |
I queued: https://dev.azure.com/dnceng/public/_build/results?buildId=499531 Thanks for the investigation @janvorli. @dagood might know faster than I do what we should update for packaging to work on the installer side. |
This sounds like a failure of the Installer jobs to properly find the live bits, falling back to getting them from packages instead. (Matching the rest of the comment.) @jkoritzinsky might know something about how this is happening silently, since he wrote the live build override code.
I encountered something related here:

runtime/eng/pipelines/installer/jobs/base-job.yml, lines 181 to 185 in 23c54f1
It seems plausible that it's being used somewhere it shouldn't be (or vice versa) but I don't see why it would suddenly start failing when building the pkgproj... |
On a recent official build, the corehost ended up where it's expected:
|
The build args are identical (except the build version), but the first few lines of the native build are different:

Official:
  __DistroRid: linux-musl-arm64
  __RuntimeId: linux-musl-arm64

CI build above:
  __DistroRid: linux-arm64
  __RuntimeId: linux-arm64

It looks like RID detection has gone bad due to changes in the container... |
I don't think it is related to the live build. @janvorli meant the dotnet that we use to run libraries tests, which is always acquired from a NuGet package in runtime.depproj. We have an issue to move libraries tests to use the live built host. |
The new container probably broke this detection, checking...

runtime/eng/native/init-distro-rid.sh, lines 140 to 143 in 23c54f1
|
The command output changed:

# Old
#> docker run --rm mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20191023143847 /crossrootfs/arm64/usr/bin/ldd --version
/crossrootfs/arm64/usr/bin/ldd: 2: exec: /lib/ld-musl-aarch64.so.1: not found
# New
#> docker run --rm mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20200127195039 /crossrootfs/arm64/usr/bin/ldd --version
standard_init_linux.go:211: exec user process caused "exec format error"

We could add the old detection method back in; it still looks like it works: #1363 (comment)

#> docker run --rm mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20191023143847 cat /crossrootfs/arm64/etc/os-release
...
ID=alpine
...
#> docker run --rm mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20200127195039 cat /crossrootfs/arm64/etc/os-release
...
ID=alpine
... |
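The actual logic in init-distro-rid.sh is not reproduced here, but a simplified sketch of the two detection approaches being compared - probing the rootfs' ldd output for "musl" versus reading the rootfs' /etc/os-release, which requires no target-architecture execution - could look like this (ROOTFS_DIR and the exact conditions are assumptions):

```sh
#!/usr/bin/env sh
# Simplified sketch, not the real init-distro-rid.sh.
ROOTFS_DIR=${ROOTFS_DIR:-/crossrootfs/arm64}

rid_os=linux
# ldd-based probe: works when the error/output mentions "musl" (as in the old
# image, where the failure message contained ld-musl-aarch64.so.1), but misses
# when the only output is "exec format error" (as in the new image).
if "$ROOTFS_DIR/usr/bin/ldd" --version 2>&1 | grep -qi musl; then
    rid_os=linux-musl
# os-release fallback: just read the distro ID out of the rootfs.
elif [ -e "$ROOTFS_DIR/etc/os-release" ] && grep -q '^ID=alpine' "$ROOTFS_DIR/etc/os-release"; then
    rid_os=linux-musl
fi

echo "__DistroRid: ${rid_os}-arm64"
```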
Given this, I believe what we should do is: update the docker container image (keeping tests disabled), fix the RID detection, then wait for an official build to produce a new host package, update the version to that one so that it contains @janvorli's fixes in the rootfs folder, and then run the tests using that host and see what happens. |
What I don't understand is that the coreclr build for linux arm in the same build, using the same docker container, set the RID correctly:
@dagood how would that be any different? I just tried running the Init RID script that we use in a container using the new image, and it prints the right output. Here is the output from my build using the same docker container locally:
|
The issue is the
I can't explain that. I'd suggest trying the commands in |
May have found a missing bit of context--this isn't quite right:
It has always been |
You're probably right. I got confused by my local environment; I just wanted to read -musl-arm64 and my mind betrayed me, haha.
Yeah, I’ll keep digging a little bit. @am11 any ideas? This is the failing build: https://dev.azure.com/dnceng/public/_build/results?buildId=498569 |
Found something interesting. The
|
You are right, I was mistaken in this - it is that way for coreclr / libraries, but not for installer. |
That's the reason for the failure. The link is relative, so it ends up trying to execute the aarch64 binary in the rootfs (the crossrootfs/arm64/lib/ld-musl-aarch64.so.1). In the previous image, it was a shell script that tried to load /lib/ld-musl-aarch64.so.1 using an absolute path, so it was looking for it outside of the rootfs and didn't find it. |
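Both behaviors can be seen from the build host without executing any target binaries, for example (the image tags are the ones mentioned above):

```bash
# Newer rootfs: ldd is a relative symlink into the target's musl loader, so
# running it on an x64 host only works if qemu user emulation is available.
docker run --rm mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20200127195039 \
  sh -c 'ls -l /crossrootfs/arm64/usr/bin/ldd; readlink -f /crossrootfs/arm64/usr/bin/ldd'

# Older rootfs: ldd was a shell script exec'ing the loader via the absolute
# path /lib/ld-musl-aarch64.so.1, which does not exist outside the rootfs.
docker run --rm mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20191023143847 \
  sh -c 'ls -l /crossrootfs/arm64/usr/bin/ldd; head -n 3 /crossrootfs/arm64/usr/bin/ldd'
```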
|
It looks like

I don't know exactly what versions are involved here... dotnet/arcade#4657 does seem to have been a downgrade of some things, but I'm not sure if that's directly related. |
It was a fix - we were accidentally pulling in some stuff from the edge/main and edge/testing repositories, like the latest libstdc++, into the Alpine rootfs. So that obviously also pulled in the newer ldd, implemented as a script. |
@dagood, you are right, I just realized that. Interestingly, I am unable to reproduce this issue:

docker run -it mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-alpine-406629a-20200127195039 bash -c /crossrootfs/arm64/usr/bin/ldd --version
musl libc (aarch64)
Version 1.1.20
Dynamic Program Loader
Usage: /crossrootfs/arm64/usr/bin/ldd [options] [--] pathname |
I am unable to repro it locally myself either. But I believe the reason is that I have qemu installed, which can run the aarch64 code on my x64 Ubuntu. This is from my x64 machine:
|
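For anyone checking whether their machine would hide this the same way, the transparent aarch64 execution comes from a binfmt_misc handler registered by qemu user-mode emulation; a generic check (not part of the CI scripts) looks like:

```bash
# If an aarch64 handler is registered, foreign ELF binaries such as the rootfs'
# ld-musl-aarch64.so.1 run transparently and the symlinked ldd appears to work.
ls /proc/sys/fs/binfmt_misc/ | grep -i aarch64 \
  || echo "no aarch64 binfmt handler registered"

# On Debian/Ubuntu hosts the handler is typically installed by qemu-user-static.
dpkg -l | grep qemu-user
```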
Ah, I tested it in a fresh VM and was able to reproduce the issue. I added a fallback condition: #2336. |
I ran a new build with the dotnet host that was produced with this new container and the issue is fixed 🎉. The only test that failed now was a true failure:
I'll close once I merge my PR. |
Platform: Linux_musl arm64 release
Pipeline: runtime-libraries outerloop, runtime-libraries outerloop-linux
Example run: https://dev.azure.com/dnceng/public/_build/results?buildId=482386&view=logs&j=6c926a84-4f53-5790-1d4c-92b88465ec72&t=bc0eac9e-2cde-5709-c96e-4455fef1ffb5
Proximate diagnostic info: