Docker build hangs indefinitely after installing MSYS2 #59
thanks |
So I've done some more testing and if I blacklist msys2-runtime (via IgnorePkg in pacman.conf) then I'm able to get through the MSYS2 install and install other packages in subsequent layers, though obviously that's not really a viable workaround as it puts me in an unsupported state which I assume will cause other issues (and likely get worse as more time passes). Is there some sort of flag I can enable to get verbose diagnostic information related to msys2-runtime to try and help narrow down the source of the problem? |
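For readers who want to try the same IgnorePkg workaround, here is a minimal sketch of holding packages back. It works on a scratch file instead of the real /etc/pacman.conf, and both the sample file contents and the `hold_package` helper name are assumptions made for illustration; they do not come from this thread or from pacman itself.

```shell
#!/usr/bin/env bash
# Sketch: add a package to IgnorePkg in a pacman.conf-style file.
# hold_package is a hypothetical helper name, not part of pacman.
hold_package() {
  local conf="$1" pkg="$2"
  if grep -q '^IgnorePkg' "$conf"; then
    # Append to the existing IgnorePkg line (a space-separated list).
    sed "s/^IgnorePkg[[:space:]]*=.*/& $pkg/" "$conf" > "$conf.tmp"
  else
    # Insert a new IgnorePkg line right after the [options] header.
    sed "/^\[options\]/a\\
IgnorePkg = $pkg" "$conf" > "$conf.tmp"
  fi
  mv "$conf.tmp" "$conf"
}

# Demo on a scratch file so the real /etc/pacman.conf is untouched.
# The two lines written here are sample content, not the MSYS2 default file.
conf="$(mktemp)"
printf '[options]\nHoldPkg = pacman\n' > "$conf"
hold_package "$conf" msys2-runtime
hold_package "$conf" pacman
cat "$conf"
```

Keeping a single IgnorePkg line under the existing [options] section avoids appending a second [options] header to the file. As noted above, this leaves the install in an unsupported state, so it is only useful for narrowing the problem down.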
Thanks for testing, yeah, looks like the latest msys2-runtime update broke something only in docker, which is why CI didn't catch it :( |
Thanks. At least now I know it's not just me. Let me know if there's anything I can do to help investigate. I'm not really familiar with debugging MSYS2 internals, but I'm happy to try and learn to lend a hand. |
Please try again now |
@lazka Wow that was fast. It appears to be working now. Thank you! Out of curiosity, what did you change? |
Thanks for testing.
I wrote up my findings here: https://cygwin.com/pipermail/cygwin/2022-December/252711.html |
@lazka It appears this has regressed again some time today. Any ideas? I see this on multiple machines again (both locally and my CI server) so I don't think it's a local issue. Was able to work around it with the same hack as last time (blocking update of msys2-runtime). |
Can you get the version of msys2-runtime package - which one fails and which one does not ? |
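One way to answer this from inside an MSYS2 shell is `pacman -Q`, which prints the name and version of an installed package. The sketch below wraps it in a small guard so it also runs where pacman is absent; the `report_version` helper name is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report the installed version of a package, if pacman is present.
# `pacman -Q <pkg>` prints "<name> <version>" for an installed package.
# report_version is a hypothetical helper name.
report_version() {
  local pkg="$1"
  if command -v pacman >/dev/null 2>&1; then
    pacman -Q "$pkg" 2>/dev/null || echo "$pkg is not installed"
  else
    echo "pacman not found; run this inside an MSYS2 shell"
  fi
}

report_version msys2-runtime
```

Running it once before and once after the update step gives the working and failing versions to compare.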
This is what I see when I block the update: So presumably 3.4.3-2 is the working one and 3.4.3-3 the broken one. |
By the way, I'm currently pulling the nightly installer, but I just tried switching to the one tagged 2022-12-16 and the overall result appears to be the same. Though the previous version is different in this case (which seems reasonable/expected):
Digging back through previous CI logs, though, I see the failure was occurring when upgrading to 'msys2-runtime-3.4.2-2', and in the run where it started working it looks like no update occurred at all. Perhaps I originally drew the wrong conclusion that the issue was resolved: I was testing with the nightly installer, and since no update occurred after the attempted fix, the offending code was simply no longer being executed. Now that there's an update, the hang is back. |
I see, yeah these might be two unrelated issues. The ASLR change made our nightly builds fail which in turn meant installing the latest build and updating resulted in a runtime update. |
Did some more testing yesterday, and this also reproduces with the latest pacman update for me. I checked the GitHub Actions here (for the Docker builds in msys2-installer) and the latest runs are passing, but they also don't appear to trigger an update of msys2-runtime or pacman. I wonder if that's the reason it seems to work fine there too (since afaict it always builds the installer first and uploads it, so the Docker builders pull an installer that's up-to-date with the latest runtime/pacman). For what it's worth, this is the test case I'm using (without the IgnorePkg workarounds), which I ripped from your CI documentation to make sure there was nothing I was introducing in my own Dockerfile to cause the issue:
If I add the following after the first-run it starts working (and I can install/update other packages).
How I'm building:
Requires either a msys2-runtime or a pacman update to reproduce. Haven't noticed the same problem with other packages. E.g. I can add steps to install GCC or whatever - I tried many different packages - just fine as long as msys2-runtime/pacman are blacklisted, and updates to the pre-installed curl/libcurl also work so it's not an install vs update difference. Quite confused, but will keep trying to narrow it down. |
No update, but I can at least confirm the issue is reproducible in CI (this case installed a version leading to an update): https://github.com/msys2/msys2-installer/actions/runs/3953154465 |
Did anyone get any further on this one? Spent a few days hitting my head against a stuck Docker build after adding a
Running without doing the msys2 system update via either the choco package or ridk allows me to get a built container at least.
cinst ruby # install ruby
cinst msys2 --params "/NoUpdate" # install msys2 without system update
Update-SessionEnvironment # refresh environment vars
ridk install 3
There is still an unexplained 10 minute delay building the layer from docker which is probably an unrelated or semi-related Docker on Windows file-system issue, but at least it doesn't seem indefinitely stuck.
Upgrading
cinst ruby # install ruby
cinst msys2 --params "/NoUpdate" # install msys2 without system update
Update-SessionEnvironment # refresh environment vars
C:\\tools\\msys64\\\usr\\bin\bash -c "echo '[options]' >> /etc/pacman.conf"
C:\\tools\\msys64\\\usr\\bin\bash -c "echo 'IgnorePkg = msys2-runtime' >> /etc/pacman.conf"
C:\\tools\\msys64\\\usr\\bin\bash -c "echo 'IgnorePkg = pacman' >> /etc/pacman.conf"
ridk install 2 3
|
There seems to be some discussion of similar Pacman hangs at git-for-windows/git-for-windows-automation#61 but no idea if it’s related to the issue here, or just sounds similar. |
I played around with this by copying the docker bits from the CI from this repository, and I saw that the process moved on from the core update, but then experienced a hang after everything was updated. I found that adding
FROM mcr.microsoft.com/windows/servercore:ltsc2022
COPY ./msys2-x86_64-latest.sfx.exe /msys2.exe
RUN powershell -Command \
$ErrorActionPreference = 'Stop'; \
$ProgressPreference = 'SilentlyContinue'; \
/msys2.exe -y -oC:\; \
function msys() { C:\msys64\usr\bin\bash.exe @('-lc') + @Args; } \
msys ' '; \
msys 'pacman --noconfirm -Syuu'; \
msys 'pacman --noconfirm -Syuu'; \
msys 'pacman --noconfirm -Scc'; \
rm -r -fo 'C:\$Recycle.Bin\'; \
echo Done;
I'm guessing that docker is choking on the odd Unicode characters in the 'binned' files? msys2/MSYS2-packages#4622
Another thing I tried was adding
For reference, I had a
I would wager those are the old versions of |
Wow, thanks @jeremyd2019 - I'll try your workaround. Does one perhaps conclude that docker build on windows tries to empty the recycle bin itself (perhaps for each layer) prior to completing and gets stuck? |
I was guessing that whatever docker uses to save the filesystem was tripping up on what are probably invalid unicode sequences. But I don't know anything about what docker does. |
Yeah that makes more sense given the nature of filesystems and what must be needed for layer exports on windows. |
If you're interested, the construction of the filenames is explained here: https://github.com/msys2/msys2-runtime/blob/abcb3c6c0f330ac7568956b2be6bf3376517bb56/winsup/cygwin/syscalls.cc#L342-L346 |
Your |
Ok, managed to find what seems to be the root problem here, with similar workarounds discovered: microsoft/Windows-Containers#213
The fix appears to require a containerd (and presumably Docker runtime?) built with Go 1.21+ (containerd/containerd#8957 (comment)) to resolve golang/go#59971.
I am currently using Windows Server 2022 images on GHA, which at time of writing have Docker 24.0.7 on them: https://github.com/actions/runner-images/blob/releases/win22/20240514/images/windows/Windows2022-Readme.md
This has containerd 1.7.6 in it; in any case, both are built with Go 1.20. I believe the fix is only in containerd 1.7.14, which landed via containerd/containerd#9860 and https://github.com/containerd/containerd/releases/tag/v1.7.14, and/or Docker 25, as the vendoring vs static containerd inclusion confuses me. |
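To check a given runner against the Go 1.21+ requirement mentioned above, something like the following sketch may help. It assumes the Go toolchain reported by the Docker daemon is a reasonable proxy for the one containerd was built with, which the comment above is itself unsure about. The `version_ge` helper name is hypothetical, and the comparison relies on GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Sketch: check whether a version string meets a minimum, using GNU sort -V.
# version_ge is a hypothetical helper name.
version_ge() {
  # True when $1 >= $2 in version order.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_go="1.21"   # minimum Go toolchain named in the thread

if command -v docker >/dev/null 2>&1; then
  # .Server.GoVersion is the Go toolchain of the daemon, e.g. "go1.20.10".
  go_ver="$(docker version --format '{{.Server.GoVersion}}' 2>/dev/null || true)"
  go_ver="${go_ver#go}"   # drop the leading "go" prefix
  if [ -n "$go_ver" ] && version_ge "$go_ver" "$min_go"; then
    echo "Docker daemon built with Go $go_ver (>= $min_go)"
  else
    echo "Docker daemon Go version: ${go_ver:-unknown} (want >= $min_go)"
  fi
else
  echo "docker not found; skipping daemon check"
fi
```

The same helper can compare a containerd version against 1.7.14, for example `version_ge 1.7.6 1.7.14`, which is false for the runner image described above.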
FWIW I re-tested this today without the "recycle bin removal workaround" on newer runner images that contain Docker Engine
@jeremyd2019 IMHO this can probably be closed now, in that the root cause was containerd-on-Windows problems, which have been fixed upstream. If the "junk left in recycle bin" is a problem on its own, I imagine that can/should be addressed separately within msys2 somewhere? |
That's good. I don't seem to have the ability to close this issue though |
Related (?) issue: #58
Thought I had the same issue but it turns out apparently not... Copying my message from there here below.
==========================
Same thing started happening to me yesterday with no changes on my end (been working for months up until now). Was definitely working two days ago since I have a script to force-rebuild my image daily. I see this on multiple machines and OS versions.
Tried both isolation modes (hyperv and process) and neither is currently working for me. One slight difference from the original reporter is that I'm using servercore:ltsc2022.
Still trying to diagnose further but unfortunately it's difficult to get diagnostic information from Windows containers for these sorts of issues. Trying to investigate using the information here: https://learn.microsoft.com/en-us/virtualization/windowscontainers/troubleshooting
EDIT:
Actually, I think I may have misunderstood the original issue in this thread and conflated it with mine. In my case the processes all appear to run successfully and terminate normally, but then the RUN step never finishes and just hangs indefinitely when finalizing the layer. I'm guessing this is probably a distinct issue and I misclassified it as the same as the one here due to the timing.
Strange though that it's happening on multiple machines and seems to coincide with the latest installer release. I tried rolling back to an earlier installer but that didn't help. Likely because whatever is causing the issue gets updated to the same version as in the latest installer anyway, but that's just a guess on my part so far - I need to do additional testing.