
Container creation sometimes fails in the 32-bit ARM test job #1780

Open
EliahKagan opened this issue Jan 18, 2025 · 1 comment

Comments

@EliahKagan
Member

EliahKagan commented Jan 18, 2025

Current behavior 😯

One of the changes in #1777 was to add an arm32v7 test job that runs in a container on the new arm64 runner (cbe3793, fbc27b5), analogous to the preexisting i386 test job that runs in a container on an amd64 runner. It looks like this may be brittle, with container creation failing from time to time. This is the failure noted in #1778 (comment).

/usr/bin/docker start 4224fb6a96d4ae28ceca367700326843715626ffe3eb995cdbe03b0aa4e0b4b2
  Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: unable to start unit "docker-4224fb6a96d4ae28ceca367700326843715626ffe3eb995cdbe03b0aa4e0b4b2.scope" (properties [{Name:Description Value:"libcontainer container 4224fb6a96d4ae28ceca367700326843715626ffe3eb995cdbe03b0aa4e0b4b2"} {Name:Slice Value:"system.slice"} {Name:Delegate Value:true} {Name:PIDs Value:@au [4198]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms): unknown
  Error: failed to start containers: 4224fb6a96d4ae28ceca367700326843715626ffe3eb995cdbe03b0aa4e0b4b2

This resembles, and is probably identical to, some failures I had seen when testing #1777, which I had erroneously assumed (or hoped) were due to an infrastructure hiccup rather than a persistent problem. This issue tracks it in case it is a persistent problem, which seems likely. If it happens again and no fix is apparent, I can revert the parts of #1777 that are about 32-bit testing, while keeping 87387c2 from it, which does not seem to have had any problems.

Something that probably isn't the cause

It is possible for a 64-bit ARM processor to be incapable of natively executing 32-bit ARM instructions; unlike on x86, where 64-bit processors can also run 32-bit code, this capability is not universal on ARM. When that happens, if binfmt_misc is configured to provide emulation via QEMU, a container of the incompatible architecture can still be run, but it runs much more slowly and some things may not work. However, while that was an early concern of mine, as far as I can tell from the error it does not seem to be a factor here. Furthermore, in another repository, I checked in a reverse shell that no such architecture was enabled in binfmt_misc (EliahKagan/arm@496d9c1), even tried turning off binfmt_misc (EliahKagan/arm@efa15ff), and a 32-bit ARM binary was still able to run.
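
For completeness, here is roughly how that kind of check can be expressed as workflow steps. This is only an illustrative sketch, not the exact commands used in EliahKagan/arm; the image name and job layout are assumptions.

```yaml
# Hypothetical diagnostic job: list binfmt_misc registrations and confirm that
# a 32-bit ARM userland runs on the 64-bit ARM runner.
diagnose-arm32:
  runs-on: ubuntu-24.04-arm
  steps:
    - name: Show binfmt_misc registrations
      run: |
        # Registered handlers (e.g. qemu-arm) appear as entries here, alongside
        # the built-in "register" and "status" files.
        ls /proc/sys/fs/binfmt_misc/ || true
    - name: Run a 32-bit ARM container
      run: |
        # If this succeeds and prints "armhf" with no qemu-arm handler
        # registered above, the CPU is executing 32-bit ARM code natively.
        docker run --rm arm32v7/debian:bookworm dpkg --print-architecture
```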

Expected behavior 🤔

The container specified in the container: key should start up at least as reliably in jobs on the ARM runner as in jobs on other runners.
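
For reference, the shape of the job being described is roughly as follows. This is a simplified sketch, not a verbatim copy of the workflow: the image name, runner label, and steps shown here are stand-ins for whatever #1777 actually configures.

```yaml
# Simplified sketch of a 32-bit ARM container job on the 64-bit ARM runner,
# mirroring the existing i386-in-container job on amd64. Image name and steps
# are illustrative only.
test-arm32:
  runs-on: ubuntu-24.04-arm
  container: arm32v7/debian:bookworm
  steps:
    - uses: actions/checkout@v4
    - name: Run tests
      run: |
        # The failures tracked in this issue happen before any of these steps
        # run, while the runner software creates and starts the container.
        cargo test --workspace
```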

Git behavior

Not directly applicable, but Git does test on various platforms. A cursory inspection of the runs-on keys in this workflow suggests Git may not be using the new ubuntu-24.04-arm or ubuntu-22.04-arm GHA runners at this time.

Steps to reproduce 🕹

I'm unsure what factors trigger this, or if it is effectively random. It seems likely that it will happen again, but I'm not certain, so I'm opening this issue rather than immediately changing the workflow.

When I was working on the PR, I think it happened most often when two pushes were separated by a very short time. My first thought was that it might have to do with caching. That is implausible, though, at least with respect to the caching of Rust dependencies we are doing, because the failure happens much earlier: when the GitHub Actions runner software runs Docker to set up the job, before any steps of the job have begun.

@Byron
Member

Byron commented Jan 19, 2025

Thanks for keeping track of this! My hope is that over time this issue will go away - the runners are still new and maybe the tracking software they have still has some shortcomings, growing pains if you will.
