ARM runner has been stuck for multiple days #80

Biswa96 · 2023-10-16T04:11:27Z

This CI job has been running for days: https://github.com/msys2-arm/msys2-autobuild/actions/runs/6508662089

@jeremyd2019

jeremyd2019 · 2023-10-16T17:22:51Z

It managed to hang up right as I was leaving on a long weekend trip, and I didn't notice until I got back. I wanted to get a fresh runner going anyway, for the latest Windows Updates, but was waiting until I got back to try to avoid any issues while I was gone 😁. New runner is going now

Biswa96 · 2023-10-19T12:43:53Z

@jeremyd2019 Could you check whether this CI job is stuck again? https://github.com/msys2-arm/msys2-autobuild/actions/runs/6570301888

lazka · 2023-11-01T15:11:53Z

@jeremyd2019 https://github.com/msys2-arm/msys2-autobuild/actions/runs/6714121489/job/18246869719

lazka · 2023-11-24T14:30:08Z

@jeremyd2019 likely stuck https://github.com/msys2-arm/msys2-autobuild/actions/runs/6980352526/job/18995436393

jeremyd2019 · 2023-11-26T17:27:35Z

I've been ruminating on the idea of some sort of 'watchdog' to detect and kill stuck pacman processes automatically, but I haven't settled on the best language/technology for it. Python seems most convenient since autobuild is already Python: I could add a background thread like the one I added to try polling the token, but I'm not familiar with the modules for querying and killing processes.

What I've got so far is a cygwin command to get the cygwin pid of the process I want to kill (what I really want is the child pacman process; this gets the newest pacman process older than 1800 seconds)

pgrep -xn -O 1800 pacman

coupled with the script I already had (because when stuck in this state cygwin kill is not sufficient)
https://github.com/jeremyd2019/winautoconfig/blob/master/msys2-runner-setup/setupscripts/wkill.sh
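A minimal Python sketch of just the selection step, in case it helps the discussion. It assumes the watchdog already has a list of process records from some process-listing source (not shown here); `ProcInfo` and `newest_stale` are hypothetical names, not anything that exists in autobuild. The filter mirrors the pgrep call above: exact name match, older than 1800 seconds, newest of those.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProcInfo:
    """Hypothetical record; a real watchdog would fill this from a process listing."""
    pid: int
    name: str
    start_time: float  # seconds since the epoch


def newest_stale(procs: Iterable[ProcInfo], name: str = "pacman",
                 max_age: float = 1800.0,
                 now: Optional[float] = None) -> Optional[ProcInfo]:
    """Return the newest process named `name` that is older than `max_age` seconds."""
    if now is None:
        now = time.time()
    stale = [p for p in procs
             if p.name == name and now - p.start_time > max_age]
    # "Newest" means the largest start time, like pgrep's -n option.
    return max(stale, key=lambda p: p.start_time, default=None)
```

The returned pid would then be handed to whatever does the actual killing, for example the wkill.sh script linked above; that part is deliberately left out of the sketch.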

Biswa96 · 2023-11-26T17:31:30Z

It would be clearer if the reason for these CI failures were explained.

jeremyd2019 · 2023-11-28T16:21:45Z

lost power, so any lack of runner in the near future will be due to that

power is back

lazka · 2023-12-29T14:47:31Z

@jeremyd2019 https://github.com/msys2-arm/msys2-autobuild/actions/runs/7355161710

lazka · 2024-02-11T19:09:35Z

@jeremyd2019 https://github.com/msys2-arm/msys2-autobuild/actions/runs/7862352714

jeremyd2019 · 2024-03-23T16:25:07Z

Unstuck it. The powershell variant in git-for-windows/git-for-windows-automation#61 (comment) was intriguing; it seems close to something that could be turned into a 'watchdog'. It would just need to also query the CreationDate field to find any pacman processes that have been running a long time (like a half hour? or hour?), and then be arranged to run continuously (scheduled task?). Of course, I'd much rather get whatever bug is causing this fixed...
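A rough Python sketch of the age check on CreationDate, assuming the value arrives as a raw CIM datetime string (the form WMI stores it in). That is an assumption about how the query output is exported; PowerShell's CIM cmdlets may hand back an already converted date object, in which case the parsing step is unnecessary. The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_cim_datetime(value: str) -> datetime:
    """Parse a CIM datetime such as '20240323162507.123456-420'.

    Layout: yyyymmddHHMMSS.mmmmmm, then a sign and the UTC offset in minutes.
    """
    stamp, sign, offset = value[:21], value[21], value[22:25]
    naive = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
    minutes = int(offset) * (1 if sign == "+" else -1)
    return naive.replace(tzinfo=timezone(timedelta(minutes=minutes)))


def running_longer_than(creation_date: str, limit: timedelta,
                        now: datetime) -> bool:
    """True if a process created at `creation_date` has run longer than `limit`."""
    return now - parse_cim_datetime(creation_date) > limit
```

With a half-hour limit this would flag a process created at 16:25 local time (UTC-7) when checked at 00:00 UTC the next day, but not with a one-hour limit.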

lazka · 2024-05-03T18:41:19Z

https://github.com/msys2-arm/msys2-autobuild/actions/runs/8938789079/job/24553638109

jeremyd2019 · 2024-05-19T01:20:40Z

There's a stuck job now, but it doesn't seem to be the runner this time. Probably something on Github's end.

Biswa96 · 2024-07-01T14:01:49Z

stuck again https://github.com/msys2-arm/msys2-autobuild/actions/runs/9739288128

jeremyd2019 · 2024-07-01T19:15:34Z

This seems to be a different issue. I think maybe the machine rebooted. I did a quick check and didn't notice any excess packages installed.

Biswa96 · 2024-07-26T15:24:02Z

stuck again https://github.com/msys2-arm/msys2-autobuild/actions/runs/10108768753

lazka · 2024-08-06T16:52:19Z

"echo: write error: No space left on device"

jeremyd2019 · 2024-08-06T17:36:46Z

What?!? I deleted some of the cruft under %USERPROFILE% (go, .cargo mainly) and freed up some space. Will try to build rust again

lazka · 2024-08-06T18:40:41Z

Is #76 related?

Otherwise, try good old WinDirStat :)
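If installing a GUI tool on the runner is inconvenient, a small Python sketch along the same lines could rank the top-level directories under a root by size. This is only an illustration using the standard library; `dir_size` and `largest_children` are hypothetical helper names, and the root to scan (for example the user profile directory) is up to whoever runs it.

```python
import os
from pathlib import Path
from typing import List, Tuple


def dir_size(path: Path) -> int:
    """Sum the sizes of all files below `path`, skipping unreadable entries."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is not accessible
    return total


def largest_children(root: Path, top: int = 10) -> List[Tuple[str, int]]:
    """Return up to `top` (name, bytes) pairs for subdirectories, largest first."""
    sizes = [(child.name, dir_size(child))
             for child in root.iterdir() if child.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Calling `largest_children(Path.home())` would list the biggest directories in the home directory, which is where the go and .cargo cruft mentioned above was found.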

Biswa96 · 2024-08-25T14:52:05Z

https://github.com/msys2-arm/msys2-autobuild/actions/runs/10539907589

lazka · 2024-09-08T12:23:26Z

https://github.com/msys2-arm/msys2-autobuild/actions/runs/10757142237

lazka · 2024-10-19T19:10:49Z

https://github.com/msys2-arm/msys2-autobuild/actions/runs/11418583395/job/31772208187

lazka · 2024-10-19T21:22:40Z

https://github.com/msys2-arm/msys2-autobuild/actions/runs/11418583395/job/31772208187

seems to have gotten unstuck and errored out after some hours. (or anyone poked at it?)

Unrelated note: Runner groups are now available for everyone it seems. Not that it makes much difference with the current setup with a separate org, but good to know: https://github.blog/changelog/2024-10-17-actions-runner-groups-now-available-for-organizations-on-free-plan/

jeremyd2019 · 2024-10-20T05:30:26Z

> seems to have gotten unstuck and errored out after some hours. (or anyone poked at it?)

I killed the child pacman process, as usual.

> Unrelated note: Runner groups are now available for everyone it seems. Not that it makes much difference with the current setup with a separate org, but good to know: https://github.blog/changelog/2024-10-17-actions-runner-groups-now-available-for-organizations-on-free-plan/

Yeah, I could do away with the extra labels to differentiate between autobuild and CI instances and enforce it with runner groups instead, presumably.

Biswa96 · 2024-10-31T05:30:23Z

https://github.com/msys2-arm/msys2-autobuild/actions/runs/11604120891

mmuetzel · 2024-11-01T12:35:21Z

It looks like it is stuck again at "checking package integrity...":
https://github.com/msys2-arm/msys2-autobuild/actions/runs/11625263303/job/32375075188

mmuetzel · 2024-11-07T19:02:27Z

Is it stuck again? This time with "checking keyring..." as the last line in the log:
https://github.com/msys2-arm/msys2-autobuild/actions/runs/11729327900/job/32674802852#step:11:507

jeremyd2019 · 2024-11-07T19:05:25Z

yep, killed

lazka · 2024-11-09T19:19:14Z

gpg is weirdly bugged right now it seems

jeremyd2019 · 2024-11-09T23:17:26Z

I killed the dirmngr and keyboxd processes and deleted ~/.gnupg, which will hopefully make it happier. I'm clearing all the failed builds and am starting a build.

I also ran pacman -Scc since there has been a lot of package thrashing lately, maybe the disk was getting full?

Hmm, wasn't there a gnupg update recently? Maybe the dirmngr/keyboxd processes were still from the old version (since the same install is reused for each run), and that was causing issues?

jeremyd2019 · 2024-11-10T17:04:42Z

Looks like it happened again. Killed pacman and dirmngr/keyboxd again.

lazka · 2024-11-10T17:26:06Z

> Hmm, wasn't there a gnupg update recently?

yeah, it aligns with the update.. sadly

mmuetzel · 2024-11-14T08:17:28Z

It looks like the runner is stuck again. This time "checking package integrity..." is the last line in the log.

jeremyd2019 · 2024-11-14T16:33:18Z

During setup-msys2... that's different...

Biswa96 changed the title ~~ARM runner has been stuck for mutiple days~~ ARM runner has been stuck for multiple days Oct 16, 2023

lazka pinned this issue Dec 29, 2023

jeremyd2019 mentioned this issue Nov 8, 2024

use runner groups for self-hosted runners #93

Open
