node-test-commit-arm is broken #3044
Comments
I have brought one of the arm machines back online and it seems to be processing jobs.
FWIW it looks like the bottleneck was ubuntu2004-armv7l. In addition to the machine that was offline, the container on the other machine appears to have gone down on 15 Sept:
I've restarted the container, so we now have the two back online.
I'm guessing this was another occurrence of #2894?
Thank you @mhdawson and @richardlau!
Hmmm that's interesting if it was just the container that failed.
I've restarted the
Hmmm - so this is ONLY affecting the armv7l container on the host?
AFAICT at the moment just the ubuntu2004 armv7l containers. The debian10 armv7l containers were still working.
It looks like the NullPointerException has happened again and I'm not sure if the agent on that container is now broken again (https://ci.nodejs.org/job/node-test-commit-arm/nodes=ubuntu2004-armv7l/44015/console looks stuck). I've restarted the container on the other Altra -- will see if the problem reoccurs while watching (virtually) the rest of this morning's NodeConf EU talks. Might try updating the containers this afternoon to Java 11/17 (currently on 8).
I'm seeing this immediately after resuming CI:
@tniessen
@richardlau I am not sure I understand what I did wrong; this is happening in two PRs and I am pretty sure I only resumed/restarted failing builds. I guess I'll start new builds and see if that works.
@tniessen Arm builds were backlogged heavily over the weekend due to the machines being offline and then catching up the already queued jobs. nodejs/node#44849 appears to have at least two builds which ended up racing each other:
The arm build from https://ci.nodejs.org/job/node-test-pull-request/46975/ succeeded and caused the branch in the binary repo to be deleted while https://ci.nodejs.org/job/node-test-pull-request/46992/ was still in progress, which caused the arm build there to fail. You've been resuming builds from https://ci.nodejs.org/job/node-test-pull-request/46992/, so the resumed builds think they need to resume the failed arm build, but the branch no longer exists. I could easily see this happening in other PRs during the backlog if builds had started for the same commit while earlier builds for that commit were still waiting for the offline arm machines to come back online.
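For anyone trying to picture the race, here is a rough toy model of it in Python. This is only a sketch and not the actual Jenkins job logic: the `BinaryRepo` class, the `branch_for` helper, the branch naming scheme and the commit id are all made up for illustration. The one thing it assumes from the explanation above is that two builds of the same commit end up sharing one branch in the binary repo.

```python
# Toy model of the race described above. All names are hypothetical;
# this is not the real CI implementation.

class BinaryRepo:
    """Stands in for the binary repo that the arm builds share."""

    def __init__(self):
        self.branches = set()

    def push(self, branch):
        self.branches.add(branch)

    def delete(self, branch):
        self.branches.discard(branch)

    def checkout(self, branch):
        if branch not in self.branches:
            raise LookupError(f"branch {branch!r} no longer exists")
        return branch


def branch_for(commit):
    # Assumption: the branch name depends only on the commit, so two
    # builds of the same commit use the same branch.
    return f"jenkins-{commit}"


repo = BinaryRepo()
commit = "abc123"  # placeholder commit id

# Two builds for the same commit are queued while the arm machines are offline.
repo.push(branch_for(commit))  # earlier build pushes its binaries
repo.push(branch_for(commit))  # later build reuses the same branch

# The earlier build's arm job succeeds and cleans up the branch.
repo.delete(branch_for(commit))

# The later build (or a resumed run of it) now fails at checkout.
try:
    repo.checkout(branch_for(commit))
    outcome = "ok"
except LookupError as err:
    outcome = f"failed: {err}"

print(outcome)
```

In this model, resuming the later build keeps failing for the same reason, while a fresh build would push a new branch first.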
Ah, thanks for the clarification @richardlau! That makes a lot of sense :)
Haven't seen the problem again. FWIW I've updated the containers to run on Java 17 now.
It seems that node-test-commit-arm jobs never complete. This affects all PR CIs:
Unfortunately, the "Auto start CI" GitHub action is unaware of the issue and just silently removes the `request-ci` label while queuing even more jobs.