Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test-rackspace-win2022_vs2022-x64-6 systematic build timeout for win-vs2022-arm64 #3504

Closed
targos opened this issue Sep 27, 2023 · 17 comments · Fixed by #3514 or #3842
Closed

test-rackspace-win2022_vs2022-x64-6 systematic build timeout for win-vs2022-arm64 #3504

targos opened this issue Sep 27, 2023 · 17 comments · Fixed by #3514 or #3842

Comments

@targos
Copy link
Member

targos commented Sep 27, 2023

See https://ci.nodejs.org/computer/test-rackspace-win2022_vs2022-x64-6/builds

All win-vs2022-arm64 systematically fail with:

00:56:43 Cross-compilation to ARM64 detected. We'll use the x64 Node executable for license2rtf.
01:56:43 Build timed out (after 60 minutes). Marking the build as failed.
01:56:44 Build was aborted

Examples:
https://ci.nodejs.org/job/node-compile-windows/nodes=win-vs2022-arm64/52608/console
https://ci.nodejs.org/job/node-compile-windows/nodes=win-vs2022-arm64/52601/console
https://ci.nodejs.org/job/node-compile-windows/nodes=win-vs2022-arm64/52597/console

Other machines don't have this problem so I removed the win-vs2022-arm64 label from that one in https://ci.nodejs.org/computer/test-rackspace-win2022_vs2022-x64-6/configure to prevent CI failures.

@joyeecheung
Copy link
Member

joyeecheung commented Sep 27, 2023

nodejs/reliability#676 indicates that this happens on test-rackspace-win2022_vs2022-x64-1 and test-rackspace-win2022_vs2022-x64-2 too, maybe less frequent though

@richardlau
Copy link
Member

00:56:43 Cross-compilation to ARM64 detected. We'll use the x64 Node executable for license2rtf.

This message is immediately before an attempt is made to download a x64 Node.js Windows binary (to convert the license to RTF). #3477 (comment) says that on the test CI that there should be a caching mechanism to prevent the builds getting stuck here.

@targos
Copy link
Member Author

targos commented Sep 27, 2023

@StefanStojanovic
Copy link
Contributor

I took a look. There is indeed a cache there but it can get corrupted (that's what happened here). I'll improve the logic to prohibit this. There is also an occasional ongoing problem with release CI where nodes get stuck trying to download the x64 node.exe for days (no caching there). I'll try to make a standardized approach that will work in both CI environments.

@targos
Copy link
Member Author

targos commented Oct 19, 2023

@targos
Copy link
Member Author

targos commented Oct 19, 2023

@targos targos reopened this Oct 19, 2023
@StefanStojanovic
Copy link
Contributor

Thanks for letting me know @targos! This is a different bug, but it represents the same. Caching only works well if it can download the x64 binary, in runs after you reenabled the machine I've seen this when downloading node.exe:

curl: (18) transfer closed with 69583820 bytes remaining to read

The machine didn't have a cached node.exe and in one run compilation failed. The new caching mechanism improves on the old one in a way that it doesn't overwrite a valid binary with an invalid one. Since this machine didn't have a valid binary to begin with, failure was a possibility. Now I've set it up correctly and checked other machines for invalid cache. Since now everything is as expected, the new caching mechanism should make sure it stays like that by never invalidating node.exe.

I'll be monitoring this in the following weeks and hopefully, this will not happen again.

@targos
Copy link
Member Author

targos commented Oct 23, 2023

@targos
Copy link
Member Author

targos commented Mar 8, 2024

@targos
Copy link
Member Author

targos commented May 14, 2024

@targos
Copy link
Member Author

targos commented Jun 18, 2024

@richardlau
Copy link
Member

@StefanStojanovic
Copy link
Contributor

I recall that release CI doesn't use node.exe caching for signing ARM64 binaries. I even opened an issue to implement it some time ago #3527. This was based on an improvement I had for caching (https://github.com/nodejs/build/pull/3514/files), but it broke test CI because of SETLOCAL EnableDelayedExpansion so I had to revert it and I haven't touched it since then.

In short, in release CI we need the download from https://nodejs.org/dist/latest/win-x64/node.exe to go smoothly and that is not always the case as sometimes it hangs. Luckily that doesn't happen too often.

@StefanStojanovic
Copy link
Contributor

After landing the PR I ran windows fanned job, and node compile debug job, that was previously failing (with first PR that needed to be reverted) now passed. Will monitor CI runs closely in the next few days to make sure everything is OK.

@targos
Copy link
Member Author

targos commented Aug 1, 2024

@StefanStojanovic
Copy link
Contributor

Release CI doesn't use the compile.cmd, so the change I landed doesn't affect machines there. There is another issue I opened for this: #3527

@StefanStojanovic
Copy link
Contributor

However, looking at everything now, it seems very likely that this somehow causes the error described here, which is very weird since the compile.cmd script is not used in test jobs at all. Additionally, my changes do not change the compilation process in any way as they are done as a preparation step, so I do not expect Node.js binaries produced to be any different, but the ones compiled for ARM64 most likely are. The last 2 runs of my test job are the first occurrences of failures that have been bringing havoc in the Windows CI today. At that time this seemed as a coincidence, but from what I've seen today, it is not. I will revert the changes I landed here. After that, I expect everything to go back to normal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants