
Flaky "Build from tarball" workflow on GitHub Actions #33947

Closed
mmarchini opened this issue Jun 18, 2020 · 10 comments
Labels
build (Issues and PRs related to build files or the CI) · flaky-test (Issues and PRs related to the tests with unstable failures on the CI)

Comments

@mmarchini
Contributor

This has been going on for at least a month, and seems to be the major source of flakiness on our Actions runs. Sometimes the build-tarball job will fail with no logs after 40+ minutes:

[screenshot: the build-tarball job shown as failed with no log output]

(https://github.com/nodejs/node/runs/784411094?check_suite_focus=true)

Sometimes it will fail with an error like below:

rm 664a568854abba2b997909662a528594c7526386.intermediate e79088d28d3209bde653372a7712225ba4f65b2e.intermediate 495122784be95331de6389fc8588b7b3a0594912.intermediate ca02573a617ca00970b9a2e674797e70460dac89.intermediate
if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
##[error]Process completed with exit code 2.

(https://github.com/nodejs/node/runs/784761789?check_suite_focus=true)

As a consequence, the other jobs in the tarball workflow won't run.
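
For anyone trying to reproduce this outside of Actions, a rough local equivalent of the build-tarball job might look like the sketch below (the exact configure flags, make targets and parallelism are assumptions, not copied from the workflow file):

# Sketch only: build the source tarball locally, roughly what the build-tarball job does.
./configure
make tar -j4          # builds the docs (and whatever they depend on), then packs the source tarball
ls node-v*.tar.gz     # the artifact the follow-up jobs in the workflow extract and build from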

@mmarchini added the build and flaky-test labels on Jun 18, 2020
@richardlau
Member

> Sometimes it will fail with an error like below:
>
> rm 664a568854abba2b997909662a528594c7526386.intermediate e79088d28d3209bde653372a7712225ba4f65b2e.intermediate 495122784be95331de6389fc8588b7b3a0594912.intermediate ca02573a617ca00970b9a2e674797e70460dac89.intermediate
> if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
> ##[error]Process completed with exit code 2.
>
> (https://github.com/nodejs/node/runs/784761789?check_suite_focus=true)

It’s lost among all the output (the UI for browsing logs isn’t great) but this run was an actual failure:

/Users/runner/runners/2.263.0/work/node/node/tools/doc/allhtml.js:87
2020-06-18T15:08:02.2260160Z   if (!ids.has(match[1])) throw new Error(`link not found: ${match[1]}`);
2020-06-18T15:08:02.2339310Z                           ^
2020-06-18T15:08:02.2339780Z 
2020-06-18T15:08:02.2340130Z Error: link not found: tls_server_addcontext_hostname_context
2020-06-18T15:08:02.2340480Z     at Object.<anonymous> (/Users/runner/runners/2.263.0/work/node/node/tools/doc/allhtml.js:87:33)
2020-06-18T15:08:02.2340840Z     at Module._compile (internal/modules/cjs/loader.js:1138:30)
2020-06-18T15:08:02.2341190Z     at Object.Module._extensions..js (internal/modules/cjs/loader.js:1158:10)
2020-06-18T15:08:02.2341570Z     at Module.load (internal/modules/cjs/loader.js:986:32)
2020-06-18T15:08:02.2352730Z     at Function.Module._load (internal/modules/cjs/loader.js:879:14)
2020-06-18T15:08:02.2379940Z     at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
2020-06-18T15:08:02.2417890Z     at internal/main/run_main_module.js:17:47
2020-06-18T15:08:02.2419920Z make[1]: *** [out/doc/api/all.html] Error 1
2020-06-18T15:08:02.2436820Z make[1]: *** Waiting for unfinished jobs....
2020-06-18T15:08:02.2437900Z make: *** [doc-only] Error 2
2020-06-18T15:08:02.2444480Z make: *** Waiting for unfinished jobs....
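
Since that failure comes from the documentation build (the doc-only target in the log), it should be reproducible without running the whole tarball job. A minimal sketch, assuming a checkout of nodejs/node and a recent Node.js on the PATH for the doc tooling:

# Sketch: re-run just the doc build that failed above (flags are illustrative).
./configure
make doc-only -j4
# tools/doc/allhtml.js stitches the per-module HTML into out/doc/api/all.html and
# throws "link not found: <anchor>" when an internal link has no matching id.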

@mmarchini
Contributor Author

Good, only one flaky failure left :)

@richardlau
Member

If there are no logs (the raw log doesn't even contain the output from the steps that completed successfully) it sounds more like a failure on GitHub's side.

@mmarchini
Contributor Author

It does, but it only happens on that specific job, and it happens quite frequently. There might be something we can do to fix or mitigate it.

@codebytere
Member

Have we tried running with ACTIONS_STEP_DEBUG set to true? The logs will be a bit hairier, but it might give us a hint.

@richardlau
Member

> Have we tried running with ACTIONS_STEP_DEBUG set to true? The logs will be a bit hairier, but it might give us a hint.

No, we haven't. I believe it will require a repository admin (i.e. a TSC member) to add the secret to this repository: https://help.github.com/en/actions/configuring-and-managing-workflows/managing-a-workflow-run#enabling-step-debug-logging
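
For what it's worth, a repository admin could also do it from the command line rather than the settings UI. A sketch assuming the GitHub CLI (gh) with admin access to nodejs/node; these commands are not taken from the thread:

# Enable step debug logging for the repository...
gh secret set ACTIONS_STEP_DEBUG --body "true" --repo nodejs/node
# ...and delete the secret again once a failing run has been captured:
gh secret delete ACTIONS_STEP_DEBUG --repo nodejs/node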

@mmarchini
Contributor Author

Doesn't seem like we're able to set it for a specific workflow though. Would there be any unwanted side effects with it (performance impact, leaked secret variables, etc.)?

@codebytere
Member

The DX for Actions debug logging is still not great. Secrets that are set properly will all show up starred out (********), but given that it's not scopable per action, I'd say our best bet, should we do it, is to turn it on, trigger a run or two to see the issue, pull the raw logs to dig through, and then turn it back off 🤔

@MylesBorins
Contributor

TBH this test is fairly wasteful. Should we maybe move it to be cron based?

@mmarchini
Contributor Author

I think this was fixed? If not, feel free to reopen.
