k8s hook runner fails occasionally #41

Closed
atahardjebbar-ledger opened this issue Nov 14, 2022 · 11 comments · Fixed by #65
Labels: bug, k8s

Comments

@atahardjebbar-ledger

atahardjebbar-ledger commented Nov 14, 2022

Hello,
We're having an issue on our self-hosted runner. We use the k8s hook to execute our jobs, but some jobs fail occasionally with this error, and we can't work out what the issue is.

##[debug]/runner/externals/node16/bin/node /runner/k8s/index.js
node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^
[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "#<ErrorEvent>".] {
  code: 'ERR_UNHANDLED_REJECTION'
}
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.

Image: summerwind/actions-runner:v2.296.1-ubuntu-20.04
GHES version: 3.6.1
Any idea what the issue is?
Thank you

@nikola-jokic
Contributor

Hi @atahardjebbar-ledger,

Can you please provide an example repository to reproduce this? The only thing I can think of that can fail with an error message like this is this line:

const input = await getInputFromStdin()

We probably want to wrap this in try/catch as well, but I am unsure. Is it possible that something is breaking the JSON, causing JSON.parse to throw and the promise to be rejected?
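
A minimal sketch of that kind of guard, for illustration only. `parseHookInput` and the sample payloads are hypothetical and are not taken from the hook's source; the sketch only shows how a truncated JSON string makes `JSON.parse` throw and how a try/catch turns that into a readable message.

```javascript
// Hypothetical helper: parse raw hook input and report failures explicitly.
function parseHookInput(raw) {
  try {
    return { ok: true, value: JSON.parse(raw) }
  } catch (err) {
    // JSON.parse throws a SyntaxError on truncated or malformed input.
    return { ok: false, error: `invalid hook input: ${err.message}` }
  }
}

// Illustrative payloads only; the real hook input format is not shown here.
const good = parseHookInput('{"command":"example","args":{}}')
const bad = parseHookInput('{"command":"example","args":')

console.log(good.ok, bad.ok) // prints: true false
```

With a guard like this, a malformed input would surface as a named error rather than as an unhandled rejection.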

If you can provide a reproduction repo or just an example I can create, it will be really helpful!

nikola-jokic self-assigned this Nov 15, 2022
@atahardjebbar-ledger
Author

atahardjebbar-ledger commented Dec 13, 2022

Hello @nikola-jokic
Thank you for your response, and sorry for the late reply. I can't provide that since we're running GitHub on premises.
But I can provide other information if you need it.
The error happens on some jobs in the same repository. It seems more likely in huge jobs and less likely (it has almost never happened) in short jobs.
Thank you.

@nikola-jokic
Contributor

Hey @atahardjebbar-ledger,

This information is really useful! If you find more information, it would be amazing. But I will start trying to reproduce this issue myself. I think I might have an idea where this might occur. I'll let you know if I get stuck and need more information. In the meantime, if you find anything else that might help us, please let us know ☺️

nikola-jokic added the bug label Dec 13, 2022
@atahardjebbar-ledger
Author

atahardjebbar-ledger commented Jan 5, 2023

Hey @nikola-jokic
Happy New year
Were you able to reproduce the issue?
Can I provide you with any more information?

@nikola-jokic
Contributor

Hey @atahardjebbar-ledger, thanks! Happy new year to you as well!
No, I have not worked on it for a while. I pushed this draft: #49
It needs to be formatted, but if you don't mind, can you build your hook with this so we can at least inspect the issue?
I'm not sure how to reproduce a huge stdin read, or whether that is the issue at all.

@atahardjebbar-ledger
Author

Hello @nikola-jokic
Thank you for the changes. We built it internally and tested it, and we still have the same issue. We found a pattern: the issue happens only on heavy workflows. Do you have any other ideas we can explore?

@nikola-jokic
Contributor

I would love to dig deeper. Can you please provide an example workflow that I can use to reproduce it? Can you see the error? The fix was meant to pull the input read from stdin into the try/catch so the exception does not obfuscate the error for further inspection.
Now, I would need that error or anything else that can help me understand why this is occurring.
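
The reason `"#<ErrorEvent>"` in the log suggests the promise was rejected with an object that is not an `Error`, so Node prints only its class name. Below is a rough sketch of a logging helper that tries to pull more detail out of such a value. `describeReason`, `safeJson` and `FakeErrorEvent` are hypothetical names, and the `message` and `error` fields are assumptions about what the rejected object might carry; nothing in this thread confirms its actual shape.

```javascript
// Hypothetical helper: serialize a value without throwing on circular data.
function safeJson(value) {
  try {
    return JSON.stringify(value)
  } catch {
    return '[unserializable]'
  }
}

// Hypothetical helper: turn any rejection reason into a loggable string.
function describeReason(reason) {
  if (reason instanceof Error) {
    return reason.stack || reason.message
  }
  if (reason !== null && typeof reason === 'object') {
    const name = reason.constructor ? reason.constructor.name : 'Object'
    // Assumed fields: event-like objects often carry `message` or `error`.
    const detail =
      reason.message || (reason.error && reason.error.message) || safeJson(reason)
    return `${name}: ${detail}`
  }
  return String(reason)
}

// Stand-in class for illustration only; not the real object from the logs.
class FakeErrorEvent {
  constructor(message) {
    this.type = 'error'
    this.message = message
  }
}

console.log(describeReason(new FakeErrorEvent('connection reset')))
// prints: FakeErrorEvent: connection reset
```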

By heavy workflows, do you mean they run for a long time, or that they send a lot of data to the hook in a single call? Also, is it possible that the runner runs out of memory on that machine, causing this exception?

Can you please provide the diagnostic logs of the runner? That might be helpful.

@AEnguerrand

Hi @nikola-jokic,
I can give more information about this issue (I'm working with @atahardjebbar-ledger). We are still experiencing this issue on almost all workflow runs (with 0 to 5 jobs failing), and we built and ran the version with the fix (#49).

Regarding the workflow file, I can't share it publicly, but it's composed of multiple jobs/actions that do the compilation. The workflow has around 20 jobs, each with a runtime from 5 min to 1h30 (with a total time for the workflow of approximately 4h).

We have more information about the error from the debug log (sent to GitHub); in this case, it happened during the "Post Clone" operation:

##[debug]Evaluating condition for step: 'Post Clone'
##[debug]Evaluating: always()
##[debug]Evaluating always:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Post Clone
##[debug]Loading inputs
##[debug]Evaluating: github.ref
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'ref'
##[debug]=> 'refs/heads/next'
##[debug]Result: 'refs/heads/next'
##[debug]Evaluating: github.repository
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'repository'
##[debug]=> '<REDACTED>'
##[debug]Result: '<REDACTED>'
##[debug]Evaluating: github.token
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'token'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Loading env
Post job cleanup.
##[debug]Running JavaScript Action with default external tool: node12
Run '/runner/k8s/index.js'
  shell: /runner/externals/node16/bin/node {0}
##[debug]/runner/externals/node16/bin/node /runner/k8s/index.js
node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "#<ErrorEvent>".] {
  code: 'ERR_UNHANDLED_REJECTION'
}
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug]System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug] ---> System.Exception: The hook script at '/runner/k8s/index.js' running command 'RunScriptStep' did not execute successfully
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   --- End of inner exception stack trace ---
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.RunScriptStepAsync(IExecutionContext context, ContainerInfo container, String workingDirectory, String entryPoint, String entryPointArgs, IDictionary`2 environmentVariables, String prependPath)
##[debug]   at GitHub.Runner.Worker.Handlers.ContainerStepHost.ExecuteAsync(IExecutionContext context, String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, String standardInInput, CancellationToken cancellationToken)
##[debug]   at GitHub.Runner.Worker.Handlers.NodeScriptActionHandler.RunAsync(ActionRunStage stage)
##[debug]   at GitHub.Runner.Worker.ActionRunner.RunAsync()
##[debug]   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
##[debug]Finishing: Post Clone

Thanks for your help. Also, do you have a recommended way to extract logs from the _diag folder? We are using ephemeral runners, so each time a job finishes, the logs are gone.

@nikola-jokic
Contributor

Hey @AEnguerrand,

Thank you for providing this! I'll try my best to reproduce it, but this might require another change to see exactly what is happening. The only place that is outside the try/catch block is setting the exit code. That might be what is causing this issue. I'll push a fix, and maybe we will need another iteration (if you don't mind ☺️, I can't reproduce this behaviour, so I am simply guessing what the issue might be).
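
One way to keep every step, including choosing the exit code, behind a single catch could look like the sketch below. `runGuarded` is a hypothetical wrapper written for illustration; it is not the change made in PR #65, and the hook's real entry point may be structured differently.

```javascript
// Hypothetical wrapper: run the hook body and always resolve to an exit code.
// Because the returned promise never rejects, nothing escapes as an
// unhandled rejection.
async function runGuarded(body, log = console.error) {
  try {
    await body()
    return 0
  } catch (reason) {
    const text = reason instanceof Error ? reason.message : String(reason)
    log(`hook failed: ${text}`)
    return 1
  }
}

// Usage: simulate a failing hook body.
runGuarded(async () => {
  throw new Error('simulated failure')
}).then(code => {
  // A real entry point would assign this to process.exitCode instead.
  console.log(`exit code would be ${code}`)
})
```

The design choice here is that the exit code is derived from the wrapper's return value, so no statement that can throw sits outside the try/catch.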

@nikola-jokic
Contributor

Hey @AEnguerrand, @atahardjebbar-ledger,

Can you try the fix from PR #65?

@AEnguerrand

Hi @nikola-jokic, yes, I'm setting up the GitHub Actions runners to use this version; thanks. I will post the result after a few runs 👍
