Let artifact phase and post-command run in grace period #2899
@DrJosh9000 So this change resolves issues with this flag breaking uploads 🎉
Great work chasing down all these loose ends 🎉
Appreciate the extra internal documentation as well. ⭐
Description
Artifact phase (the built-in artifact upload performed by the agent) and post-command hooks are now allowed to run in the signal grace period (a configurable length of time after job cancellation).
The signal grace period is also now enforced within the job executor (as well as outside of it when run normally as a subprocess of the agent).
Context
Artifacts uploaded this way can be relevant to users debugging job timeouts, so we should try a bit harder to upload them.
A similar argument applies to post-command hooks, which can be used for post-job cleanup.
Both of these (and usual job teardown) are still subject to the executor being SIGKILL-ed by the agent parent process. However, some use-cases run `buildkite-agent bootstrap` separately to the agent ("standalone `bootstrap`", which happens for example in agent-stack-k8s). So the executor needs to be able to kill itself in case there's nothing else to do so, hence `graceCtx` instead of `nonCancelCtx`.
The executor (a.k.a. bootstrap) is also normally passed the (computed) signal grace period as a flag/env var from the parent agent process, but the flag shares the same definition as `agent start` (with default -1). If there is no parent agent process, the -1 signal grace period will cause `graceCtx` to be cancelled immediately after the main context is cancelled. An existing integration test shows this. So, for standalone `bootstrap`, the same flag computation that happens in `agent start` to obtain a positive `signalGracePeriod` should be done in `bootstrap` as well.
Changes
- Add a `withGracePeriod` helper method, to create a new context that is cancelled some period of time after another context is cancelled.
- Use `withGracePeriod` to replace the existing detached `nonCancelCtx` with a new `graceCtx`. The difference is that `graceCtx` will still be cancelled after the signal grace period.
- Use `graceCtx` for `tearDown`, `artifactPhase`, and create a grace context for `runPostCommandHooks`.
- Move the signal grace period computation from `agent start` into a function shared with `bootstrap`.
Testing
- Tests have run locally (with `go test ./...`). Buildkite employees may check this if the pipeline has run automatically.
- Code is formatted (with `go fmt ./...`)