-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add force-grace-period-seconds
argument to step cancel
command
#3084
Conversation
ba910f9
to
8c9382a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! LGTM! Meets the spec as we defined 🧡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #2250 we started on a path towards parsing duration parameters as time.Duration
, which means people can write more convenient expressions like 1m
or 17h
, but in the meantime, we would be more specific with time units. Which is why Nepa put seconds
in the name of signal-grace-period-seconds
- for backwards compatibility cancel-grace-period
had to be left.
To be consistent 🙃 this new flag (and env var) also ought to end with -seconds
(_SECONDS
).
Other than one other comment, this looks pretty good to me!
a3b8339
to
9a50348
Compare
force-grace-period
argument to step cancel
commandforce-grace-period-seconds
argument to step cancel
command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ✅
Description
We recently introduced a
buildkite-agent step cancel
subcommand. The command can accept a flag--force
, which will immediately cancel unfinished jobs in Buildkite (i.e. mark jobs as 'canceled') without waiting for the agents running the jobs to finish. Any subsequent artifact uploads to Buildkite by these agent running this job will begin to fail.Now this
--force
flag might feel a little counter-intuitive when you think about how agents have acancel-grace-period
argument - you could not use the--force
flag, marking the jobs as 'canceling' and then rely on agents to SIGKILL their process once theircancel-grace-period
has elapsed.Imagine though that an agent isn't responding to Buildkite's request to cancel the job (and so we can't rely on
cancel-grace-period
within an agent). In this PR, we add an additional argument--force-grace-period
to the subcommand to handle this. The value offorce-grace-period
will determine how long within Buildkite it will take for the job to move from a 'canceling' to a 'canceled' state (more information about job states can be read here). Because jobs are then in a 'canceled' state, lost agents can be reaped much faster within Buildkite.It's worth noting:
--force-grace-period-seconds
without also supplying--force
will not do anything. We rely on the--force
flag to forecfully mark a job as 'canceled' within Buildkite.--force-grace-period-seconds
must be greater than or equal to zero. Negative values just don't make sense.--force-grace-period-seconds
is essentially overriding the value ofcancel-grace-period
for agents that are still connected to Buildkite. The default value of--force-grace-period-seconds
if not supplied is the value of the current agent running thebuildkite-agent step cancel
command: if the agent running thebuildkite-agent step cancel
command has acancel-grace-period
value less than the agents having their jobs 'canceled', these agents will cleanup sooner than you might anticipate.Context
buildkite-agent step cancel
subcommand #3070Testing
go test ./...
). Buildkite employees may check this if the pipeline has run automatically.go fmt ./...
)