provide `-no-shutdown-delay` flag for job/alloc stop #11596
Some operators use very long group/task `shutdown_delay` settings to safely drain network connections to their workloads after service deregistration. During incident response, however, they may want to skip that drain so they can shed load quickly.

Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and `nomad job stop` commands that bypasses the delay. The flag sets a new desired transition state on the affected allocations, which the allocation/task runner identifies during pre-kill on the client.

Note (as documented in this PR) that using this flag will almost always result in failed inbound network connections to the workloads: the tasks exit before their clients receive updated service discovery information, so those connections are not gracefully drained.
Nice job! No blockers. A couple of suggestions for a grammar fix in the doc text that is included in several places.
```diff
@@ -1600,7 +1600,7 @@ func (s *StateStore) upsertJobImpl(index uint64, job *structs.Job, keepVersion b
 	}
 
 	if err := s.updateJobCSIPlugins(index, job, existingJob, txn); err != nil {
-		return fmt.Errorf("unable to update job scaling policies: %v", err)
+		return fmt.Errorf("unable to update job csi plugins: %v", err)
```
Is this change intentional for this changeset?
Yeah, just fixing a nearby typo while I was here.
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
Fixes #11448