-
Notifications
You must be signed in to change notification settings - Fork 40.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow updating scheduling directives of suspended jobs that never started #105479
Conversation
@ahg-g: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/label api-review |
/assign @liggitt |
/sig scheduling |
oldTemplate.Spec.NodeSelector = template.Spec.NodeSelector | ||
oldTemplate.Spec.Tolerations = template.Spec.Tolerations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k I thought we had a verification check that would complain about these if there wasn't a +k8s:verify-mutation
comment (in this case, it's actually ok because we're working on a copy made above, but I still expected the explicit check / comment to be required)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests are supposed to fail if this line isn't added? It seems that the tests are still passing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verify CI is failing the verify: non-mutating-validation
test (you can run yourself with hack/verify-non-mutating-validation.sh
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done.
oldTemplate := &oldSpec.Template | ||
if opts.AllowMutableSchedulingDirectives { | ||
oldTemplate = oldSpec.Template.DeepCopy() | ||
if template.Spec.Affinity != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like it misses the case where template.Spec.Affinity is nil and oldTemplate.Spec.Affinity has node affinity directives. if so, add a unit test that would have caught that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! one thing here to note here, to succeed, the current validation will require having an empty Affinity if NodeAffinity was removed and other affinity types don't exist, removing the whole Affinity spec will fail.
Original
Template {
Affinity: {
NodeAffinity: {
....
}
}
Update succeeds
Template: {
Affinity: {}
}
Update fails
Template: {
}
supporting the case above will probably require cloning the new template or do some magic to check if Affinity is empty without actually individually checking its members (because the logic will break if a new member is added and we forget to update it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt is the above semantics acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, something like this, perhaps:
switch {
case template.Spec.Affinity == nil && oldTemplate.Spec.Affinity != nil:
// allow the Affinity field to be cleared if the old template had no affinity directives other than NodeAffinity
oldTemplate.Spec.Affinity.NodeAffinity = nil
if (*oldTemplate.Spec.Affinity) == (api.Affinity{}) {
oldTemplate.Spec.Affinity = nil
}
case template.Spec.Affinity != nil && oldTemplate.Spec.Affinity == nil:
// allow the NodeAffinity field to skip immutability checking
oldTemplate.Spec.Affinity.NodeAffinity = &api.Affinity{NodeAffinity: template.Spec.Affinity.NodeAffinity}
case template.Spec.Affinity != nil && oldTemplate.Spec.Affinity != nil:
// allow the NodeAffinity field to skip immutability checking
oldTemplate.Spec.Affinity.NodeAffinity = template.Spec.Affinity.NodeAffinity
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg, updated.
/approve API bits lgtm, go ahead and squash |
/hold cancel |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Field: "spec.template", | ||
}, | ||
}, | ||
"mutable node affinity": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are covering the first and second branch. We could have these tests:
add node affinity
change node affinity
remove node affinity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added another test to cover the third branch.
…jobs that never started
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@@ -629,6 +630,81 @@ func TestSuspendJobControllerRestart(t *testing.T) { | |||
}, false) | |||
} | |||
|
|||
func TestNodeSelectorUpdate(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seeing this flake a lot at HEAD - https://storage.googleapis.com/k8s-triage/index.html?pr=1&test=TestNodeSelectorUpdate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for flagging, we need to wrap the update calls with RetryOnConflict
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implements kubernetes/enhancements#2926
Which issue(s) this PR fixes:
Fixes #104714
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: