Validate follower pod owned by same Job as leader pod #433
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: danielvegamyhre
✅ Deploy Preview for kubernetes-sigs-jobset canceled.
Force-pushed from 45c81ef to 9ad5582.
Force-pushed from 9ad5582 to 15c13a9.
I am going to leave LGTM for @ahg-g on this one. I don't really have much context on this problem.
Force-pushed from 165ba72 to f48350f.
Force-pushed from f48350f to 961710a.
/lgtm
This change validates that the leader pod has the same owner UID as the follower pod, to ensure they are part of the same Job.
This is necessary to handle a potential race condition between index updates and pod rescheduling during JobSet restarts.
The index maps [pod name without random suffix] -> corev1.Pod object. If rescheduling occurs before the index updates for the leader pod have been pushed to the controller, we may get a stale index entry and inject the wrong nodeSelector, using the topology the leader pod was originally scheduled on before the restart.
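
A minimal sketch of the owner UID check described above, not the code from this PR. The `Pod` and `OwnerReference` types are simplified, hypothetical stand-ins for `corev1.Pod` and `metav1.OwnerReference` so the example needs no dependencies, and the helper names `controllerJobUID` and `validateSameJob` are assumptions made for illustration.

```go
package main

import "fmt"

// OwnerReference is a simplified, hypothetical stand-in for metav1.OwnerReference.
type OwnerReference struct {
	Kind       string
	Name       string
	UID        string
	Controller bool
}

// Pod is a simplified, hypothetical stand-in for corev1.Pod.
type Pod struct {
	Name            string
	OwnerReferences []OwnerReference
}

// controllerJobUID returns the UID of the Job that controls the pod,
// or an error if the pod has no controlling Job owner reference.
func controllerJobUID(pod Pod) (string, error) {
	for _, ref := range pod.OwnerReferences {
		if ref.Controller && ref.Kind == "Job" {
			return ref.UID, nil
		}
	}
	return "", fmt.Errorf("pod %q has no controlling Job owner reference", pod.Name)
}

// validateSameJob returns nil only when the leader and follower pods are
// controlled by the same Job UID. A leader pod read from a stale index entry
// still carries the UID of the Job from before the restart, so the UIDs
// differ and the caller can refuse to copy the leader's topology.
func validateSameJob(leader, follower Pod) error {
	leaderUID, err := controllerJobUID(leader)
	if err != nil {
		return err
	}
	followerUID, err := controllerJobUID(follower)
	if err != nil {
		return err
	}
	if leaderUID != followerUID {
		return fmt.Errorf("leader pod %q (owner UID %s) and follower pod %q (owner UID %s) belong to different Jobs",
			leader.Name, leaderUID, follower.Name, followerUID)
	}
	return nil
}
```

In this sketch, a caller would run `validateSameJob` before injecting the nodeSelector and treat an error as a signal to reject or retry rather than use the leader's topology. Comparing UIDs rather than Job names matters here because a Job recreated during a restart can reuse the same name while receiving a new UID.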