Record injected node affinity in batch Job #518
Comments
cc @ahg-g as the first author.
My concern is that this will add another update request for every job. Is this cost justified?
It can be the same API call that updates the Job spec. However, maybe we should store the original node selector instead of the injected one.
Note that when the job is suspended, the controller will reset the nodeSelector on the job:
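The referenced snippet is not captured here; as a rough illustration only, a minimal sketch of that reset, assuming the original selector has already been recovered from somewhere (the function and variable names are made up for this sketch and are not Kueue's actual code):

```go
// Illustrative only: restore the Job's nodeSelector while it is suspended,
// assuming originalNodeSelector was recovered from the Workload (or elsewhere).
package sketch

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func restoreNodeSelector(ctx context.Context, c client.Client, job *batchv1.Job, originalNodeSelector map[string]string) error {
	if job.Spec.Suspend == nil || !*job.Spec.Suspend {
		// Only touch the pod template while the Job is suspended.
		return nil
	}
	patch := client.MergeFrom(job.DeepCopy())
	job.Spec.Template.Spec.NodeSelector = originalNodeSelector
	// The selector reset can ride in the same patch that flips suspend,
	// so it does not have to cost an extra API request.
	return c.Patch(ctx, job, patch)
}
```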
Yes, but there is a corner case: the job is unsuspended, but the workload gets deleted unexpectedly (for example, manually), and then the job keeps a wrongly configured nodeSelector.
/assign
Right, it is ok to record the original nodeSelector as long as it is done in the same nodeSelector update request, but I think it is worth having a discussion on whether we want to tie the workload's lifecycle to the job using finalizers.
Deleting a Workload seems like an important tool for forcing a requeue, and adding finalizers could further complicate this use case.
Add a finalizer to the workload; when the workload is about to be deleted, restore the node selector on the Job first. That seems more convincing. I prefer not to use an annotation if we can avoid it. But yes, we will have to handle the terminating workload in job reconciliation.
@mimowo could you take on a review for a future PR on this, in the context of the job integration framework?
/assign
Sure.
I'm now wondering whether we should revert this, as we have a growing number of annotations to support partial admission. In support of this feature, we gain resiliency: we can lose the Workload object and still recover the job. However, is it worth it?
I think we can just document that users (including admins) shouldn't remove a Workload object.
/reopen
@alculquicondor: Reopened this issue.
Or we could leverage finalizers here: when deleting jobs, we'd restore the node affinity and then remove the finalizer.
The finalizers alternative requires careful thinking too. We don't want to accidentally leave objects with finalizers. The problem here is that the Job object is a parent of the Workload object. As such, the Job can't be deleted unless the Workload is deleted first. Then we have a circular dependency. Another alternative is that a Workload has a finalizer only while it's admitted. But the complication is that finalizers are not part of the status, so we need additional API calls. Not sure if it's worth the effort, but worth exploring.
RE: it's not deleting the job but deleting the workload. The reason we can't restore the node affinity is that the workload might be deleted accidentally. Now we add the finalizer to the workload; when we want to delete the workload, we first restore the Job, then remove the finalizer, and finally the workload gets deleted (see the sketch below).
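A rough sketch of that flow, assuming a finalizer is added to the Workload at admission time; the finalizer name and helper below are invented for illustration and are not Kueue's actual implementation:

```go
// Hypothetical finalizer flow: restore the parent Job first, then release the
// finalizer so the Workload deletion can complete.
package sketch

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const workloadFinalizer = "kueue.x-k8s.io/restore-parent-job" // hypothetical name

// handleWorkloadDeletion runs when the Workload carries a deletion timestamp.
func handleWorkloadDeletion(ctx context.Context, c client.Client, wl client.Object, job *batchv1.Job, original map[string]string) error {
	if !controllerutil.ContainsFinalizer(wl, workloadFinalizer) {
		return nil
	}
	// Restore the Job's original selector before letting the Workload go away.
	jobPatch := client.MergeFrom(job.DeepCopy())
	job.Spec.Template.Spec.NodeSelector = original
	if err := c.Patch(ctx, job, jobPatch); err != nil {
		return err
	}
	controllerutil.RemoveFinalizer(wl, workloadFinalizer)
	return c.Update(ctx, wl)
}
```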
Yea... Our annotations are so big...
That probably works fine. But we need to evaluate whether it is worth doing given the additional API calls, as @alculquicondor says.
An alternative could be to have an additional custom resource just to back up selectors and counts (see the sketch below). This resource would be owned by the job, and since we create or update it before unsuspending the job, we do not need to keep track of its lifecycle. We will have additional API calls, but they should not trigger any controller.
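Purely as an illustration of that idea (none of these type or field names exist in Kueue), the backup object could be as small as:

```go
// Hypothetical "backup" resource owned by the Job: it only stores the original
// selectors and counts so they can be restored on suspend; garbage collection
// via the owner reference cleans it up together with the Job.
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type JobSchedulingBackup struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Original nodeSelector as it was before Kueue injected anything.
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
	// Original counts, relevant once partial admission mutates them.
	Parallelism *int32 `json:"parallelism,omitempty"`
	Completions *int32 `json:"completions,omitempty"`
}
```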
Another CRD would have the same issue about needing a finalizer. The Workload is already an object that end-users shouldn't have permission to edit or delete. I prefer we get rid of the annotation, without any finalizer in the Workload, and re-evaluate in the future if we find a use case where end-users need to modify the Workload object.
Not exactly; there is no delete conditioning: when the job gets deleted, so is the "backup" resource.
The Job also owns the Workload. So when we delete the Job, the Workload gets deleted as well. The problem is what happens if someone (not Kueue) deletes the Workload prematurely.
Yes, the problem is when the workload gets deleted before the restore; in that case the backup resource will still exist, and the restore can be done from it. When the job gets deleted ... we don't actually care what happens to the selectors and counts.
But what's the difference between the "backup" resource and the Workload? They are both subject to unauthorized deletion. I don't see any difference, so I'd rather have one object.
The key concern here is that end-users might accidentally remove the resources storing the original job information.
I don't see accidental removal as a real problem; the chances of it happening are the same as an accidental removal of the job. However, with a different resource, a queue administrator could use RBAC to make sure that end-users cannot accidentally remove the resources storing the original job information. During the review of the original implementation, I think, workload deletion was presented as a valid way to requeue.
Right... that's an easy way of evicting a workload and putting it at the front of the queue. The alternative would be that the administrator only deletes the ... A finalizer would still be a more performant solution than having a second object. The question would be which controller removes the finalizer?
Does that mean we add that note to the documentation?
yes
Agree. Additionally, we might want to consider another way to re-enqueue the job.
SGTM, the ROI is high 😄
Does it make sense to consider a knob in the Kueue configuration that controls whether to store the annotation? Some users wouldn't be concerned about Job size (reasons may vary: 1. using non-indexed jobs, 2. using small node selectors, or 3. using indexed jobs with small parameters), but may be concerned about losing track of node selectors.
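For illustration only, such a knob might look like the following; this field does not exist in Kueue's Configuration API and the name is made up:

```go
// Hypothetical configuration field controlling whether the original
// nodeSelector is persisted as a Job annotation.
package sketch

type JobIntegrationOptions struct {
	// StoreOriginalNodeSelector records the pre-admission nodeSelector in a
	// Job annotation so it can be restored even if the Workload is lost.
	// Users worried about Job object size could leave it disabled.
	StoreOriginalNodeSelector bool `json:"storeOriginalNodeSelector,omitempty"`
}
```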
I think the motivation here is to avoid using too many annotations from the POV of Kueue. If we have the ambition to push this upstream, it will be a stumbling block. 🥲
I've opened a new PR for this: #834. It's still in draft since it is developed on top of #771. More interesting for this discussion are:
Please have a look and let me know what you think.
/assign
I prefer we don't maintain such a piece of code. We are also risking that the annotation changes name/contents from one version to the next, as we do more mutations during admission.
What would you like to be added:
When we want to suspend a Job, we'd like to restore the original nodeAffinity, but sometimes we can't find the derived workload; see
kueue/pkg/controller/workload/job/job_controller.go
Lines 397 to 402 in 045697c
I'd like to add the nodeAffinity to the Job annotations to make this reliable. It would look like:
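The original example is not captured here; a hypothetical shape, with a made-up annotation key, could be:

```go
// Hypothetical illustration: serialize the original nodeSelector into a Job
// annotation before injecting the flavor's selector. The annotation key is
// invented for this sketch and is not an existing Kueue annotation.
package sketch

import (
	"encoding/json"

	batchv1 "k8s.io/api/batch/v1"
)

const originalNodeSelectorAnnotation = "kueue.x-k8s.io/original-node-selector" // hypothetical

func recordOriginalNodeSelector(job *batchv1.Job) error {
	raw, err := json.Marshal(job.Spec.Template.Spec.NodeSelector)
	if err != nil {
		return err
	}
	if job.Annotations == nil {
		job.Annotations = map[string]string{}
	}
	job.Annotations[originalNodeSelectorAnnotation] = string(raw)
	return nil
}
```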
Why is this needed:
Always make sure that when suspending a Job, we'll restore the original nodeAffinity.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.