-
Separately, I experimented with setting the AdmissionCheck to "Retry" when the check didn't pass. In that case, workloads created after it are unblocked and able to run. However, the first workload failed and was evicted without being retried. The workload details are the following:
Status: Normal QuotaReserved 24s kueue-admission Quota reserved in ClusterQueue cluster-queue, wait time since queued was 1s
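For reference, a minimal sketch of how a custom AC controller might set a check to Retry on the Workload status; the check name "my-check" and the failure reason are hypothetical, and only the kueue.x-k8s.io/v1beta1 field names are taken from the API.

```go
package controller

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// setCheckRetry flips the "my-check" admission check on a Workload to Retry,
// which makes Kueue evict the workload and release its quota reservation.
func setCheckRetry(ctx context.Context, c client.Client, wl *kueue.Workload, reason string) error {
	for i := range wl.Status.AdmissionChecks {
		check := &wl.Status.AdmissionChecks[i]
		if check.Name != "my-check" { // hypothetical check name
			continue
		}
		check.State = kueue.CheckStateRetry
		check.Message = reason
		check.LastTransitionTime = metav1.Now()
		return c.Status().Update(ctx, wl)
	}
	return nil
}
```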
-
I see, the culprit seems to be that the
-
In addition, regarding the bug and fix: if the fix is to reset the admission check to Pending in the controller after eviction, does it address my initial question? Quote here: "We expect resources (GPU/Memory, etc.) are not allocated until all admission checks pass." How can this be addressed?
-
I can try again with the latest release v0.9 when it's out.
-
@tenzen-y @mimowo I tried with the latest release v0.8.2 that includes the workload change; however, the same issues are still observed:
Status: Normal Pending 9m14s kueue-admission couldn't assign flavors to pod set master: insufficient unused quota for cpu in flavor default-flavor, 200m more needed, insufficient unused quota for memory in flavor default-flavor, 176Mi more needed
-
@leipanhz it is a bit hard to tell what is going on because this is a custom AC.
IIUC your explanation, we don't observe this behavior in the built-in ACs like ProvReq - for example, in this integration test we check that the state can be set to Ready based on the ProvReq status. I suspect maybe your custom AC controller sets the check to Retry again.
-
The retry is set based on the following:
It is possible the controller will set retry more than once for a single workload; would that be an issue?
-
@mimowo In addition, could you please clarify the following: Admission Checks:
-
I suspect this is the case here - that your AC goes into a loop (set Retry, get evicted, reconcile, set Retry again), so that on consecutive runs it still sets Retry (but this is a guess). It would be best if you inspect the logs or use a debugger.
I see, but this is currently working as intended (WAI), not a bug. It would require a new feature extension to allow that.
So, it is important to make sure phase 2 is not too long, so that (as you say) the resources don't stay unused for too long.
-
@mimowo I'd like to request a feature not to reserve resources until all conditions are met (admission checks, resource reservation). This is the main motivation for us to integrate with Kueue: resource efficiency. It's likely other applications have such needs too. In the meantime, do you know if there is any workaround that can be applied? Thank you.
-
Can you explain at a higher level what your use case is, and what the external AC is for? The proposed workaround may depend on that.
-
I see, could you use two jobs / workloads for that? One job represents the pre-task, and another the main task. You would need to watch the pre-task; once it finishes, you create the main job?
-
I see, good point. Yes, I meant two Jobs. Still, the pre-task job could be created by your in-house code (for example in a webhook). Then, the webhook would also need to temporarily remove the "queue-name" label so that the main Job is not scheduled by Kueue yet. Once the pre-task is finished, you could add back the "queue-name" label (probably via another controller), which would trigger scheduling by Kueue. This is many steps, I know, but you asked for a workaround :).
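A minimal sketch of the "add the label back" step, assuming hypothetical Job names and a controller-runtime client; only the kueue.x-k8s.io/queue-name label key is the real Kueue label. In practice the main Job would likely also need to be created suspended so it does not start running before Kueue admits it.

```go
package controller

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// preTaskFinished reports whether the pre-task Job has completed successfully.
func preTaskFinished(job *batchv1.Job) bool {
	for _, c := range job.Status.Conditions {
		if c.Type == batchv1.JobComplete && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

// releaseMainJob adds the queue-name label back to the main Job once the
// pre-task is done, so that Kueue starts managing (and admitting) it.
// All names passed in here are hypothetical placeholders.
func releaseMainJob(ctx context.Context, c client.Client, ns, preTaskName, mainJobName, localQueue string) error {
	var preTask batchv1.Job
	if err := c.Get(ctx, types.NamespacedName{Namespace: ns, Name: preTaskName}, &preTask); err != nil {
		return err
	}
	if !preTaskFinished(&preTask) {
		return nil // not done yet; a watch or periodic requeue would call this again
	}
	var mainJob batchv1.Job
	if err := c.Get(ctx, types.NamespacedName{Namespace: ns, Name: mainJobName}, &mainJob); err != nil {
		return err
	}
	patch := client.MergeFrom(mainJob.DeepCopy())
	if mainJob.Labels == nil {
		mainJob.Labels = map[string]string{}
	}
	mainJob.Labels["kueue.x-k8s.io/queue-name"] = localQueue
	return c.Patch(ctx, &mainJob, patch)
}
```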
-
We don't have a retry config for all ACs yet; this is yet another feature request. Alternatively, you could implement it yourself in your AC controller.
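For what it's worth, a per-workload retry cap can be sketched inside the AC controller itself. The annotation key and the cap of 3 below are arbitrary examples, not Kueue conventions; only the CheckState constants come from the kueue.x-k8s.io/v1beta1 API.

```go
package controller

import (
	"context"
	"strconv"

	"sigs.k8s.io/controller-runtime/pkg/client"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// Hypothetical annotation key and cap.
const (
	retryCountAnnotation = "example.com/my-check-retries"
	maxRetries           = 3
)

// bumpAndPick increments the per-workload retry counter and decides whether to
// ask Kueue for another Retry or to give up and stop retrying with Rejected.
func bumpAndPick(ctx context.Context, c client.Client, wl *kueue.Workload) (kueue.CheckState, error) {
	retries, _ := strconv.Atoi(wl.Annotations[retryCountAnnotation])
	retries++
	patch := client.MergeFrom(wl.DeepCopy())
	if wl.Annotations == nil {
		wl.Annotations = map[string]string{}
	}
	wl.Annotations[retryCountAnnotation] = strconv.Itoa(retries)
	if err := c.Patch(ctx, wl, patch); err != nil {
		return "", err
	}
	if retries > maxRetries {
		return kueue.CheckStateRejected, nil
	}
	return kueue.CheckStateRetry, nil
}
```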
-
Hi @leipanhz
There is no limit on how many retries a workload can do. We implemented a cap on the admission check controller side on setting Retry. I believe what mimowo meant in this comment is that your admission check controller retries the check over and over in an infinite loop.
-
Also, an important consideration is whether the pre-task you mention runs any pods. Does it keep running even after the retry? If so, then this may mean resources remain held even after the AdmissionCheck was set to Retry and the Workload was evicted.
-
Hi @mimowo @PBundyra: I tested the AdmissionCheck Retry status with the following scenario with 3 jobs: Job1, Job2, and Job3 are submitted sequentially; the cluster can only handle one job at a time.
Desired result:
Observed:
In the controller, I set up a requeue after 5 seconds if the AC status is Retry, but the requeue didn't happen. Is this expected behavior on Retry? If not, how can we set the status back to Ready when all conditions are met?
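For comparison, here is a minimal sketch of a Reconcile loop that requeues every 5 seconds while the check is not Ready. The check name "my-check" and the conditionsMet helper are hypothetical; per the next reply, Kueue 0.8.3+/0.9.0+ resets Retry back to Pending after eviction, so the controller's main job is to flip Pending to Ready once the external conditions are met.

```go
package controller

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

type Reconciler struct {
	client.Client
}

// conditionsMet is a placeholder for whatever external condition the check verifies.
func conditionsMet(ctx context.Context, wl *kueue.Workload) bool { return false }

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var wl kueue.Workload
	if err := r.Get(ctx, req.NamespacedName, &wl); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	for i := range wl.Status.AdmissionChecks {
		check := &wl.Status.AdmissionChecks[i]
		if check.Name != "my-check" { // hypothetical check name
			continue
		}
		switch check.State {
		case kueue.CheckStateRetry:
			// Kueue evicts the workload and (in fixed versions) resets the check
			// to Pending; just poll again shortly instead of mutating it here.
			return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
		case kueue.CheckStatePending:
			if conditionsMet(ctx, &wl) {
				check.State = kueue.CheckStateReady
				check.Message = "all external conditions met"
				check.LastTransitionTime = metav1.Now()
				return ctrl.Result{}, r.Status().Update(ctx, &wl)
			}
			return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
		}
	}
	return ctrl.Result{}, nil
}
```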
-
It is surprising that it is stuck in Retry after the recent fix, which should set it back to Pending. What is the version you are using? 0.8.3+ and 0.9.0+ contain the fix.
-
Quick summary: @mimowo and I spent long hours debugging this issue. It turns out that the culprit is that if a workload is evicted, setting the admission check to Retry somehow triggers a bug: it causes the workload to get stuck, never able to reserve quota again. As a result, the job is stuck in suspended mode forever. While the root cause is still to be investigated, a workaround is to put a guard in the reconciler such as the one in kueue/pkg/controller/admissionchecks/provisioning/controller.go, lines 122 to 124 at c655645.
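A minimal sketch of such a guard, assuming the apimeta condition helpers; it mirrors the idea of the linked provisioning-controller guard rather than reproducing its exact code.

```go
package controller

import (
	apimeta "k8s.io/apimachinery/pkg/api/meta"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// shouldSkip reports whether the AC controller should leave the Workload alone:
// either it no longer holds a quota reservation (e.g. it was just evicted) or it
// has already finished, so touching its admission checks would only re-trigger
// the stuck-in-Retry behavior described above.
func shouldSkip(wl *kueue.Workload) bool {
	return !apimeta.IsStatusConditionTrue(wl.Status.Conditions, kueue.WorkloadQuotaReserved) ||
		apimeta.IsStatusConditionTrue(wl.Status.Conditions, kueue.WorkloadFinished)
}
```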
-
To admit a workload, the Kueue scheduler checks quota reservation and all admission checks. Only when all conditions are met is the workload admitted. In my experiment with a custom admission check controller, when the admission check has not passed (it is in Pending status), the workload is not admitted, but it is still holding CPU/GPU/memory resources. This is not desired, as we want those resources to be available to other jobs while admission is pending.
Any comments on how to address this issue?