[YUNIKORN-2978] Fix handling of reserved allocations where node differs #996
YUNIKORN-2700 introduced a bug where allocations of previously-reserved tasks were not handled correctly in the case where we schedule on a different node than the reservation. Ensure that we unreserve and allocate using the proper node in both cases. Also introduce additional logging of allocations on nodes to make finding issues like this easier in the future.
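As a rough illustration of the behaviour described above (not the actual yunikorn-core code), the sketch below releases the reservation on the node that holds it and records the allocation on the node that was actually selected. The `Node`, `Allocation`, and `AllocateReserved` names are hypothetical and simplified for this example. The log line stands in for the additional allocation logging mentioned in the description.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a scheduler node.
type Node struct {
	ID           string
	reservations map[string]bool // allocation keys reserved on this node
	allocations  map[string]bool // allocation keys placed on this node
}

// NewNode creates an empty node with the given ID.
func NewNode(id string) *Node {
	return &Node{ID: id, reservations: map[string]bool{}, allocations: map[string]bool{}}
}

// Allocation is a hypothetical, simplified allocation record.
type Allocation struct {
	Key    string
	NodeID string // node the allocation finally lands on
}

// AllocateReserved places a previously reserved allocation on target.
// The reservation is released on the node that holds it, while the
// allocation is recorded on the node that was actually chosen. This also
// covers the case where reserved and target are the same node.
func AllocateReserved(alloc *Allocation, reserved, target *Node) {
	delete(reserved.reservations, alloc.Key)
	target.allocations[alloc.Key] = true
	alloc.NodeID = target.ID
	fmt.Printf("allocated %s on node %s (reservation released on %s)\n",
		alloc.Key, target.ID, reserved.ID)
}

func main() {
	n1, n2 := NewNode("node-1"), NewNode("node-2")
	n1.reservations["alloc-1"] = true

	alloc := &Allocation{Key: "alloc-1"}
	AllocateReserved(alloc, n1, n2)
	fmt.Println(alloc.NodeID, len(n1.reservations), len(n2.allocations))
}
```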
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           master     #996     +/-   ##
=========================================
  Coverage   81.34%   81.34%
=========================================
  Files          97       97
  Lines       15590    15620     +30
=========================================
+ Hits        12681    12706     +25
- Misses       2630     2634      +4
- Partials      279      280      +1
Is it possible to have at least a single unit test that fails with the old code and passes with this PR?
I think the missing bit is just a single line:
We need a unit test, and it should be doable to create one:
Before the fix, the allocation will show the reserved node ID or none at all.
Addressed review comments. Reservation test updated to verify node assignment; verified that this test fails prior to this PR but passes now.
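For readers who want the shape of such a regression check, here is a minimal sketch. It is not the test added in this PR. `placeBuggy`, `placeFixed`, and `checkNodeAssignment` are hypothetical helpers that only model the reviewer's point: before the fix the result names the reserved node, and afterwards it names the node that was selected.

```go
package main

import "fmt"

// result is a hypothetical record of where an allocation was placed.
type result struct {
	AllocationKey string
	NodeID        string
}

// placeBuggy mimics the failure mode described in the review: the result
// keeps the reserved node ID even when another node was selected.
func placeBuggy(key, reservedNode, selectedNode string) result {
	return result{AllocationKey: key, NodeID: reservedNode}
}

// placeFixed records the node that was actually selected.
func placeFixed(key, reservedNode, selectedNode string) result {
	return result{AllocationKey: key, NodeID: selectedNode}
}

// checkNodeAssignment returns an error when the result names the wrong node.
func checkNodeAssignment(r result, want string) error {
	if r.NodeID != want {
		return fmt.Errorf("allocation %s: got node %q, want %q",
			r.AllocationKey, r.NodeID, want)
	}
	return nil
}

func main() {
	// Reservation is on node-1, but the scheduler selects node-2.
	fmt.Println("buggy:", checkNodeAssignment(placeBuggy("alloc-1", "node-1", "node-2"), "node-2"))
	fmt.Println("fixed:", checkNodeAssignment(placeFixed("alloc-1", "node-1", "node-2"), "node-2"))
}
```

The useful property is that the same assertion fails against the old behaviour and passes against the new one, which is what the review asked for.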
LGTM
What is this PR for?
YUNIKORN-2700 introduced a bug where allocations of previously-reserved tasks were not handled correctly in the case where we schedule on a different node than the reservation. Ensure that we unreserve and allocate using the proper node in both cases.
Also introduce additional logging of allocations on nodes to make finding issues like this easier in the future.
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-2978
How should this be tested?
Verified successful processing of 1000-pod job on autoscaled cluster where previously this would fail.