Multi-Level Scheduling Proposal #1128

Draft · wants to merge 2 commits into main
Conversation

lukasfrank
Member

@lukasfrank lukasfrank commented Sep 30, 2024

Proposed Changes

  • This proposal discusses possible solutions to improve IronCore scheduling.

@github-actions github-actions bot added documentation Improvements or additions to documentation size/L labels Sep 30, 2024
@lukasfrank lukasfrank changed the title Proposal to improve scheduling Multi-Level Scheduling Proposal Sep 30, 2024
@balpert89
Contributor

+1 for the proposal. This addresses a lot of current pain points in the stack:

  • better utilization of dedicated machine pools
  • a transparent decision-making process for how a Machine resource ends up on a given MachinePool
  • the ability to add custom resources, such as EPC memory, to scheduling decisions

Going into more detail: the currently proposed approach is decentralized, whereas the centralized one (in the alternatives section) mirrors the network stack's solution. A clustered solution has some disadvantages, such as a lack of guaranteed availability. For networking this is acceptable, because if a critical infrastructure component is down, networking is affected anyway. For compute, scheduling should not have such a big impact: the decentralized approach isolates failures at the pool level.
Another aspect of a centralized solution is that you have to deal with a lot of "boilerplate" challenges such as eventual consistency, which leaves room for race conditions. The Reservations solution solves this elegantly: you can introduce a time period during which the scheduler waits for decisions, after which it only considers the pools it finds in the status slice (see the sketch below). My vote therefore goes to the decentralized approach.
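To make that flow concrete, here is a minimal sketch of what such a Reservation resource could look like. All type and field names (Reservation, PoolResult, Timeout, ...) are illustrative assumptions on my side, not part of the proposal:

```go
// Hypothetical sketch only; none of these names are fixed by the proposal.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	corev1alpha1 "github.com/ironcore-dev/ironcore/api/core/v1alpha1"
)

// Reservation asks every pool provider whether it can host the requested
// resources. Providers answer by appending to Status.Pools; the scheduler
// waits up to Spec.Timeout and then only considers the pools it finds in
// the status slice.
type Reservation struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ReservationSpec   `json:"spec,omitempty"`
	Status ReservationStatus `json:"status,omitempty"`
}

type ReservationSpec struct {
	// Resources requested for the Machine, including custom ones
	// such as EPC memory.
	Resources corev1alpha1.ResourceList `json:"resources,omitempty"`
	// Timeout bounds how long the scheduler waits for answers.
	Timeout metav1.Duration `json:"timeout,omitempty"`
}

type ReservationStatus struct {
	// Pools lists the providers that can fulfill the reservation.
	Pools []PoolResult `json:"pools,omitempty"`
}

type PoolResult struct {
	Name string `json:"name"`
	// Rating is a provider-computed hint on how well the reservation
	// fits onto this pool (higher is better).
	Rating int32 `json:"rating"`
}
```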

On the topic of the "scheduling decision": the reservation system is meant to decide who can provide the requested resources. Another controller can then use this to decide which of those pools to actually use for the Machine resource (see the sketch below). This enables behavior similar to Node <-> Pod scheduling in vanilla Kubernetes. Another point to consider: "system" reservations could be introduced to accommodate resources that are exclusively reserved for system applications.
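A sketch of that second step, reusing the illustrative types from above, similar in spirit to the score phase of kube-scheduler; the helper name is my assumption:

```go
// pickPool chooses the best-rated pool from the status slice once the
// wait period has elapsed. Returns false if no pool answered.
func pickPool(res *Reservation) (string, bool) {
	best, bestRating := "", int32(-1)
	for _, p := range res.Status.Pools {
		if p.Rating > bestRating {
			best, bestRating = p.Name, p.Rating
		}
	}
	return best, best != ""
}
```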

Some aspects that are still unclear to me:

  • Who decides the rating on a given status entry? Is this similar to a priority? How does this influence the scheduling decision?
  • How will arbitrary resources be announced, such as the already mentioned EPC memory or, e.g., dedicated graphics cards?

@lukasfrank
Member Author

lukasfrank commented Sep 30, 2024

  • Who decides the rating on a given status entry? Is this similar to a priority? How does this influence the scheduling decision?

Only the pool provider can calculate the rating (since it is the component that checks whether the reservation can be fulfilled); it is a metric for how well the Reservation fits onto the related pool. It should be understood as a hint for the scheduler when taking the decision, as sketched below.
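One way a provider could derive such a rating is from the headroom left after placing the reservation. The formula and names are my assumptions; the proposal only requires some "fit" metric:

```go
package provider

import (
	corev1alpha1 "github.com/ironcore-dev/ironcore/api/core/v1alpha1"
)

// rate returns a 0..100 rating (higher = better fit) and whether the
// reservation can be fulfilled on this pool at all.
func rate(free, requested corev1alpha1.ResourceList) (int32, bool) {
	worst := int32(100)
	for name, req := range requested {
		avail, ok := free[name]
		if !ok || avail.Cmp(req) < 0 {
			return 0, false // unknown or exhausted resource: decline
		}
		if avail.IsZero() {
			continue // req must be zero here as well
		}
		// Percentage of this resource still free after placement.
		left := int32(100 - req.Value()*100/avail.Value())
		if left < worst {
			worst = left
		}
	}
	return worst, true
}
```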

  • How will arbitrary resources be announced, such as the already mentioned EPC memory or, e.g., dedicated graphics cards?

In the distributed approach there is no longer a need to announce resources. The resource "owner" (the pool provider) is in charge of accepting or rejecting the reservation and needs to keep track of all of its resources. If arbitrary resources aren't available on a specific host, the reservation is declined (see the sketch below).
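Provider-side, that could look like the following, reusing the illustrative Reservation types and rate() helper from the sketches above; the PoolProvider struct is an assumption:

```go
// PoolProvider is the single source of truth for its own bookkeeping,
// including arbitrary resources such as EPC memory or GPUs.
type PoolProvider struct {
	name string
	free corev1alpha1.ResourceList
}

// answerReservation either appends this pool to the status slice or
// stays silent, which the scheduler treats as a decline.
func (p *PoolProvider) answerReservation(res *Reservation) {
	rating, ok := rate(p.free, res.Spec.Resources)
	if !ok {
		return // unknown or exhausted resource on this pool
	}
	res.Status.Pools = append(res.Status.Pools, PoolResult{
		Name:   p.name,
		Rating: rating,
	})
}
```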

@balpert89 Does that make sense to you?

@balpert89
Contributor

The rating part is clear to me now, thanks for addressing it.

In the distributed approach there is no longer a need to announce resources. The resource "owner" (the pool provider) is in charge of accepting or rejecting the reservation and needs to keep track of all of its resources. If arbitrary resources aren't available on a specific host, the reservation is declined.

Does that mean we will deprecate the allocatable / available fields (https://github.com/ironcore-dev/ironcore/blob/main/api/compute/v1alpha1/machinepool_types.go#L30-L33), as they are no longer required?

@lukasfrank
Member Author

The rating part is clear to me now, thanks for addressing it.

In the distributed approach there is no longer a need to announce resources. The resource "owner" (the pool provider) is in charge of accepting or rejecting the reservation and needs to keep track of all of its resources. If arbitrary resources aren't available on a specific host, the reservation is declined.

Does that mean we will deprecate the allocatable / available fields (https://github.com/ironcore-dev/ironcore/blob/main/api/compute/v1alpha1/machinepool_types.go#L30-L33), as they are no longer required?

Correct, there would be no need for these fields anymore. If they are currently used to aggregate the resources of the entire infrastructure, we can offer metrics instead and aggregate them the Kubernetes way, e.g. as sketched below.
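As one possible interpretation, assuming Prometheus via client_golang as the metrics backend: each pool provider exports its resources as metrics, and infrastructure-wide totals become a simple aggregation query. Metric and label names are assumptions:

```go
package provider

import (
	"github.com/prometheus/client_golang/prometheus"

	corev1alpha1 "github.com/ironcore-dev/ironcore/api/core/v1alpha1"
)

var poolAllocatable = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "machinepool_allocatable",
		Help: "Allocatable amount per resource on a machine pool.",
	},
	[]string{"pool", "resource"},
)

func init() {
	prometheus.MustRegister(poolAllocatable)
}

// exportAllocatable publishes the pool's current view; a PromQL query
// like `sum by (resource) (machinepool_allocatable)` then aggregates
// across the whole infrastructure.
func exportAllocatable(pool string, resources corev1alpha1.ResourceList) {
	for name, qty := range resources {
		poolAllocatable.WithLabelValues(pool, string(name)).Set(float64(qty.Value()))
	}
}
```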
