-
Notifications
You must be signed in to change notification settings - Fork 7
Reservation-Based Scheduling Proposal for Machine
s
#1128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
b4c140d
Draft proposal to improve scheduling
lukasfrank c9852d7
Added iri protobuf example
lukasfrank 9ba01d4
Refine proposal
lukasfrank 78571aa
PR Review
lukasfrank 76ac4b3
Merge branch 'main' into feat/scheduling
lukasfrank e75fd34
PR Review
lukasfrank 5a4d387
PR Review
lukasfrank a4b307f
PR Review
lukasfrank 85b9737
PR Review
lukasfrank File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
--- | ||
title: IEP Title | ||
title: Internal Load Balancers | ||
|
||
iep-number: 8 | ||
|
||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,170 @@ | ||
--- | ||
title: Reservation-Based Scheduling for Machines | ||
|
||
iep-number: 11 | ||
|
||
creation-date: 2024-09-30 | ||
|
||
status: implementable | ||
|
||
authors: | ||
|
||
- "@lukasfrank" | ||
|
||
|
||
reviewers: | ||
|
||
- "@afritzler" | ||
- "@balpert89" | ||
|
||
--- | ||
|
||
# IEP-11: Reservation-Based Scheduling for `Machine`s | ||
|
||
## Table of Contents | ||
|
||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Proposal](#proposal) | ||
- [Alternatives](#alternatives) | ||
|
||
## Summary | ||
Scheduling resources is a central process to increase utilization of cloud infrastructure. It not only involves the | ||
process to find any feasible `MachinePool` but rather, if possible, to find a `MachinePool` where the `Machine` can | ||
be materialized. | ||
|
||
|
||
## Motivation | ||
|
||
With `ironcore` a multi-level hierarchy can be built to design availability/failure zones. `Machine`s can be created | ||
on every layer, which are subsequently scheduled on `MachinePool`s in the hierarchy below. A scheduler on every | ||
layer should assign a `Machine` on a `MachinePool` which can fulfill it. This proposal addresses the known | ||
limitations of the current implementation: | ||
- API changes needed to extend the scheduling if further resources should be considered | ||
- Available resources are attached to `Pools` and aggregated. In higher levels a correct decision can't be made | ||
which means that `Machine`s are assigned to saturated `Pool`s. | ||
|
||
### Goals | ||
|
||
- Dynamic (`NetworkInterfaces`, `LocalDiskStorage`) and static resources (resources defined by a `Class` like `CPU`, | ||
`Memory`) should be taken into the scheduling decision. | ||
- Scheduling should be robust against data race conditions | ||
- `Machine`s on every level should be scheduled on a `Pool` | ||
- It should be possible to add resources influencing the scheduling decision without API change | ||
- Resource updates of `Pool`s should be possible | ||
- Adding/removing of compute `Pool`s should be possible | ||
|
||
### Non-Goals | ||
- The scheduler should not act on `Machine` updates like `NetworkInterface` attachments | ||
lukasfrank marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- This proposal does not cover the scheduling of `Volumes`. | ||
|
||
## Proposal | ||
|
||
For every created `Machine` with an empty `spec.machinePoolRef`, the scheduler will create a resource `Reservation`. | ||
The `poollet` will continue to broker only `Machine`s with a `.spec.machinePoolRef` set. The `Reservation` is a | ||
condensed derivation of a `Machine` containing the requested resources like `NetworkInterfaces` and the attached | ||
`MachineClass` resource list. The `IRI` will be extended to be able to manage the `Reservation`s. Once the | ||
`Reservation` hits a pool provider (e.g. `libvirt-provider`), the decision can be made if the `Reservation` can be | ||
fulfilled or not. In case of accepting the `Reservation`, resources needs to be blocked until: | ||
1. the corresponding `Machine` is being placed on the pool provider. | ||
2. the reservation is being deleted | ||
|
||
The status of the `Reservation` is being propagated up to the layer where it was created. As soon as the root | ||
balpert89 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
reservation has a populated status which contains a list of possible pools, the scheduler can pick one, set the | ||
`Machine.spec.machinePoolRef` and the `poollet` will broker the `Machine` to the next level. | ||
|
||
`Reservation` resource: | ||
``` | ||
apiVersion: core.ironcore.dev/v1alpha1 | ||
kind: Reservation | ||
metadata: | ||
namespace: default | ||
name: reservation | ||
spec: | ||
# Pools define each pool where resources should be reserved | ||
pools: | ||
lukasfrank marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- poolA | ||
- poolB | ||
- poolC | ||
# Resources defines the requested resources in a ResourceList | ||
resources: | ||
localStorage: 10G | ||
nics: 2 | ||
cpu: 2 | ||
status: | ||
# Pools describe where the resource reservation was possible | ||
pools: | ||
- name: poolA | ||
state: Accepted | ||
- name: poolB | ||
state: Rejected | ||
``` | ||
|
||
`Reservation` states: | ||
- **Accepted**: The `Reservation` can be materialized on this `pool`. | ||
- **Rejected**: The `Reservation` can *not* be materialized on this `pool`. | ||
- **Bound**: The corresponding `Machine` replaced the `Reservation` which indicates that the `Reservation` can be | ||
cleaned up. | ||
|
||
Added `IRI` methods: | ||
|
||
```protobuf | ||
rpc ListReservations(ListReservationsRequest) returns (ListReservationsResponse) {}; | ||
rpc CreateReservation(CreateReservationRequest) returns (CreateReservationResponse) {}; | ||
rpc DeleteReservation(DeleteReservationRequest) returns (DeleteReservationResponse) {}; | ||
``` | ||
|
||
|
||
### Detailed flow | ||
|
||
The flow to assign a `Machine` to a `Pool` consists of 3 Phases: | ||
|
||
#### 1. Reservation Flow | ||
1. If `.spec.machinePoolRef` is not set, the scheduler creates a `Reservation` that includes the required resources. | ||
2. `poollet`s which match `.spec.pools` of `Reservation` pick it up and broker it one layer down. If the `.spec. | ||
pools` is empty, it is considered as a wildcard for all pools. | ||
3. `provider` evaluates the `Reservation` and sets its state to `Accepted` or `Rejected`, which is then | ||
propagated up the hierarchy. | ||
|
||
#### 2. Scheduling Flow | ||
1. On every layer, the scheduler uses a `Reservation` corresponding to a `Machine` to select a `MachinePool`. It | ||
will pick one of the `Accepted` pools of the `Reservation.status.pools` and updates the `Machine.spec. | ||
machinePoolRef`. | ||
2. `poollet` picks up the `Machine` since `.spec.machinePoolRef` is set | ||
3. Once a `Machine` reaches a `provider` with a relating `Reservation`, the `Reservation` will be replaced through | ||
the `Machine`. `Reservation` status will be updated to `Bound` and `poollet`s pulls the status up until to the | ||
root `Reservation`. | ||
|
||
#### 3. Cleanup Flow | ||
1. The scheduler deletes root `Reservation` if `Machine` has `.spec.machinePoolRef` and if `Reservation` has a | ||
`Bound` state. | ||
lukasfrank marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
### Advantages | ||
- In case of a partial outage/network issues `Machine`s can be placed on other `Pool`s | ||
- `Pool`s do not leak resource information, owner of resources decides if `Reservation` can be fulfilled (less complexity for over provisioning) | ||
- `Reservation`s can be used to block resources without creating a `Machine` | ||
|
||
### Disadvantages | ||
- Increases complexity in the scheduling by introducing a new resource (Reservation) and the scheduler flow has to be extended. | ||
|
||
## Alternatives | ||
|
||
Another solution: The providers (in the lowest layer) announce there resources to a central cluster. In these clusters a `shallow` pool represent the actual compute pool. In fact the problem of scheduling across multi-levels is transformed in a one-layer scheduling decision. Pools between the root cluster and the leaf clusters are announced but only to represent the hierarchy and not for the actual scheduling. | ||
|
||
If a `Machine` is created, a controller creates a related `Machine` in the central cluster where the `Pool` is being picked. The `Pool` is synced back and the`.spec.machinePoolRef` being set. The `poollet` syncs the `Machine` one layer down. In every subsequent layer the path is being looked up in the central cluster until the provider is hit. | ||
|
||
### Advantages | ||
- Scheduling in central place is simpler | ||
- Availability/failure zones can be modeled in central cluster | ||
- (Scheduling decision takes potentially less time?) | ||
|
||
### Disadvantages | ||
lukasfrank marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- Resources are populated to central place and consistency needs to be guaranteed | ||
- Updates are harder (failure of provider, overbooking) | ||
- Bookkeeping of resources needs to happen twice: in provider and central place | ||
- Layered structure of hierarchy needs to be duplicated at central place | ||
- If central cluster is not reachable, no `Machine` can be placed | ||
- No easy way to dynamically change pool hierarchy. | ||
- Leaf `poollet` configuration/behaviour will differ from other `poollet`s |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.