WIP Workload partitioning API enhancement #802
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request needs approval from an approver in each of these files; approvers can indicate their approval by writing `/approve` in a comment. The full list of commands accepted by this bot can be found here.

/priority important-soon
### Goals

This feature is about designing a read-only API that will describe the enabled workload partitions (types, classes, etc.). This information is needed for kubelet to start exposing the right resources, as well as for the admission webhook to know when pod manipulation is needed.
This paragraph would be good as the intro to the Motivation section on line 67.
This feature is about designing a read-only API that will describe the enabled workload partitions (types, classes, etc.). This information is needed for kubelet to start exposing the right resources, as well as for the admission webhook to know when pod manipulation is needed.
It is expected that this API object will be created during the installation process, either manually or using the Performance Addon Operator render mode.
We should work this sentence into the Proposal section.
# arbitrary name, all objects of this Kind should be processed and merged
name: management-partition
status:
  # list of strings, defines partition names to be exposed by kubelet
This list is also used by the admission hook as a way to know when pods asking to be partitioned should be mutated.
Kubelet doesn't actually look at the list here; the PAO will use it to configure kubelet, right?
Oh yeah, you are right.
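To make the admission-hook side concrete: a pod would request a partition by name, and the webhook would only mutate it when that name appears in this list. A rough sketch of such a pod, assuming the `target.workload.openshift.io/<partition-name>` annotation style from the original enhancement (the exact annotation value and all names below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-operator-pod
  namespace: openshift-example
  annotations:
    # "management" must be one of the partition names published by this API,
    # otherwise the webhook leaves the pod untouched.
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  containers:
  - name: operator
    image: example.io/operator:latest
    resources:
      requests:
        cpu: 100m
        memory: 50Mi
```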
The proposal is to define a new cluster-wide Custom Resource Definition that would describe the allowed partition names in the status section. That way it hints at being a read-only object where no user/admin input or modifications are expected.
apiVersion: workloadpartitioning.openshift.io/v1
In the original enhancement we use `workload.openshift.io` in some annotations. Should we do that here, too, and make this something like `partitioning.workload.openshift.io`? What is standard/preferred for OpenShift APIs?

Alternatively, do we anticipate other API inputs related to workloads that might live on this CR later, so we should not include "partition" in the name?
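For illustration only, the alternative group name would make the object header look something like this (nothing here is decided; the kind and name are reused from the proposal above):

```yaml
apiVersion: partitioning.workload.openshift.io/v1
kind: WorkloadPartitions
metadata:
  name: management-partition
```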
name: management-partition
status:
  # list of strings, defines partition names to be exposed by kubelet
  globalPartitionNames:
Bike shed: We should be consistent and either call it "global" or "cluster" but not use those two terms interchangeably.
Similar to the `Drawbacks` section, the `Alternatives` section is used to highlight and record other possible approaches to delivering the value proposed by an enhancement.
Elsewhere we've discussed the possible need to add per-workload parameters, like scope. The proposal above implies those would end up as separate lists, which I think is fine. We should include the other form here in the Alternatives section, for completeness. Our notes doc has an example like:

workloadTypes:
- name: user-defined-workload-type
  scope: Pool
- name: management
  scope: ClusterWide
Good point!
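For completeness, a sketch of how that alternative per-workload-parameter form could look as a full object (hypothetical object, reusing the group and kind from the proposal above):

```yaml
apiVersion: workloadpartitioning.openshift.io/v1
kind: WorkloadPartitions
metadata:
  name: management-partition
status:
  # Instead of a flat list of names, each workload type carries its own parameters.
  workloadTypes:
  - name: user-defined-workload-type
    scope: Pool
  - name: management
    scope: ClusterWide
```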
- management
To allow for future extensibility and possible multiple sources of workload partition names (coming from customers, the installer, or other operators, etc.), we propose that there might be multiple `WorkloadPartitions` objects injected into the cluster. The expected behavior is that all components would just merge all the defined names together.
In other places where we have cluster-scoped configuration resources like this we look for one well-defined name, often `cluster`. I'm not sure how we decide between one and many, though. Maybe in this case many makes sense if we assume there may be multiple creators but that the CRD only has status fields. Do we really anticipate multiple creators?
If we allow custom partitions in the future then I would say yes, there might be multiple owners in such a case (the customer creates one and PAO renders another one). I just did not want to close the door prematurely.
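To illustrate that merge behavior (the second object and all names below are hypothetical), two `WorkloadPartitions` objects might coexist and every consumer would simply union their name lists:

```yaml
# Rendered by the Performance Addon Operator at install time
apiVersion: workloadpartitioning.openshift.io/v1
kind: WorkloadPartitions
metadata:
  name: management-partition
status:
  globalPartitionNames:
  - management
---
# Hypothetical object added later by a cluster administrator
apiVersion: workloadpartitioning.openshift.io/v1
kind: WorkloadPartitions
metadata:
  name: customer-partition
status:
  globalPartitionNames:
  - telco-control-plane
```

The effective partition set seen by the admission webhook and PAO would then be `management` plus `telco-control-plane`.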
1) An admin creates a WorkloadPartitions object manually after the cluster has been running for some time on a cluster that already has partitioning enabled
1) An admin creates the WorkloadPartitions object manually after the cluster has been running for some time on a cluster with no workload partitioning enabled
1) An admin deletes the WorkloadPartitions object that was created during the install process
1) A random user manages to create a WorkloadPartitions object due to a bug in the defined RBAC rules
This list is a good start. We should add some detail about what ill effects might result from each of these cases.
## Proposal
The proposal is to define a new cluster-wide Custom Resource Definition that would describe the allowed partition names in the status section. That way it hints at being a read-only object where no user/admin input or modifications are expected.
It is not clear from the proposal who should own this CRD.
That's explained a little better in #753, but we should probably summarize it here. The CRD will be defined in `openshift/api`.
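As a rough illustration only (the schema details below are assumptions, not the eventual openshift/api definition), the CRD skeleton could look like:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: workloadpartitions.workloadpartitioning.openshift.io
spec:
  group: workloadpartitioning.openshift.io
  scope: Cluster
  names:
    kind: WorkloadPartitions
    plural: workloadpartitions
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          status:
            type: object
            properties:
              # names are only published in status, hinting that the object is read-only
              globalPartitionNames:
                type: array
                items:
                  type: string
```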
To allow for future extensibility and possible multiple sources of workload partition names (coming from customers, the installer, or other operators, etc.), we propose that there might be multiple `WorkloadPartitions` objects injected into the cluster. The expected behavior is that all components would just merge all the defined names together.
There is no controller or reconcile loop as part of this proposal. Only the cluster administrator will have the ability to create or manipulate the WorkloadPartitions objects. Anyone will be allowed to read them.
We will still run reconciliation under the PAO once the `WorkloadPartitions` resource is updated.
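A minimal sketch of what the "admin-only writes, everyone reads" RBAC could look like (the role and binding names here are hypothetical; the real rules would ship alongside the CRD):

```yaml
# Allow every authenticated user to read WorkloadPartitions objects.
# Create/update/delete stays with cluster administrators (e.g. via cluster-admin).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: workloadpartitions-reader
rules:
- apiGroups: ["workloadpartitioning.openshift.io"]
  resources: ["workloadpartitions"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: workloadpartitions-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: workloadpartitions-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
```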
### Risks and Mitigations
1) An admin creates a WorkloadPartitions object manually after the cluster has been running for some time on a cluster that already has partitioning enabled
Looks like something happened with the ordering here.
# List of strings, defines partition names that will be recognized by the
# workload partitioning webhook. This list will also inform PAO about partitions
# that should be configured on the kubelet and CRI-O level.
clusterPartitionNames:
Hm, what about the CPUs that should be used for the CPU pinning, or will it use `reservedCPUs` by default?
The API server does not care. Kubelet / CRI-O will get the right CPU IDs from PAO.
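For context, the CPU sets would presumably keep coming from the existing PerformanceProfile API rather than from this new CR; a minimal example with illustrative values (the assumption being that PAO renders the kubelet and CRI-O partition configuration from the reserved set):

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-profile
spec:
  cpu:
    # CPUs set aside for management/infra workloads; assumed source of the partition CPU IDs
    reserved: "0-1"
    # CPUs left for regular workloads
    isolated: "2-7"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
```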
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now please do so with `/close`.

/lifecycle stale

Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle rotten`. If this proposal is safe to close now please do so with `/close`.

/lifecycle rotten

Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting `/reopen`.

/close
@openshift-bot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is part of the bigger Workload partitioning enhancement and deals with the trigger API specifics.

Still heavily WIP.

Signed-off-by: Martin Sivák <msivak@redhat.com>