WIP Workload partitioning API enhancement #802

Closed
wants to merge 4 commits

Conversation

MarSik
Contributor

@MarSik commented Jun 4, 2021

This is part of a bigger Workload partitioning enhancement that deals with the trigger API specifics.

Still heavy WIP.

Signed-off-by: Martin Sivák msivak@redhat.com

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Jun 4, 2021
@openshift-ci openshift-ci bot requested review from abhinavdahiya and staebler June 4, 2021 09:06
@openshift-ci
Contributor

openshift-ci bot commented Jun 4, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign derekwaynecarr after the PR has been reviewed.
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dhellmann
Contributor

/priority important-soon

@openshift-ci openshift-ci bot added the priority/important-soon label Jun 4, 2021

### Goals

This feature is about designing a read-only API that will describe the enabled workload partitions (types, classes, etc.). This information is needed for kubelet to start exposing the right resources, as well as for the admission webhook to know when pod manipulation is needed.
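For illustration only, here is a rough sketch of what "exposing the right resources" might look like on a node once a `management` partition is enabled. The extended-resource name follows the `workload.openshift.io` convention used by the original workload partitioning enhancement; the exact resource name and the quantities shown are assumptions, not part of this proposal.

```yaml
# Hypothetical fragment of a node's status after kubelet starts advertising a
# partition-specific extended resource; partitioned pods would then have their
# CPU requests rewritten by the webhook to consume this resource instead of cpu.
status:
  allocatable:
    cpu: "15500m"
    memory: "64Gi"
    management.workload.openshift.io/cores: "16000"  # illustrative quantity
```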
Contributor

This paragraph would be good as the intro to the Motivation section on line 67.



It is expected this API will be created during the installation process, either manually or using the Performance Addon Operator render mode.
Contributor

We should work this sentence into the Proposal section.

# arbitrary name, all objects of this Kind should be processed and merged
name: management-partition
status:
# list of strings, defines partition names to be exposed by kubelet
Contributor

This list is also used by the admission hook as a way to know when pods asking to be partitioned should be mutated.

Kubelet doesn't actually look at the list here; the PAO will use it to configure kubelet, right?

Contributor Author

Oh yeah, you are right.

The proposal is to define a new cluster-wide Custom Resource Definition that would describe the allowed partition names in the status section. That way it hints at being a read-only object where no user/admin input or modifications are expected.

```yaml
apiVersion: workloadpartitioning.openshift.io/v1
Contributor

In the original enhancement we use workload.openshift.io in some annotations. Should we do that here, too, and make this something like partitioning.workload.openshift.io? What is standard/preferred for OpenShift APIs?

Alternatively, do we anticipate other API inputs related to workloads that might live on this CR later, so we should not include "partition" in the name?

name: management-partition
status:
# list of strings, defines partition names to be exposed by kubelet
globalPartitionNames:
Contributor

Bike shed: We should be consistent and either call it "global" or "cluster" but not use those two terms interchangeably.


Similar to the `Drawbacks` section the `Alternatives` section is used to
highlight and record other possible approaches to delivering the value proposed
by an enhancement.
Contributor

Elsewhere we've discussed the possible need to add per-workload parameters, like scope. The proposal above implies those would end up as separate lists, which I think is fine. We should include the other form here in the Alternatives section, for completeness. Our notes doc has an example such as:

workloadTypes:
    - name: user-defined-workload-type
      scope: Pool
    - name: management
      scope: ClusterWide

Contributor Author

Good point!

- management
```

To allow for future extensibility and possible multiple sources of workload partition names (coming from customers, the installer, or other operators, etc.), we propose that there might be multiple `WorkloadPartitions` objects injected into the cluster. The expected behavior is that all components would just merge all the defined names together.
Contributor

In other places where we have cluster-scoped configuration resources like this, we look for one well-defined name, often cluster. I'm not sure how we decide between one and many, though. Maybe in this case many makes sense if we assume there may be multiple creators but that the CRD only has status fields. Do we really anticipate multiple creators?

Contributor Author

If we allow custom partitions in the future then I would say yes, there might be multiple owners in such a case (the customer creates one and PAO renders another one). I just did not want to close the door prematurely.

1) An admin creates a WorkloadPartitions object manually after the cluster has been running for some time on a cluster that already has partitioning enabled
1) An admin creates the WorkloadPartitions object manually after the cluster has been running for some time on a cluster with no workload partitioning enabled
1) An admin deletes the WorkloadPartitions object that was created during the install process
1) A random user manages to create a WorkloadPartitions object due to a bug in the defined RBAC rules
Contributor

This list is a good start. We should add some detail about what ill effects might result from each of these cases.


## Proposal

The proposal is to define a new cluster-wide Custom Resource Definition that would describe the allowed partition names in the status section. That way it hints at being a read-only object where no user/admin input or modifications are expected.
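Putting the fragments quoted in this discussion together, a complete object could look roughly like the sketch below. The `kind` and field names mirror the snippets in this PR; note that the list field appears as both `globalPartitionNames` and `clusterPartitionNames` in different revisions, so treat the exact spelling as still under discussion.

```yaml
apiVersion: workloadpartitioning.openshift.io/v1
kind: WorkloadPartitions
metadata:
  # arbitrary name, all objects of this Kind should be processed and merged
  name: management-partition
status:
  # List of strings, defines partition names that will be recognized by the
  # workload partitioning webhook and used by PAO to configure kubelet and CRI-O.
  clusterPartitionNames:
    - management
```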
Contributor

It is not clear from the proposal who should own this CRD.

Contributor

That's explained a little better in #753 but we should probably summarize it here. The CRD will be defined in openshift/api.


To allow for future extensibility and possible multiple sources of workload partition names (coming from customers, the installer, or other operators, etc.), we propose that there might be multiple `WorkloadPartitions` objects injected into the cluster. The expected behavior is that all components would just merge all the defined names together.
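As a sketch of the merge semantics (the second object, its name, and the `custom-workload` partition are purely hypothetical):

```yaml
# Object rendered at install time (e.g. by the Performance Addon Operator).
apiVersion: workloadpartitioning.openshift.io/v1
kind: WorkloadPartitions
metadata:
  name: management-partition
status:
  clusterPartitionNames:
    - management
---
# Hypothetical second object added later by another owner.
apiVersion: workloadpartitioning.openshift.io/v1
kind: WorkloadPartitions
metadata:
  name: custom-partition
status:
  clusterPartitionNames:
    - custom-workload
# Consumers (the webhook, PAO) would treat the effective partition set as the
# union of every list: management, custom-workload.
```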

There is no controller or reconcile loop as part of this proposal. Only the cluster administrator will have the ability to create or manipulate the WorkloadPartitions objects. Anyone will be allowed to read them.
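One way the "admin writes, everyone reads" rule could be expressed is sketched below; the resource plural and role names are assumptions, and cluster administrators would keep write access through their existing cluster-admin permissions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: workloadpartitions-reader   # hypothetical name
rules:
  - apiGroups: ["workloadpartitioning.openshift.io"]
    resources: ["workloadpartitions"]   # assumed resource plural
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: workloadpartitions-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: workloadpartitions-reader
subjects:
  # grant read access to every authenticated user
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:authenticated
```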


We will still run reconciliation under the PAO once the WorkloadPartitions resource is updated.


### Risks and Mitigations

1) An admin creates a WorkloadPartitions object manually after the cluster has been running for some time on a cluster that already has partitioning enabled


Looks like something happened with the ordering.

# List of strings, defines partition names that will be recognized by the
# workload partitioning webhook. This list will also inform PAO about partitions
# that should be configured on the kubelet and CRI-O level.
clusterPartitionNames:


Hm, what about the CPUs that should be used for the CPU pinning, or will it use reservedCPUs by default?

Contributor Author

The API server does not care. Kubelet and CRI-O will get the right IDs from PAO.

@dhellmann dhellmann removed the priority/important-soon label Jun 30, 2021
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 30d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label Sep 21, 2021
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 28, 2021
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Oct 5, 2021
@openshift-ci
Contributor

openshift-ci bot commented Oct 5, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
do-not-merge/work-in-progress, lifecycle/rotten