[OCPNODE-725] Control Group v2 Enablement on New Clusters #939
Conversation
some initial comments
Thank you everyone for the great comments. I've updated the doc with your comments and to use the Infrastructure API. It feels like a good spot for the cgroup setting.
Migrating to cgroup v2 will bring in many new features and fixes not found in
cgroup v1. cgroup v1 is considered 'legacy' and migrating to cgroup v2 is
considered necessary since RHEL ships with cgroup v2 on by default. (OpenShift
s/RHEL/RHEL9/
(Also of note related to this, current Fedora defaults to v2, and I believe OKD is still reverting that - hopefully we can also validate this flag on OKD)
When we're thinking about RHEL9 - would one expect that upgrading to that automatically inherits the OS default to cgroup v2, or would the admin need to flip on this flag in addition? I would hope the former.
The intent is to enable this by flag at install time.
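As an illustration of what "by flag at install time" could look like, here is a hypothetical sketch in the style of existing install-config flags such as `fips` (mentioned further down in this thread). The field name and shape are assumptions for illustration only; the enhancement does not commit to this:

```go
// Hypothetical sketch only: a cgroup-mode flag surfaced on the
// installer's InstallConfig, following the precedent of flags like
// fips that get rendered into generated MachineConfigs.
type InstallConfig struct {
	// ...existing fields elided...

	// CgroupMode, if set, selects the cgroup hierarchy ("v1" or "v2")
	// used by all nodes in the new cluster. Empty means the default.
	CgroupMode string `json:"cgroupMode,omitempty"`
}
```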
## Proposal

The option to enable cgroup v2 will have to reside in a centralized location.
I totally see your desire here to make the cgroup setting a cluster wide thing.
But so far this enhancement is ignoring the BYO-RHEL 7|8 case where the cgroup setting is not managed by the cluster. Or conversely, it seems to mostly be assuming RHCOS8.
It also seems to me that we really actually do want the ability to have a custom worker pool with cgroup v2 even in older releases - so that admins of existing clusters can test out workloads in cgroup v2 before making a cluster wide switch. Right?
Also to flip this around, even ignoring BYO-RHEL, upgrades of existing clusters that are purely RHCOS8 will have at least a period of time during the upgrade where the nodes are mixed.
What are the major problems you see with a mixed environment?
Somewhat tangential to this, but I want to emphasize: the incoming RHEL9 will mean the bare term "RHCOS" becomes a weak identifier, and we will commonly need to say e.g. RHCOS8 or RHCOS9.
I don't see a huge need for mixed mode outside of maybe HyperShift, assuming cgroups v2 works fine, since we would have to invest in testing it versus testing pure cgroups v2 more thoroughly and addressing edge cases.
I clarified in the doc a pure mode cgroup v2 environment. Metric reporting may be different between cgroup v1 and cgroup v2 environments. We will need more testing around HPA, VPA, and other controller workloads in mixed mode environments to make sure they understand the metrics. The goal of this enhancement is to enable a pure mode cluster.
I'm not advocating specifically to emphasize this API at a granular level.
I'm more saying three things:
- The hard reality is that clusters will be mixed during upgrades, at least for some period of time, so it needs to at least not actively fail.
- We know there are people out there that are e.g. pausing the worker pool for long periods of time, so again those situations will be mixed, unless we try to actively block updates based on this (in the same way there's some MCO work to block upgrades on having too-old workers).
- Internally in the MCO, pools roll out machineconfig. Expressing this as a machineconfig feels natural.
That said, the more we add things like this, the more need I see for making it easy for a `MachineConfig` object to apply to all node types. But today, higher level flags in the install config like `hyperThreading` and `fips` are represented as a generated `MachineConfig` for `master` and `worker`.
I think @cgwalters brings up really good points, but it seems like this enhancement is targeting new clusters with cgroupsv2 enabled on all pools as a first step to bringing this into OCP. I'd imagine we'd need to hammer out these details via another mixed/heterogeneous deployment/day-2 enhancement?
Are there any restrictions/specific callouts we'd want to bring in from Colin's points above on the scope of which new clusters would be eligible for using this at install?
So far, we do not have any restrictions on which clusters cgroup v2 can be enabled on. We will need to gather bug reports and feedback on what is working (and not working).
A follow up question on the previous (now resolved) comment: is this mutable after installation? If so (assuming we are allowing migration from v1 to v2), are you able to also go from v2 back to v1, or is the migration unidirectional?
### User Stories
These aren't quite user stories and are more to-dos
### Non-Goals

- Support mixed cgroups modes
Thinking about this more, HyperShift may want to run control planes and worker nodes differently.
Updated the enhancement to clarify this is for new clusters and added a new API.
#### Tech Preview -> GA

With sufficient internal testing and customer feedback the feature will graduate
jfyi new OKD 4.9 clusters will have cgroupsv2 enabled by default
I believe this enhancement is ready for a final review.
Looks good, just a few last comments :)
Since this enhancement impacts the MCO as well, I wanted to know in which OpenShift version we are expecting this to be implemented (dev preview)?
@sinnykumari @kikisdeliveryservice thank you. Updated the PR and mentioned we can target 4.10 for the changes.
Umm, not sure, but shouldn't an enhancement be merged before the release planning of a particular release?
@sinnykumari that does make sense to me... removed it from the doc.
@sttts could you take a pass through this?
to Tech Preview.

Upon graduation to GA the feature will still be turned off by default, but may
be enabled within OpenShift at a later time.
do you expect a period where it is on by default, but can be disabled?
I updated this sentence in f69f9c7 to address that another enhancement will be created to specify how the feature will be turned on by default. The original sentence here was not clear with my other modifications to the document.
}

type NodeSpec struct {
	CgroupMode CgroupMode `json:"cgroupMode,omitempty"`
This field makes sense to me. In the actual API PR, you'll need to fill in godoc for users.
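For illustration, a godoc-filled version of the field might look something like the sketch below. The names follow the snippets quoted in this thread; the comment wording and validation markers are assumptions, not the final API:

```go
// CgroupMode selects which cgroup hierarchy the cluster's nodes use.
type CgroupMode string

const (
	// CgroupMode_v1 keeps nodes on the legacy cgroup v1 hierarchy.
	CgroupMode_v1 CgroupMode = "v1"
	// CgroupMode_v2 runs nodes on the unified cgroup v2 hierarchy.
	CgroupMode_v2 CgroupMode = "v2"
	// CgroupMode_Default is applied when the field is left empty.
	CgroupMode_Default = CgroupMode_v1
)

// NodeSpec holds cluster-wide node configuration.
type NodeSpec struct {
	// cgroupMode selects the cgroup hierarchy for all nodes in the
	// cluster. Valid values are "v1" and "v2"; an empty value means
	// the platform default (currently v1).
	// +optional
	CgroupMode CgroupMode `json:"cgroupMode,omitempty"`
}
```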
}

type NodeSpec struct {
	CgroupMode CgroupMode `json:"cgroupMode,omitempty"`
are we sure it stays at a string? Or should we make it a struct from the beginning?
cgroup:
  mode: v2
Other parts of the API do the same thing... Making it a struct seems overkill for something that is likely not going to change again for a long time.
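For contrast, the struct shape floated above would look roughly like this. This is a hypothetical sketch of the alternative that was discussed and set aside, not anything in the enhancement:

```go
// Hypothetical alternative: wrap the mode in a struct so future
// cgroup-related knobs could be added next to it. The thread settled
// on the plain string form since this is unlikely to change shape.
type CgroupConfig struct {
	// mode selects "v1" or "v2".
	Mode CgroupMode `json:"mode,omitempty"`
}

type NodeSpec struct {
	Cgroup CgroupConfig `json:"cgroup,omitempty"`
}
```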
CgroupMode_Default = CgroupMode_v1
)

type Node struct {
@sttts you devoted some thought to the top level object name, does this match your preference? I have not devoted enough think-time to have a clear opinion.
I started with NodeOptions... someone mentioned on this thread Node makes more sense. I agree with simplifying it to Node.
will the latency settings (in some other enhancement) go here too?
Latency is a multi-component API addition. I do not believe it goes in this object.
It's only about the kubelet, with others coping with the fallout. Seems very Node'ish.
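Assuming the API merges roughly as sketched above, a consumer such as the MCO might read the cluster-scoped object like this. The client wiring and the `cluster` singleton name follow the usual config.openshift.io convention and are assumptions for this example:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"

	// Assumed: a generated client for the proposed config.openshift.io Node type.
	configclient "github.com/openshift/client-go/config/clientset/versioned"
)

// clusterCgroupMode returns the effective cgroup mode for the cluster,
// falling back to the proposed default ("v1") when the field is unset.
func clusterCgroupMode(cfg *rest.Config) (string, error) {
	client, err := configclient.NewForConfig(cfg)
	if err != nil {
		return "", err
	}
	// config.openshift.io objects are cluster-scoped singletons named "cluster".
	node, err := client.ConfigV1().Nodes().Get(context.TODO(), "cluster", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	mode := string(node.Spec.CgroupMode)
	if mode == "" {
		mode = "v1" // CgroupMode_Default per the sketch above
	}
	return mode, nil
}

func main() {
	fmt.Println("wire clusterCgroupMode with an in-cluster or kubeconfig rest.Config")
}
```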
### Version Skew Strategy

A cluster installed with cgroup v2 would not be impacted by the version skew
you will be when "Add blocking cgroup v2 upgrade jobs" is done for your GA criteria. You plan to revisit?
fixed.
### Non-Goals

Mixed cgroup modes are not 100% compatible with each other. We need data
This precludes migrating from cgroups v1 to cgroups v2. Can you expand that into a top-level "non-goal" so we know this enhancement will need revisiting to remove cgroups v1?
The flow and GA criteria lgtm. Needs a squash.
/approve
Holding for squash; feel free to release after.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k.
/hold cancel
OCPNODE-725
This enhancement describes enabling cgroup v2 within OpenShift.