eus-upgrades-mvp: don't enforce skew check in MCO #762
Conversation
This reads like a good summary of what was discussed. FWIW, I buy the argument that we're first and foremost asserting that it is not safe to upgrade the API server to a new Kubernetes minor version, and that the admin needs to reduce skew to at least N-1 in order for the API server update to be safe. So the API server operator seems like the component to do that. The challenge that the API server and MCO proposals have in common, AIUI, is that "is it safe to upgrade" is based only on the cluster state and not the target cluster version - i.e. surely we don't want either operator asserting Upgradeable=False simply because we're at the skew limit, because that would prevent legitimate z-stream updates?
The critical detail (that I always forget) is that |
Wow! Thanks! |
That would be a good detail to include in the text, as a reminder. As you say, it's not the obvious interpretation of that field. |
@markmc relating to current state versus target cluster state making the decision we have a long standing enhancement to allow target version definition of preflights which has run into a bit of a headwind. We do plan to make another attempt at pushing that in the future because we believe it to be more flexible and reduce friction imposed by backporting logic to older releases. PTAL if that interests you |
This looks fine to me as the author of the original proposal. I'll let those on teams affected by the change provide final approval. |
The markdown job is failing because this enhancement doesn't match the required template. These headings are missing from the upgrades section:
the `Upgradeable` flag, the MCO could provide this functionality. Either of
these two operators are the obvious choice for such a check since they are
responsible for both halves of the kubelet-API Server interaction. It makes
more sense for the leading component to implement the check, however, since |
I think this is a critical point for discussion. The two arguments in the other direction are locality to the resolving action and locality of skew policy knowledge.
Locality to the resolving action is about where a user has to go to resolve an Upgradeable=False condition. The natural assumption we've seen from users and support in the past is that the operator with the unexpected condition is where to take a resolving action. In this particular case, the kube-apiserver will be the spot for a resolving action 0% of the time: the kube-apiserver will never be able to upgrade a kubelet to close the skew. The MCO will sometimes (often?) be the spot to close skew, either by waiting for a rollout or by unpausing a machine-config pool. It's not perfect (byo-host, for instance), but it is significantly more than the 0% of the kube-apiserver.
Locality of skew policy knowledge is about which component knows its skew policy. It's worth remembering that even in a kube control plane, the skew policies between the kubelet, kube-controller-manager, kube-scheduler, and SDN (and its various variants) are different. Expanding to everything in our payload (and what those components subsequently install), it gets even larger. While it is possible for the kube-apiserver to be a rally point for every team enforcing their skew policy, the distributed nature of clusteroperator conditions means that knowledge can be codified close to the component with the skew policy knowledge. In this case, the MCO is the closest component (since we lack a kubelet operator).
In the general case, locality to the resolving action and locality to the skew policy knowledge align extremely well. In this specific case, they meet in the MCO.
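For illustration only (this is not code from the MCO PR; the function names and version-parsing approach are my own assumptions), the kind of minor-version skew check under discussion might be sketched like this:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorVersion extracts the minor version from a "major.minor.patch"
// style version string, e.g. "1.21.3" -> 21.
func minorVersion(v string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("unparseable version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// skewTooLarge reports whether the kubelet trails the API server by more
// than maxSkew minor versions (the upstream policy discussed above is n-2).
func skewTooLarge(apiServerVersion, kubeletVersion string, maxSkew int) (bool, error) {
	apiMinor, err := minorVersion(apiServerVersion)
	if err != nil {
		return false, err
	}
	kubeletMinor, err := minorVersion(kubeletVersion)
	if err != nil {
		return false, err
	}
	return apiMinor-kubeletMinor > maxSkew, nil
}

func main() {
	// A 1.19 kubelet against a 1.21 API server is exactly at the n-2
	// limit: still within policy, but the next API server minor
	// version would exceed it.
	atLimit, _ := skewTooLarge("1.21.3", "1.19.0", 2)
	exceeded, _ := skewTooLarge("1.22.0", "1.19.0", 2)
	fmt.Println(atLimit, exceeded) // prints: false true
}
```

Note that such a check can only look at current cluster state, which is exactly the limitation raised earlier in this thread: being *at* the limit must not block z-stream updates, only the next minor.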
I've taken the liberty of demonstrating in a PR openshift/machine-config-operator#2552
I buy the first argument, but I'm struggling with "locality of skew policy knowledge". Between the kubelet and the API Server, it would be the API Server that dictates the skew policy based on its adherence to a backward-compatible API. This boils down to the same argument I've made in the PR: the leading component determines the acceptable skew. Based on that understanding, I would think the API Server Operator would be the place to at least store the skew policy.
it would be the API Server that dictates the skew policy based on its adherence to a backward-compatible API
This is the point: the kube-apiserver dictates only the API versions served; the clients know which API versions they need and depend on. For that reason, upstream, the API-Machinery team is NOT the one responsible for testing and fixing the skew of a release; the node team is. The API-Machinery team just follows the known, mechanical procedures of introducing new API versions and deprecating and removing old ones.
This is what David means with "locality of skew policy knowledge": the API consumers know and not the API provider.
Or in yet other words:
- locality to resolve the skew problem: a UX problem
- locality to enforce the skew policy: a people problem, i.e. who owns the skew policy (the kube-apiserver does not).
There are two sides: server and client. The client would be using the correct version of the API, but the server may have a backwards-incompatible change where it does not serve the correct payload to the client. In turn, the client may have a backwards-incompatible change and not know what to do with a particular payload. So there need to be two checks: one in the MCO and one within the API server. Each component has a responsibility to check the skew, since either side could break.
I strongly disagree with the server having to know its clients. This is not how it works in any of the API life cycle policies we apply in Kubernetes and OpenShift. The server has O(100) clients in an OpenShift cluster. It cannot follow individual skew policies, and I don't see why we should introduce another API philosophy here.
a backwards incompatible change
This is disallowed for the server and hence never happens (if we are doing our job well, and nearly always did so in the past).
We're not asking the server to know its clients though. We're asking its Operator to know them.
To recap the discussion we had on this for those that were not able to attend:
- The API server ships with a set of built-in types (pod and node).
- The upstream project, via API review and human/project-operator oversight, ensures that skew between the API server and kubelet is at most n-2.
- The API server operator is being asked to report Upgradeable=False to capture that API review/human/operational oversight because it is the component that must not upgrade.
- The Machine Config Operator is the component that must upgrade, so reporting Upgradeable=False is confusing.
- The Machine Config Operator should report a message in its reporting that admins should perform an action.
I agree that the API server (and its operators) should not have to understand all API types it serves and all of its clients. For built-in API types, where the upstream set of human operators/api-reviewers/etc. try to ensure that skew in either direction (API server or kubelet) is preserved, I think it's reasonable that both operators in our context should report a desired action (the apiserver operator should say "I cannot upgrade"; the machine config operator should promote an upgrade).
For types that are not built-in (CRDs, etc.), it is extremely reasonable to say that the future philosophy is not for the API server to block its own upgrade on those types because the upgrade of the API server itself is not what will evolve those CRD definitions, unlike the built-in types.
On top of that, the gating mechanism we have today is the `Upgradeable=False`
flag, which indicates that a particular operator cannot be upgraded, thereby
halting the upgrade of the entire cluster. It doesn't make sense for the MCO to
assert this condition, since an upgrade of the MCO would actually reduce the |
This doesn't seem to be the case, since the skew happened, which suggests that something else is allowing the skew to develop.
Let me adjust that wording. What I mean to say is that the MCO plus its operand...
assert this condition, since an upgrade of the MCO would actually reduce the
skew. If the MCO were to use this mechanism to enforce the skew, it would be a
reinterpretation of the function of that flag to instead indicate that the
entire cluster cannot be upgraded. It's a subtle but important distinction that |
I'm interested in how placing this condition on the kube-apiserver-operator is a better experience for users when compared with the locality benefits to users outlined in https://github.com/openshift/enhancements/pull/762/files#r623192675 .
This changes the proposal to specify that the API Server Operator should instead be responsible for enforcing the kubelet version skew.
assert this condition, since an upgrade of the MCO and its operands (RHCOS)
would actually reduce the skew. If the MCO were to use this mechanism to
enforce the skew, it would be a reinterpretation of the function of that flag
to instead indicate that the entire cluster cannot be upgraded. It's a subtle |
Just checking this vs. the API, and yeah, current godocs are:
Upgradeable indicates whether the operator is in a state that is safe to upgrade...
Although what's being proposed here means that should actually be:
Upgradeable indicates whether the operator is safe to upgrade based on the current cluster state...
I've floated openshift/api#926 with this update.
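As a hedged illustration of the semantics being settled here (the types below are simplified stand-ins I wrote for this sketch, not the actual openshift/api definitions, and the reason/message strings are invented), an operator reporting Upgradeable based on current cluster state might look like:

```go
package main

import "fmt"

// Simplified stand-ins for a ClusterOperator status condition; the real
// openshift/api types have more fields (LastTransitionTime, etc.).
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

type Condition struct {
	Type    string
	Status  ConditionStatus
	Reason  string
	Message string
}

// upgradeableCondition reflects the proposed reading of the flag: the
// operator reports whether it is safe to upgrade *based on the current
// cluster state*. The kube-apiserver operator goes Upgradeable=False
// while kubelet skew would be exceeded by a minor bump, even though the
// operator itself is perfectly healthy.
func upgradeableCondition(skewBlocksMinorUpdate bool) Condition {
	if skewBlocksMinorUpdate {
		return Condition{
			Type:    "Upgradeable",
			Status:  ConditionFalse,
			Reason:  "KubeletSkewAtLimit",
			Message: "kubelets trail the API server by the maximum supported skew; update nodes before a minor-version upgrade",
		}
	}
	return Condition{Type: "Upgradeable", Status: ConditionTrue}
}

func main() {
	fmt.Println(upgradeableCondition(true).Reason)
}
```

The message, not the status, is what carries the resolving action; per the discussion above, the MCO separately tells admins how to close the skew without itself going Upgradeable=False.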
@rphillips As per our meeting on Thursday where representatives from all affected parties decided to go with this approach as a short-term compromise, does this PR need any more updates? Would be great to get this merged ASAP. |
@derekwaynecarr or @eparis are approvers. I'll ask them to make a pass. |
/assign |
I will open a follow-on to this enhancement that states what the MCO must do.
It must recommend users upgrade a machine config pool per our prior discussion.
Since @crawford is not available to make that update, I will add that in a follow-up.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: derekwaynecarr The full list of commands accepted by this bot can be found here. The pull request process is described here
Just recording for others that kube-apiserver is also a client of the kubelet given the exec flows. As a result, its operator also needs to respect the supported skew in that direction as well. Thanks for pointing this out @sttts in our discussion. |
…r scope

Consider these cases:

a. Component A is in a state that allows updates, and nothing in the rest of the cluster would break if A updated.
b. Component A is in a state that allows updates, but component B (which is in-cluster, but not part of A) would break if A updated.
c. Component A would break if it updated.

Operator A should pretty clearly be Upgradeable=True for (a) and Upgradeable=False for (c). Before this commit, a narrow reading of the comment would have operator A be Upgradeable=True for (b). This commit moves it to Upgradeable=False, based on discussion in [1], where it becomes the job of the API-server to set Upgradeable=False if updating the API-server would break nodes running old kubelets. The API-server can say "to unblock minor updates, update your kubelets". The machine-config operator will simultaneously say "hey, your kubelets are old, and here's how to update: $STEPS", but it won't use Upgradeable=False to say that (because the machine-config operator would be _happy_ to have its component nodes updated).

As pointed out in discussion in [1], this is a bit of a bottomless pit. For example, component A may be removing a deprecated feature on update, and there may be user workloads that occasionally depend on that feature but hardly ever use it. Component A might reasonably think "nobody has used $OUTGOING_FEATURE in the last week, so I'm Upgradeable=True", and then post-update, the user workload would go to hit the removed API and break. And obviously in-cluster components will have even more limited access to any out-of-cluster components that depend on them. So using Upgradeable=False to protect other components from breaking is going to be a best-effort sort of thing. But this commit pivots so that it's more clear that we'll put that effort in when we can.

[1]: openshift/enhancements#762
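The a/b/c cases in the commit message above can be condensed into a small decision function (illustrative only; the function and parameter names are invented for this sketch):

```go
package main

import "fmt"

// upgradeable encodes the pivot described in the commit message: an
// operator reports Upgradeable=False if it would itself break on update
// (case c) OR if another in-cluster component would break when it
// updates (case b). Only case (a), where neither would break, yields
// Upgradeable=True. Note the dependent check is best-effort: an
// operator can only reason about breakage it knows how to detect.
func upgradeable(selfWouldBreak, dependentWouldBreak bool) bool {
	return !selfWouldBreak && !dependentWouldBreak
}

func main() {
	fmt.Println(upgradeable(false, false)) // case (a): true
	fmt.Println(upgradeable(false, true))  // case (b): false under the post-commit reading
	fmt.Println(upgradeable(true, false))  // case (c): false
}
```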
This changes the proposal to specify that the API Server Operator should
instead be responsible for enforcing the kubelet version skew.