
eus-upgrades-mvp: don't enforce skew check in MCO #762

Merged: 3 commits merged into openshift:master on May 5, 2021
Conversation

@crawford (Contributor):

This changes the proposal to specify that the API Server Operator should
instead be responsible for enforcing the kubelet version skew.

@markmc (Contributor) commented Apr 28, 2021:

This reads like a good summary of what was discussed.

FWIW, I buy the argument that we're first and foremost asserting that it is not safe to upgrade the API server to a new Kubernetes minor version, and that the admin needs to reduce the skew to at least N-1 in order for it to be safe to update the API server. So the API server operator seems like the right component to do that.

The challenge that the API server and MCO proposals have in common, AIUI, is that "is it safe to upgrade" is based only on the current cluster state and not on the target cluster version. Surely we don't want either operator asserting Upgradeable=False simply because we're at the skew limit, because that would prevent legitimate z-stream updates?

@crawford (Contributor, Author):

> surely we don't want either operator asserting Upgradeable=False simply because we're at the skew limit, because that would prevent legitimate z-stream updates?

The critical detail (that I always forget) is that the Upgradeable=False flag doesn't affect z-stream updates. It's slightly counterintuitive, but I guess this gets into the nuance of an update versus an upgrade - we don't consider a z-stream update an upgrade.
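A minimal sketch of that distinction, assuming semver-style "major.minor.patch" version strings (illustrative only, not the CVO's actual implementation):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf extracts the minor component from a "major.minor.patch" version.
func minorOf(version string) (int, error) {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("unexpected version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// gatedByUpgradeable reports whether Upgradeable=False would block a move
// from current to target: only minor-version bumps ("upgrades") are gated;
// z-stream moves ("updates") are not.
func gatedByUpgradeable(current, target string) (bool, error) {
	cur, err := minorOf(current)
	if err != nil {
		return false, err
	}
	tgt, err := minorOf(target)
	if err != nil {
		return false, err
	}
	return tgt > cur, nil
}

func main() {
	fmt.Println(gatedByUpgradeable("4.7.9", "4.7.13")) // false: a z-stream update proceeds
	fmt.Println(gatedByUpgradeable("4.7.9", "4.8.0"))  // true: a minor upgrade is blocked
}
```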

@markmc (Contributor) commented Apr 28, 2021:

> > surely we don't want either operator asserting Upgradeable=False simply because we're at the skew limit, because that would prevent legitimate z-stream updates?
>
> The critical detail (that I always forget) is that the Upgradeable=False flag doesn't affect z-stream updates. It's slightly counterintuitive, but I guess this gets into the nuance of an update versus an upgrade - we don't consider a z-stream update an upgrade.

Wow! Thanks!

@dhellmann (Contributor):

> > surely we don't want either operator asserting Upgradeable=False simply because we're at the skew limit, because that would prevent legitimate z-stream updates?
>
> The critical detail (that I always forget) is that the Upgradeable=False flag doesn't affect z-stream updates. It's slightly counterintuitive, but I guess this gets into the nuance of an update versus an upgrade - we don't consider a z-stream update an upgrade.

That would be a good detail to include in the text, as a reminder. As you say, it's not the obvious interpretation of that field.

@sdodson (Member) commented Apr 28, 2021:

@markmc: regarding current state versus target cluster state driving the decision, we have a long-standing enhancement to allow the target version to define preflight checks, which has run into a bit of a headwind. We do plan to make another attempt at pushing that in the future, because we believe it to be more flexible and to reduce the friction imposed by backporting logic to older releases. PTAL if that interests you:

#363

@sdodson (Member) commented Apr 28, 2021:

This looks fine to me as the author of the original proposal. I'll let those on teams affected by the change provide final approval.

@dhellmann (Contributor):

The markdown job is failing because this enhancement doesn't match the required template. These headings are missing from the upgrades section:

enhancements/update/eus-upgrades-mvp.md missing "#### Dev Preview -> Tech Preview"
enhancements/update/eus-upgrades-mvp.md missing "#### Tech Preview -> GA"
enhancements/update/eus-upgrades-mvp.md missing "#### Removing a deprecated feature"

Review thread on the diff:

> the `Upgradeable` flag, the MCO could provide this functionality. Either of
> these two operators are the obvious choice for such a check since they are
> responsible for both halves of the kubelet-API Server interaction. It makes
> more sense for the leading component to implement the check, however, since

Contributor:

I think this is a critical point for discussion. The two arguments in the other direction are locality to the resolving action and locality of skew policy knowledge.

Locality to the resolving action is about where a user has to go to resolve an Upgradeable=False condition. The natural assumption we've seen from users and support in the past is that the operator with the unexpected condition is where to take a resolving action. Now in this particular case, the kube-apiserver will be the spot for a resolving action 0% of the time: the kube-apiserver will never be able to upgrade a kubelet to close the skew. The MCO will sometimes (often?) be the spot to close the skew, either by waiting for a rollout or by unpausing a machine-config pool. It's not perfect (BYO-host, for instance), but it is significantly more than the 0% of the kube-apiserver.

Locality of skew policy knowledge is about which component knows its skew policy. It's worth remembering that even in a kube control plane, the skew policies between the kubelet, kube-controller-manager, kube-scheduler, and SDN (and its various variants) are different. Expanding to everything in our payload (and what those components subsequently install), the set gets even larger. While it is possible for the kube-apiserver to be a rally point for every team enforcing their skew policy, the distributed nature of clusteroperator conditions means that knowledge can be codified close to the component with the skew policy knowledge. In this case, the MCO is the closest component (since we lack a kubelet operator).

In the general case, locality to the resolving action and locality to the skew policy knowledge align extremely well. In this specific case, they meet in the MCO.

I've taken the liberty of demonstrating in a PR openshift/machine-config-operator#2552
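For concreteness, here is a sketch of the skew measurement either operator would need, reading each kubelet's version off the Node objects via client-go; the package and function names are illustrative, not taken from the PR above, and the caller is assumed to supply the API server's minor version:

```go
package skew

import (
	"context"
	"fmt"
	"strconv"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// kubeMinor parses the minor component of a version like "v1.20.4".
func kubeMinor(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("unexpected version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// MaxKubeletSkew returns the largest minor-version gap between the API
// server and any kubelet, read from each Node's reported kubeletVersion.
func MaxKubeletSkew(ctx context.Context, client kubernetes.Interface, apiServerMinor int) (int, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	maxSkew := 0
	for _, node := range nodes.Items {
		minor, err := kubeMinor(node.Status.NodeInfo.KubeletVersion)
		if err != nil {
			return 0, err
		}
		if skew := apiServerMinor - minor; skew > maxSkew {
			maxSkew = skew
		}
	}
	return maxSkew, nil
}
```

Whichever operator owns the gate could act whenever the reported skew sits at the supported limit, since a minor bump of the API server would then exceed it.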

@crawford (Author):

I buy the first argument, but I'm struggling with "locality of skew policy knowledge". Between the kubelet and the API Server, it would be the API Server that dictates the skew policy based on its adherence to a backward-compatible API. This boils down to the same argument I've made in the PR: the leading component determines the acceptable skew. Based on that understanding, I would think the API Server Operator would be the place to at least store the skew policy.

@sttts (Contributor) commented Apr 30, 2021:

> it would be the API Server that dictates the skew policy based on its adherence to a backward-compatible API

This is the point: the kube-apiserver dictates only which API versions are served; the clients know which API versions they need and depend on. For that reason, upstream, the API Machinery team is NOT the one responsible for testing and fixing the skew of a release; the Node team is. The API Machinery team just follows the known, mechanical procedures of introducing new API versions and deprecating and removing old ones.

This is what David means by "locality of skew policy knowledge": the API consumers know, not the API provider.

Contributor:

Or, in yet other words:

  • locality to resolve the skew problem: a UX problem
  • locality to enforce the skew policy: a people problem, i.e. who owns the skew policy (the kube-apiserver does not).

Contributor:

There are two sides: server and client. The client would be using the correct version of the API, but the server may have a backwards incompatible change where it does not serve the correct payload to the client. In turn, the client may have a backwards incompatible change, and not know what to do with a particular payload. So there needs to be two checks: one in the MCO and one within the API. Each component has a responsibility to check the skew, since either side could break.

Contributor:

I strongly disagree with the server having to know its clients. This is not how it works in any of the API life cycle policies we apply in Kubernetes and OpenShift. The server has O(100) clients in an OpenShift cluster. It cannot follow individual skew policies, and I don't see why we should introduce another API philosophy here.

> a backwards incompatible change

This is disallowed for the server and hence never happens (if we are doing our job well, and nearly always did so in the past).

@crawford (Author):

We're not asking the server to know its clients though. We're asking its Operator to know them.

@derekwaynecarr (Member):

To recap the discussion we had on this for those that were not able to attend:

  • The API server ships with a set of built-in types (pod and node).
  • The upstream project, via API review and human or project operator oversight, ensures that the skew between the API server and the kubelet is at most n-2.
  • The API server operator is being asked to report Upgradeable=False to capture that API review/human/operational oversight because it is the component that must not upgrade.
  • The Machine Config Operator is the component that must upgrade, so reporting Upgradeable=False is confusing.
  • The Machine Config Operator should report a message in its reporting that admins should perform an action.

I agree that the API server (and its operators) should not have to understand all the API types it serves and all of its clients. I think it's reasonable, for built-in API types whose upstream set of human operators/API reviewers/etc. are trying to ensure that skew in either direction (API server or kubelet) is preserved, that both operators in our context should report a desired action (the API server operator should say it cannot upgrade; the Machine Config Operator should promote an upgrade).

For types that are not built-in (CRDs, etc.), it is extremely reasonable to say that the future philosophy is not for the API server to block its own upgrade on those types because the upgrade of the API server itself is not what will evolve those CRD definitions, unlike the built-in types.
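To make the recap concrete, here is a hedged sketch of the kube-apiserver-operator side, using the ClusterOperator condition types from openshift/api; the Reason and Message strings, the function name, and the skew inputs are invented for illustration:

```go
package operator

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	configv1 "github.com/openshift/api/config/v1"
)

// upgradeableForSkew builds the Upgradeable condition the kube-apiserver
// operator would report: False when kubelets sit at the supported skew
// limit, since a minor upgrade of the API server would then exceed it.
func upgradeableForSkew(maxSkew, limit int) configv1.ClusterOperatorStatusCondition {
	cond := configv1.ClusterOperatorStatusCondition{
		Type:               configv1.OperatorUpgradeable,
		Status:             configv1.ConditionTrue,
		LastTransitionTime: metav1.Now(),
	}
	if maxSkew >= limit {
		cond.Status = configv1.ConditionFalse
		cond.Reason = "KubeletVersionSkew" // illustrative reason string
		cond.Message = "kubelets are at the supported version skew limit; update machine config pools before a minor upgrade"
	}
	return cond
}
```

The MCO side of the recap is the complement: it would surface the same situation as an admin-facing message ("update this machine config pool") rather than as Upgradeable=False.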

Review thread on the diff:

> On top of that, the gating mechanism we have today is the `Upgradeable=False`
> flag, which indicates that a particular operator cannot be upgraded, thereby
> halting the upgrade of the entire cluster. It doesn't make sense for the MCO to
> assert this condition, since an upgrade of the MCO would actually reduce the

Contributor:

This doesn't seem to be the case, since the skew happened, suggesting that something else is allowing the skew to develop.

@crawford (Author) commented Apr 29, 2021:

Let me adjust that wording. What I mean to say is that the MCO plus its operand...

Review thread on the diff:

> assert this condition, since an upgrade of the MCO would actually reduce the
> skew. If the MCO were to use this mechanism to enforce the skew, it would be a
> reinterpretation of the function of that flag to instead indicate that the
> entire cluster cannot be upgraded. It's a subtle but important distinction that

Contributor:

I'm interested in how placing this condition on the kube-apiserver-operator is a better experience for users when compared with the locality benefits to users outlined in https://github.com/openshift/enhancements/pull/762/files#r623192675.

Updated diff context:

> assert this condition, since an upgrade of the MCO and its operands (RHCOS)
> would actually reduce the skew. If the MCO were to use this mechanism to
> enforce the skew, it would be a reinterpretation of the function of that flag
> to instead indicate that the entire cluster cannot be upgraded. It's a subtle

Member:

Just checking this vs. the API, and yeah, current godocs are:

> Upgradeable indicates whether the operator is in a state that is safe to upgrade...

@wking (Member) commented May 4, 2021:

Although what's being proposed here means that should actually be:

> Upgradeable indicates whether the operator is safe to upgrade based on the current cluster state...

@wking (Member):

I've floated openshift/api#926 with this update.
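For reference, the condition type itself lives in openshift/api's config/v1 package, and the floated change adjusts only its doc comment. A sketch of the shape, mirroring the real types rather than quoting openshift/api#926 verbatim:

```go
package v1sketch

// ClusterStatusConditionType mirrors the openshift/api config/v1 type.
type ClusterStatusConditionType string

// OperatorUpgradeable indicates whether the operator is safe to upgrade
// based on the current cluster state... (scoping the godoc to "current
// cluster state" is the clarification floated in openshift/api#926).
const OperatorUpgradeable ClusterStatusConditionType = "Upgradeable"
```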

@kikisdeliveryservice (Contributor):

@rphillips As per our meeting on Thursday where representatives from all affected parties decided to go with this approach as a short-term compromise, does this PR need any more updates? Would be great to get this merged ASAP.

@rphillips (Contributor):

@derekwaynecarr or @eparis are approvers. I'll ask them to make a pass.

@derekwaynecarr (Member):

/assign

@derekwaynecarr (Member) left a review:

I will open a follow-on to this enhancement that states what the MCO must do.

It must recommend users upgrade a machine config pool per our prior discussion.

Since @crawford is not available to make that update, I will add that in a follow-up.

/approve
/lgtm


openshift-ci-robot added the lgtm label May 5, 2021
@openshift-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr

openshift-ci-robot added the approved label May 5, 2021
openshift-merge-robot merged commit 782d00a into openshift:master May 5, 2021
@crawford deleted the eus-skew-gate branch May 5, 2021 15:01
@derekwaynecarr (Member):

Just recording for others that kube-apiserver is also a client of the kubelet given the exec flows. As a result, its operator also needs to respect the supported skew in that direction as well. Thanks for pointing this out @sttts in our discussion.

wking added a commit to wking/openshift-api that referenced this pull request May 21, 2021
…r scope

Consider these cases:

a. Component A is in a state that allows updates, and nothing in the
   rest of the cluster would break if A updated.
b. Component A is in a state that allows updates, but component B
   (which is in-cluster, but not part of A) would break if A updated.
c. Component A would break if it updated.

Operator A should pretty clearly be Upgradeable=True for (a) and
Upgradeable=False for (c).

Before this commit, a narrow reading of the comment would have
operator A be Upgradeable=True for (b).  This commit moves it to
Upgradeable=False, based on discussion in [1], where it becomes the
job of the API-server to set Upgradeable=False if updating the
API-server would break nodes running old kubelets.  The API-server can
say "to unblock minor updates, update your kubelets".  The
machine-config operator will simultaneously say "hey, your kubelets
are old, and here's how to update: $STEPS", but it won't use
Upgradeable=False to say that (because the machine-config operator
would be _happy_ to have its component nodes updated).

As pointed out in discussion in [1], this is a bit of a bottomless
pit.  For example, component A may be removing a deprecated feature on
update, and there may be user workloads that occasionally depend on
that feature but hardly ever use it.  Component A might reasonably
think "nobody has used $OUTGOING_FEATURE in the last week, so I'm
Upgradeable=True", and then post-update, the user-workload would go to
hit the removed API and break.  And obviously in-cluster components
will have even more limited access to any out-of-cluster components
that depend on them.  So using Upgradeable=False to protect other
components from breaking is going to be a best-effort sort of thing.
But this commit pivots so that it's more clear that we'll put that
effort in when we can.

[1]: openshift/enhancements#762
@wking mentioned this pull request Jul 7, 2021