enhancements/authentication: detect invalid certificates #980
9017edc
to
2d4bf5a
Compare
- kube-apiserver and aggregated API endpoints
- oauth-server and its external route
- oauth-server and external identity providers
custom serving certs for kube-apiserver
@sttts do you have a place where I can hook into the verification of those? Currently, I think we'd need a separate controller in cluster-kube-apiserver-operator that reads `config.openshift.io.APIServer#servingInfo.namedCertificates`, parses the certificates, and checks them for invalid certs.
I believe @sanchezl worked on that. He can link the code here.
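As a rough illustration of the "parse them and check for invalid certs" step discussed above, the following is a minimal sketch, not the operator's actual code. It assumes that "invalid" means "carries no Subject Alternative Name entries that Go's `crypto/x509` parses"; the function names `certsWithoutSAN` and `selfSignedPEM` are hypothetical, and the latter exists only to produce demo input.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// certsWithoutSAN (hypothetical name) parses every CERTIFICATE block in a PEM
// bundle and returns the subject Common Names of certificates for which Go
// parsed no SAN entries (DNS names, IPs, e-mail addresses, or URIs).
func certsWithoutSAN(bundle []byte) ([]string, error) {
	var legacy []string
	for {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue // skip keys and other block types
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, fmt.Errorf("parsing certificate: %w", err)
		}
		if len(cert.DNSNames)+len(cert.IPAddresses)+len(cert.EmailAddresses)+len(cert.URIs) == 0 {
			legacy = append(legacy, cert.Subject.CommonName)
		}
	}
	return legacy, nil
}

// selfSignedPEM is a demo-only helper that creates a short-lived self-signed
// certificate with the given Common Name and optional DNS SANs.
func selfSignedPEM(cn string, dnsNames []string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	legacy, err := selfSignedPEM("legacy.example.com", nil)
	if err != nil {
		panic(err)
	}
	modern, err := selfSignedPEM("modern.example.com", []string{"modern.example.com"})
	if err != nil {
		panic(err)
	}
	names, err := certsWithoutSAN(append(legacy, modern...))
	if err != nil {
		panic(err)
	}
	fmt.Println("certificates relying on CN only:", names)
}
```

A real controller would read the PEM data from the secrets referenced by `namedCertificates` rather than generating it; that wiring is left out here.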
If invalid custom certificates are configured for the external OAuth route,
we propose to extend the existing `RouterCertsDomainValidationController`.
Specifically, we propose to add additional logic in `validateRouterCertificates` https://github.com/openshift/cluster-authentication-operator/blob/ff156ab2bdfbdd68b49d76547fea5ec28d9c3639/pkg/controllers/routercerts/controller.go#L129.
which is going to end up by what, a degraded condition on an "invalid" certificate?
You should also extend the https://github.com/openshift/cluster-authentication-operator/blob/d2f6218a6ab2daccbec43f71a495de1c80529ef6/pkg/controllers/customroute/custom_route_controller.go#L55 to not accept "invalid" certificates
augmenting https://github.com/openshift/cluster-authentication-operator/blob/d2f6218a6ab2daccbec43f71a495de1c80529ef6/pkg/controllers/customroute/custom_route_controller.go#L55 is a good idea.
However we have to be careful: this concrete logic must not be backported. If users configured legacy certificates in the previous releases we have to block upgrades.
a degraded condition on an "invalid" certificate?
Most likely yes: a degraded condition of, e.g., authentication-operator will appear, as the connection will fail to verify.
### Non-Goals

Other core workloads are out of scope for this enhancement.
do we at least know what the other workloads are that are likely impacted by this?
does every component that exposes a TLS metrics endpoint need to do work, for example?
I am not aware that all components expose a TLS metric like this. Our finding refers concretely to a contribution that was made upstream to Kubernetes.
An initial search on the openshift org reveals quite a few usages: https://github.com/search?q=org%3Aopenshift+x509ignoreCN%3D0&type=code
Maybe @eparis has more insight, as I believe he had touch points with the `GODEBUG=x509ignoreCN=0` setting.
This enhancement therefore proposes:
- a way to detect invalid certificates in kube-apiserver and oauth-server
- a way to prevent upgrades from OpenShift 4.9 to OpenShift 4.10 in the face of invalid certificates
are we planning to backport an `Upgradeable=False` guard for this, or rely on the alert, or start with the alert and maybe follow up with `Upgradeable=False` if we see a lot of alerts, or...?
@wking the suggestion is to backport an `Upgradeable=False` guard for this. An alert alone would be feasible if the cluster remained functional after the upgrade. However, in the presence of invalid certificates, a cluster will simply be dysfunctional after the upgrade to 4.10.
a way to prevent upgrades from OpenShift 4.9 to OpenShift 4.10 in the face of invalid certificates
For cases where the platform is a client, this is probably valid. For cases where the platform is a server, but not a client, I think it's too strong to block upgrades. Can you separate the cases?
i think all the cases enumerated here are platform clients:
- custom serving certificates for kube-apiserver: openshift workloads invoking api calls
- custom API webhooks: kube-apiserver being the client
- custom aggregated API endpoints: kube-apiserver being the client
- custom certificates for route endpoints: openshift oauth-proxy workloads being the client
- certificates of external auth identity providers: oauth-server being the client
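The `Upgradeable=False` guard discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: the `condition` type is a simplified stand-in rather than the real OpenShift operator condition type, and the reason strings and the `upgradeableCondition` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// condition is a simplified, hypothetical stand-in for an operator status
// condition; it is not the real OpenShift API type.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// upgradeableCondition (hypothetical) turns a list of detected invalid
// certificates into an Upgradeable condition: any finding blocks the upgrade.
func upgradeableCondition(invalidCerts []string) condition {
	if len(invalidCerts) == 0 {
		return condition{Type: "Upgradeable", Status: "True", Reason: "AsExpected"}
	}
	// Sort a copy so the message is stable across reconciliations.
	sorted := append([]string(nil), invalidCerts...)
	sort.Strings(sorted)
	return condition{
		Type:    "Upgradeable",
		Status:  "False",
		Reason:  "InvalidCertificates", // hypothetical reason string
		Message: fmt.Sprintf("certificates without SANs: %s", strings.Join(sorted, ", ")),
	}
}

func main() {
	fmt.Printf("%+v\n", upgradeableCondition(nil))
	fmt.Printf("%+v\n", upgradeableCondition([]string{"oauth route", "kube-apiserver namedCertificates"}))
}
```

Keeping the message deterministic avoids needless status updates when the same set of certificates is reported in a different order.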
2d4bf5a
to
9225dbd
Compare
Also submitted https://bugzilla.redhat.com/show_bug.cgi?id=2031839 to track urgency with respect to 4.10.
```
x509: certificate relies on legacy Common Name field, use SANs instead
```

Verification of server certificates is executed during TLS client handshakes,
You probably mean during the "TLS Handshake procedure" or when a client verifies the "Server Hello".
One nit, rest lgtm
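To make the handshake-time behaviour concrete, here is a small sketch of the hostname check Go performs on a server certificate. It builds certificates in memory purely for illustration (they are not parsed from a real handshake), and the helper name `verifyHost` is an assumption of this example.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
)

// verifyHost (hypothetical helper) builds an in-memory certificate with the
// given Common Name and DNS SANs and runs Go's hostname verification on it.
// Only the fields relevant to hostname matching are populated.
func verifyHost(commonName string, dnsNames []string, host string) error {
	cert := &x509.Certificate{
		Subject:  pkix.Name{CommonName: commonName},
		DNSNames: dnsNames,
	}
	return cert.VerifyHostname(host)
}

func main() {
	// CN matches the host, but there are no SANs: verification fails.
	fmt.Println(verifyHost("legacy.example.com", nil, "legacy.example.com"))
	// The host is listed as a DNS SAN: verification succeeds (prints <nil>).
	fmt.Println(verifyHost("modern.example.com", []string{"modern.example.com"}, "modern.example.com"))
}
```

The first call returns an `x509.HostnameError` even though the Common Name equals the requested host, which is the situation the error message quoted in the proposal describes.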
In Kubernetes, detection of invalid certificates was added as part of https://github.com/kubernetes/kubernetes/pull/95396.
The introduced Prometheus metrics `apiserver_webhooks_x509_missing_san_total` and `apiserver_kube_aggregator_x509_missing_san_total` provide the means to detect invalid certificates against API webhooks and aggregated API endpoints.

To detect invalid certificates this enhancement proposes to add a new controller in `cluster-kube-apiserver` that:
cluster-kube-apiserver-operator
For oauth-server this enhancement proposes the introduction of the following new metric:

- `openshift_auth_external_x509_missing_san_total`: a metric capturing the count of invalid certificates
What clients of this external server do we have?
What clients of this external server do we have?
If it is just the operator, could the operator build a custom verifier for the check?
If it is just the operator, could the operator build a custom verifier for the check?
The client, in this case, is oauth-server, not authentication-operator.
#### kube-apiserver

In Kubernetes, detection of invalid certificates was added as part of https://github.com/kubernetes/kubernetes/pull/95396.
The introduced Prometheus metrics `apiserver_webhooks_x509_missing_san_total` and `apiserver_kube_aggregator_x509_missing_san_total` provide the means to detect invalid certificates against API webhooks and aggregated API endpoints.
alerting on this as well seems appropriate
For oauth-server this enhancement proposes the introduction of the following new metric:

- `openshift_auth_external_x509_missing_san_total`: a metric capturing the count of invalid certificates
adding the metric for an alert seems valid regardless
indeed, good idea.
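One way a client such as oauth-server could feed a `..._x509_missing_san_total` style counter is to wrap its HTTP transport and count failures caused by CN-only certificates. The sketch below uses only the standard library; `missingSANCounter` is a hypothetical stand-in for a Prometheus counter, and `countingTransport`, `isLegacyCNError`, and `simulateLegacyFailure` are names invented for this example. Matching on the error text is an assumption of the sketch, and the actual upstream or oauth-server implementation may differ.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"net/http"
	"strings"
	"sync/atomic"
)

// missingSANCounter is a hypothetical stand-in for a Prometheus counter.
type missingSANCounter struct{ n int64 }

func (c *missingSANCounter) Inc()         { atomic.AddInt64(&c.n, 1) }
func (c *missingSANCounter) Value() int64 { return atomic.LoadInt64(&c.n) }

// isLegacyCNError reports whether err is a hostname verification failure that
// Go attributes to a certificate relying on the legacy Common Name field.
func isLegacyCNError(err error) bool {
	var hostErr x509.HostnameError
	return errors.As(err, &hostErr) &&
		strings.Contains(err.Error(), "certificate relies on legacy Common Name field")
}

// countingTransport wraps another RoundTripper and increments the counter
// whenever a request fails because of a CN-only server certificate.
type countingTransport struct {
	next    http.RoundTripper
	counter *missingSANCounter
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil && isLegacyCNError(err) {
		t.counter.Inc()
	}
	return resp, err
}

// roundTripperFunc adapts a function to http.RoundTripper (demo only).
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// simulateLegacyFailure issues one request through a fake transport that fails
// the way a TLS handshake against a CN-only certificate would, and returns the
// resulting counter value. No network access is involved.
func simulateLegacyFailure(host string) int64 {
	failing := roundTripperFunc(func(*http.Request) (*http.Response, error) {
		cert := &x509.Certificate{Subject: pkix.Name{CommonName: host}}
		return nil, cert.VerifyHostname(host)
	})
	counter := &missingSANCounter{}
	client := &http.Client{Transport: &countingTransport{next: failing, counter: counter}}
	_, _ = client.Get("https://" + host) // the error is expected and counted
	return counter.Value()
}

func main() {
	fmt.Println("missing SAN errors counted:", simulateLegacyFailure("legacy.example.com"))
}
```

Counting in the transport keeps the detection close to where the handshake error surfaces, so an alert can be driven from the counter without changing the calling code.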
- custom API webhooks
- custom aggregated API endpoints
- custom certificates for route endpoints
- certificates of external auth identity providers
Shall we add certificates securing infrastructure API as a target case here?
For example, an OpenStack cloud exposing its API endpoints with a custom certificate. We have several platform-specific touchpoints (eg cluster-api-provider-openstack) that would need to expose the metric, if I understand this enhancement correctly
i'd consider this as a separate enhancement proposal or separate bugzilla. This proposal was not meant to be catch-all to focus on scope for the auth team.
OK. Opened https://bugzilla.redhat.com/show_bug.cgi?id=2038166 , thanks
### Upgrade / Downgrade Strategy

The above proposed changes **must** be backported to OpenShift 4.9.
Otherwise, detecting invalid certificates will not be possible before an upgrade to OpenShift 4.10.
I would also be interested in what happens in future releases.
Scenario: a new 4.11 feature triggers calls to a new HTTPS endpoint. The feature, that doesn't exist in 4.10, is active in 4.11 independently of user intervention. The new HTTPS endpoint is served with what you defined as "invalid certificate".
In this scenario, upgrading from 4.10 to 4.11 potentially brings the cluster to an unstable state.
In general terms: for each new HTTPS endpoint we call in 4.y, do we commit to adding an HTTPS-cert check in 4.y-1.z? Do we commit to doing that until a specific date?
For context, Chrome dropped support for CN in v58, in April 2017.
Starting with 4.10, we no longer have a Go runtime that is capable of accepting legacy-cert-based endpoints in the first place.
It seemed to me that library-go's IsHostnameError was specifically designed for checking invalid certificates under Go v1.17+. What am I missing?
No, `IsHostnameError` will only work with Go <=1.16.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@sdodson: Reopened this PR. In response to this:
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
/reopen |
@s-urbaniak: Reopened this PR. In response to this:
/remove-lifecycle rotten I am supporting @s-urbaniak in finalising this proposal. I have addressed what I believe are the outstanding comments:
I will squash before merge as soon as we converge to consensus. |
@slaskawi @soltysh @deads2k @s-urbaniak |
d70ad9c
to
3f4466a
Compare
Co-authored-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com> Co-authored-by: Pierre Prinetti <pierreprinetti@redhat.com>
3f4466a
to
836176f
Compare
Minor fix: correcting the "created" and "updated" dates at the top of the document. |
I believe this PR is ready for you to consider LGTM'ing, @mfojtik. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mfojtik, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@s-urbaniak: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/cc @sttts @stlaz @deads2k