Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup system-internal-tls secrets and labels #14392

Closed
5 tasks done
ReToCode opened this issue Sep 18, 2023 · 12 comments
Closed
5 tasks done

Cleanup system-internal-tls secrets and labels #14392

ReToCode opened this issue Sep 18, 2023 · 12 comments
Assignees
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. triage/accepted Issues which should be fixed (post-triage)
Milestone

Comments

@ReToCode
Copy link
Member

ReToCode commented Sep 18, 2023

Context

The changes with regards to internal-encryption and mTLS contained code that was targeting future features. This results in currently having additional Secrets with certificates that are unused. Also we have a concept of routing-id on certain certificates, that (at least currently) has no meaning and makes finishing the system-internal-tls feature more complex and unintuitive.

An example of this is, I wanted to update net-kourier to use the new secrets and trust multiple SANs on upstream backends, but do do that, net-kourier needs to trust the SAN of the activator which (currently) implies manually setting routing-id=0 in net-kourier in code - which is coupling to internas of how Activator currently get's it certificates. If we want to have multiple routing-ids in the future, we need some sort of API to get the existing routing-ids and dynamically populate them in the net-* implementations.

So for now, we need to do some cleanup to complete the system-internal-tls feature.

Things to cleanup

  • Revision reconciler creates secrets in the Knative Services namespace with routing-id. We can drop this as it is unnecessary.
  • Drop the routing-id in the dataplane certificates
  • Rename the dataplane certificates SAN to be static instead of dynamic (e.g. dropping the routing-id) --> DataPlaneRoutingSAN
  • Drop the ControlPlane certificates, as they are currently not used anywhere (they might be used to encrypt the metadata-traffic in the future, but if so, we can then introduce the new certificates + secrets them)
  • All net-* implementations still use LegacyFakeDnsName and need to be updated to trusting:
    • If they support multi-san: DataPlaneRoutingSAN + DataPlaneUserSAN(ns)
    • If they do not support multi-san activator is always in path: DataPlaneRoutingSAN
@ReToCode ReToCode added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. triage/accepted Issues which should be fixed (post-triage) labels Sep 18, 2023
@ReToCode
Copy link
Member Author

ReToCode commented Sep 18, 2023

/assign

/cc @KauzClay , @nak3

I think this makes it easier to finish system-internal-tls feature. We can always add stuff later when necessary. WDYT?

@KauzClay
Copy link
Contributor

KauzClay commented Sep 18, 2023

yeah I'm on board for cleaning up. I think there were good intentions in adding all the future-looking stuff, but currently it is hard to understand the state of what actually works.

Adding them back piece by piece when we get to it sounds good.

So would your proposed cleanups essentially bring us back to where the internal-encryption flag was initially (Same SAN used everywhere)?

And then as net-* providers support multi-SAN, they start using it?

If they support multi-san: DataPlaneRoutingSAN + DataPlaneUserSAN(ns)

Does there need to be any change in serving to allow the use of this type of SAN? Or does it already exist? I'm a little behind on the current state of things.

@ReToCode
Copy link
Member Author

yeah I'm on board for cleaning up. I think there were good intentions in adding all the future-looking stuff, but currently it is hard to understand the state of what actually works.

Exactly this 💯

So would your proposed cleanups essentially bring us back to where the internal-encryption flag was initially (Same SAN used everywhere)?

Yes. The current certs contain each two SANs:

knative-serving: routing-serving-certs
                DNS:kn-routing, DNS:data-plane.knative.dev
knative-serving: knative-serving-certs (legacy)
                DNS:data-plane.knative.dev
knative-service-namespace: serving-certs
                DNS:kn-user-default, DNS:data-plane.knative.dev

I'm working on updating Activator + QP to serve the new certs (with kn-routing and kn-user-<ns> SANs) and all net-* implementations can continue to trust the data-plane.knative.dev SAN. Then each net-* implementation can switch to trusting the specific SANs when trusting multiple SANs is possible.

WDYT?

@nak3
Copy link
Contributor

nak3 commented Sep 20, 2023

The kourier change looks good. But I'm still not clear how activator trusts the SAN of the queue-proxy. Specifically, what value will be assigned in this cr.TLSConf.ServerName here?

cr.TLSConf.ServerName = certificates.LegacyFakeDnsName

@ReToCode
Copy link
Member Author

ReToCode commented Sep 20, 2023

But I'm still not clear how activator trusts the SAN of the queue-proxy. Specifically, what value will be assigned in this cr.TLSConf.ServerName here?

That is a good point. I stumbled upon that in #14399. IMHO, for now we can stay on LegacyFakeDnsName as the namespace based SAN is only necessary if we want to do mTLS (which I'm still not convinced we need natively). If we would want to do the mTLS stuff, we probably would need to watch all those secrets and create a dedicated TLSConf for each Knative Service with the expected SAN there. As golang itself also does not support one TLSConf with multiple SANs.

@nak3
Copy link
Contributor

nak3 commented Sep 20, 2023

Ah, I realized that activator can trust QP's kn-user-<ns> SAN by using VerifyConnection. David did that on his PR.

https://github.com/knative/serving/pull/13969/files#diff-0c6071b66f1c787cdcce930a0c4bdc8f4e48d276bc9226c8aa5223394c3c21f0R73-R75

So, I think we should make certs have one SAN like this?

knative-serving: routing-serving-certs
                DNS:kn-routing
knative-serving: knative-serving-certs (legacy)
                DNS:data-plane.knative.dev
knative-service-namespace: serving-certs
                DNS:kn-user-<ns>

If we will use kn-user-<ns> for Ingress -> QP, then we should not use the legacy SAN for activator -> QP. Otherwise, I feel no sense to use kn-user-<ns>.

@ReToCode
Copy link
Member Author

Yeah thats a good point and an even better solution for activator 👍 I think its good to have the second SAN in there for the transition period, for example during an update, theoretically the QP could already host the new cert and activator is still on rolling-update only trusting the old SAN. So with having both this is reverse compatible.

But we definitely can drop the second SAN afterwards (assuming we keep activator in path for contour + gw-api or use SNI on activator).

@nak3
Copy link
Contributor

nak3 commented Sep 20, 2023

I think its good to have the second SAN in there for the transition period

Yes, absolutely agree.
And now the proposal sounds all good to me. Thank you!

@ReToCode
Copy link
Member Author

Created an issue for this: #14402. Feel free to grab it if you like.

@ReToCode ReToCode changed the title Cleanup knative-internal-tls secrets and labels Cleanup system-internal-tls secrets and labels Sep 22, 2023
@ReToCode ReToCode moved this to In Progress in Serving Milestones Sep 22, 2023
@ReToCode
Copy link
Member Author

ReToCode commented Oct 9, 2023

Cleanup and usage of new secrets will be tracked here.

/close

@knative-prow knative-prow bot closed this as completed Oct 9, 2023
@knative-prow
Copy link

knative-prow bot commented Oct 9, 2023

@ReToCode: Closing this issue.

In response to this:

Cleanup and usage of new secrets will be tracked here.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Serving Milestones Oct 9, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Serving Encryption Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. triage/accepted Issues which should be fixed (post-triage)
Projects
Development

No branches or pull requests

3 participants