Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add certificate watcher to queue-proxy #14189

Merged
merged 3 commits into from
Aug 31, 2023

Conversation

ReToCode
Copy link
Member

@ReToCode ReToCode commented Jul 26, 2023

Fixes #14187

Proposed Changes

  • Use projected volumes to mount certificate secret in queue-proxy to get updates when secret changes
  • Adds a certificate watcher to queue-proxy to reload certificates when they change on the file system

Release Note

When `internal-encryption` is enabled, queue-proxy mounts the certificate secret as projected-volume and automatically reloads the certificates on change.

@knative-prow knative-prow bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 26, 2023
@knative-prow knative-prow bot requested review from jsanin-vmw and kauana July 26, 2023 06:35
@knative-prow knative-prow bot added area/API API objects and controllers area/autoscale area/networking labels Jul 26, 2023
@ReToCode
Copy link
Member Author

@dprotaso , @skonto , @nak3 , @KauzClay
Before I add unit-tests and e2e-tests, what do you think about the approach? Problem description see: #14187.

Also is there anything to keep in mind/check when adding new dependencies?

@nak3
Copy link
Contributor

nak3 commented Jul 26, 2023

Use projected volumes to mount certificate secret in queue-proxy to get updates when secret changes

inotify on k8s is a little bit tricky kubernetes/kubernetes#112677 - but it would work well? I guess using projected volumes could solve it? (Honsetly, I didn't know why it changed to use projected volume so I am guessing it.)

Also is there anything to keep in mind/check when adding new dependencies?

We want to make QP small as much as possible #9957. But the deps is enough small that we can ignore, isn't it? As far as I built the image, it seems OK.
Also, we would like to make sure memory/CPU consumption is not increased by the inotify watcher. (Sorry I haven't confirmed it on my side 😅 )

@ReToCode
Copy link
Member Author

inotify on k8s is a little bit tricky kubernetes/kubernetes#112677 - but it would work well? I guess using projected volumes could solve it? (Honsetly, I didn't know why it changed to use projected volume so I am guessing it.)

Good points, thank you. Yes I had the same issue with normal secrets, but projected volumes seem to work fine. But I'll probably do some more testing with it.

Seems like thanos went with an approach to continuously re-read the file (thanos-io/thanos#6503) and see if there are differences. That could also work fine, but could take some more time for the certificate to be updated, so we somehow would still would need to trust multiple root CAs for some time in activator and all ingress-controllers to make it work properly - or implement proper cross signing ourselves.

I still feel like that we are rebuilding too much stuff ourselves instead of leveraging functionality of a Service Mesh or directly relying on solutions like cert-manager.

In the end we probably will end up with some limitations and known issues with our custom solution.

@KauzClay
Copy link
Contributor

I think your approach makes sense, seems like the path of least resistance too.

Asking just so I can understand the scenario better: do you have context on why QP uses the mounted secrets? Is there a reason it doesn't have a trust-store like activator? Maybe that makes it too big, per @nak3 's point?

I still feel like that we are rebuilding too much stuff ourselves instead of leveraging functionality of a Service Mesh or directly relying on solutions like cert-manager.

generally I'd agree, would it be so bad if Knative had a dependency on cert-manager?

But I might not be seeing it in this case, is there a way you see cert-manager would help here? We would still need the logic to check for the change in the mounted secret, right?

@ReToCode
Copy link
Member Author

Is there a reason it doesn't have a trust-store like activator? Maybe that makes it too big, per @nak3 's point?

Queue-Proxy does not have access to the kube api (so no informers or watchers on secrets). Not sure what the original reasons for this are, but I also assume it is related to the size topic.

generally I'd agree, would it be so bad if Knative had a dependency on cert-manager?

I don't think so either, especially as this would only be necessary when people want to use encryption. @evankanderson had some doubts about having that.

But I might not be seeing it in this case, is there a way you see cert-manager would help here? We would still need the logic to check for the change in the mounted secret, right?

It would not help here directly. It is more about the rotation of the CA. Our current implementation just creates a new CA and immediately rotates all secrets. We probably will have to build some logic that does proper rotation. These things would already be handled with cert-manager. With this, ingresses, activator and proxies would have some time to shift to the new certs.

@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 31, 2023
@ReToCode
Copy link
Member Author

The thanos approach is fairly easy and the introduced delay should not matter as we need to make sure we allow some rotation grace period in activator & ingress controllers anyways (reloading certs across all K Services will take some time), so I'm going to continue with that approach. Will add tests and e2e tests shortly.

@codecov
Copy link

codecov bot commented Jul 31, 2023

Codecov Report

Patch coverage: 54.65% and project coverage change: -0.14% ⚠️

Comparison is base (8ea1cb6) 86.25% compared to head (218c1bf) 86.11%.
Report is 68 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14189      +/-   ##
==========================================
- Coverage   86.25%   86.11%   -0.14%     
==========================================
  Files         199      196       -3     
  Lines       14811    14781      -30     
==========================================
- Hits        12775    12729      -46     
- Misses       1734     1746      +12     
- Partials      302      306       +4     
Files Changed Coverage Δ
pkg/queue/sharedmain/main.go 0.83% <0.00%> (-0.04%) ⬇️
pkg/queue/certificate/watcher.go 69.11% <69.11%> (ø)

... and 8 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ReToCode ReToCode changed the title [WIP] Add certificate watcher to queue-proxy Add certificate watcher to queue-proxy Aug 2, 2023
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 2, 2023
@ReToCode
Copy link
Member Author

ReToCode commented Aug 2, 2023

/unhold

This is ready to be reviewed. I'll add another issue for full CA rotation, as more work is needed to make that work (as activator and all net-* components also must support it).

@dprotaso , @skonto , @nak3 , @KauzClay, @davidhadas

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2023
@nak3
Copy link
Contributor

nak3 commented Aug 3, 2023

Use projected volumes to mount certificate secret in queue-proxy to get updates when secret changes

I thought that the projected volumes was used for the workaround of inotify. Do we still need the change to use projected volumes?

@ReToCode
Copy link
Member Author

ReToCode commented Aug 3, 2023

I thought that the projected volumes was used for the workaround of inotify. Do we still need the change to use projected volumes?

The update behaviour with projectedVolumes seems to be way more stable in my tests. Do you think there is a downside to using it this way?

@ReToCode
Copy link
Member Author

ReToCode commented Aug 4, 2023

/retest

@ReToCode
Copy link
Member Author

ReToCode commented Aug 4, 2023

The update behaviour with projectedVolumes seems to be way more stable in my tests. Do you think there is a downside to using it this way?

I checked this again, it takes a bit longer to refresh on disk, but in general we should be fine using a normal mount. I also checked Kube RBAC proxy, it does it the same way. Updated the PR accordingly.

pkg/queue/sharedmain/main.go Show resolved Hide resolved
// NewCertWatcher creates a CertWatcher and watches
// the certificate and key files. It reloads the contents on file change.
// Make sure to stop the CertWatcher using Stop() upon destroy.
func NewCertWatcher(certPath, keyPath string, reloadInterval time.Duration, logger *zap.SugaredLogger) (*CertWatcher, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certPath and keyPath are fixed value.
We can move the const values (certPath and keyPath) to pkg/queue/certificate or some common package and drop the arguments and field in the struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm for now CertWatcher pretty generic (and could be used somewhere else). The constants seem to be with other configuration in sharedmain which also need to correspond to the yaml, so I think I'd prefer to let them there. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem. Let's keep there.

pkg/queue/certificate/watcher.go Outdated Show resolved Hide resolved
@nak3
Copy link
Contributor

nak3 commented Aug 7, 2023

/lgtm
/hold

Thank you so much 🎉 /hold for other reviewers.

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 7, 2023
@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Aug 7, 2023
@KauzClay
Copy link
Contributor

KauzClay commented Aug 8, 2023

looks good to me as well!

/unhold

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 8, 2023
@KauzClay
Copy link
Contributor

KauzClay commented Aug 9, 2023

ah, I thought unholding this would let it be merged?

Seems like it needs to be approved? Idk if I have approval power

@krsna-m
Copy link
Contributor

krsna-m commented Aug 14, 2023

/approve

@knative-prow
Copy link

knative-prow bot commented Aug 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kvmware, ReToCode

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2023
@krsna-m
Copy link
Contributor

krsna-m commented Aug 14, 2023

/unapprove
/hold

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 14, 2023
@krsna-m krsna-m removed approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 14, 2023
@krsna-m
Copy link
Contributor

krsna-m commented Aug 14, 2023

/lgtm also

@ReToCode
Copy link
Member Author

@dprotaso can this be merged? Some Approver magic is needed.

@ReToCode
Copy link
Member Author

/assign @dprotaso

Copy link
Member

@dprotaso dprotaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI - if you haven't noticed K8s is looking to provide a generic mechanism to distribute CAbundles

Though it's in early design phases
https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-trust-anchor-sets#pemtrustanchors-projected-volume-source

pkg/queue/certificate/watcher.go Show resolved Hide resolved
pkg/queue/sharedmain/main.go Outdated Show resolved Hide resolved
@knative-prow knative-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed lgtm Indicates that a PR is ready to be merged. labels Aug 30, 2023
@ReToCode
Copy link
Member Author

FYI - if you haven't noticed K8s is looking to provide a generic mechanism to distribute CAbundles

That is nice, definitely a thing that will help us.

@dprotaso
Copy link
Member

/lgtm

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Aug 31, 2023
@knative-prow knative-prow bot merged commit 8dbb2d3 into knative:main Aug 31, 2023
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale area/networking lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement automatic certificate rotation on Queue-Proxy
6 participants