
Add certificate expiry check, events, and metrics #9772

Merged
merged 1 commit into from
Mar 28, 2024


@brandond (Member) commented Mar 23, 2024

Proposed Changes

  • Moves the client and server certificate listing code into pkg/util/services (open to suggestions on a better place/name for this)
  • Fixes k3s certificate rotate failing on agents due to a missing server token file
  • Adds a k3s certificate check command, which checks and prints the status of server and agent certificates
  • Adds periodic automatic checking of certificates: creates Warning events attached to the node for certificates that have expired or are about to expire, and updates metrics

Types of Changes

enhancement

Verification

  • Start K3s with CATTLE_NEW_SIGNED_CERT_EXPIRATION_DAYS=30 in the service environment so that newly issued certificates expire after 30 days
  • Run the new k3s certificate check command
  • Check for CertificateExpirationWarning events on the node

Testing

Linked Issues

User-Facing Change


Further Comments

@brandond changed the title from "Add certificate expiry check and warnings" to "[WIP] Add certificate expiry check and warnings" Mar 23, 2024

@brandond (Member, Author) commented Mar 25, 2024

Example:

root@k3s-server-1:/# k3s certificate check
INFO[0000] Server detected, checking agent and server certificates
INFO[0000] Checking certificates for k3s-controller
INFO[0000] /var/lib/rancher/k3s/server/tls/client-k3s-controller.crt: certificate CN=system:k3s-controller is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-k3s-controller.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/client-k3s-controller.crt: certificate CN=system:k3s-controller is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/client-k3s-controller.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for admin
INFO[0000] /var/lib/rancher/k3s/server/tls/client-admin.crt: certificate CN=system:admin,O=system:masters is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-admin.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for auth-proxy
INFO[0000] /var/lib/rancher/k3s/server/tls/client-auth-proxy.crt: certificate CN=system:auth-proxy is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-auth-proxy.crt: certificate CN=k3s-request-header-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for controller-manager
INFO[0000] /var/lib/rancher/k3s/server/tls/client-controller.crt: certificate CN=system:kube-controller-manager is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-controller.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for kubelet
INFO[0000] /var/lib/rancher/k3s/agent/client-kubelet.crt: certificate CN=system:node:k3s-server-1,O=system:nodes is ok, expires at 2025-03-25T22:34:30Z
INFO[0000] /var/lib/rancher/k3s/agent/client-kubelet.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/serving-kubelet.crt: certificate CN=k3s-server-1 is ok, expires at 2025-03-25T22:34:30Z
INFO[0000] /var/lib/rancher/k3s/agent/serving-kubelet.crt: certificate CN=k3s-server-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for kube-proxy
INFO[0000] /var/lib/rancher/k3s/server/tls/client-kube-proxy.crt: certificate CN=system:kube-proxy is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-kube-proxy.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/client-kube-proxy.crt: certificate CN=system:kube-proxy is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/client-kube-proxy.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for api-server
INFO[0000] /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt: certificate CN=system:apiserver,O=system:masters is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt: certificate CN=kube-apiserver is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt: certificate CN=k3s-server-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for cloud-controller
INFO[0000] /var/lib/rancher/k3s/server/tls/client-k3s-cloud-controller.crt: certificate CN=k3s-cloud-controller-manager is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-k3s-cloud-controller.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for etcd
INFO[0000] /var/lib/rancher/k3s/server/tls/etcd/client.crt: certificate CN=etcd-client is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/etcd/client.crt: certificate CN=etcd-server-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/etcd/server-client.crt: certificate CN=etcd-server is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/etcd/server-client.crt: certificate CN=etcd-server-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.crt: certificate CN=etcd-peer is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.crt: certificate CN=etcd-peer-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for scheduler
INFO[0000] /var/lib/rancher/k3s/server/tls/client-scheduler.crt: certificate CN=system:kube-scheduler is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/server/tls/client-scheduler.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
root@k3s-agent-1:/# k3s certificate check
INFO[0000] Agent detected, checking agent certificates
INFO[0000] Checking certificates for kube-proxy
INFO[0000] /var/lib/rancher/k3s/agent/client-kube-proxy.crt: certificate CN=system:kube-proxy is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/client-kube-proxy.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for kubelet
INFO[0000] /var/lib/rancher/k3s/agent/client-kubelet.crt: certificate CN=system:node:k3s-agent-1,O=system:nodes is ok, expires at 2025-03-25T22:34:36Z
INFO[0000] /var/lib/rancher/k3s/agent/client-kubelet.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/serving-kubelet.crt: certificate CN=k3s-agent-1 is ok, expires at 2025-03-25T22:34:36Z
INFO[0000] /var/lib/rancher/k3s/agent/serving-kubelet.crt: certificate CN=k3s-server-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
INFO[0000] Checking certificates for k3s-controller
INFO[0000] /var/lib/rancher/k3s/agent/client-k3s-controller.crt: certificate CN=system:k3s-controller is ok, expires at 2025-03-25T22:34:28Z
INFO[0000] /var/lib/rancher/k3s/agent/client-k3s-controller.crt: certificate CN=k3s-client-ca@1711406068 is ok, expires at 2034-03-23T22:34:28Z
brandond@dev01:~$ kubectl get --raw /api/v1/nodes/k3s-server-1/proxy/metrics | grep k3s_certificate_expiration
# HELP k3s_certificate_expiration_seconds Remaining lifetime on the certificate.
# TYPE k3s_certificate_expiration_seconds gauge
k3s_certificate_expiration_seconds{subject="CN=etcd-client",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=etcd-peer",usages="ServerAuth,ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=etcd-peer-ca@1711483842",usages="CertSign"} 3.153599970558642e+08
k3s_certificate_expiration_seconds{subject="CN=etcd-server",usages="ServerAuth,ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=etcd-server-ca@1711483842",usages="CertSign"} 3.153599970558642e+08
k3s_certificate_expiration_seconds{subject="CN=k3s-client-ca@1711483842",usages="CertSign"} 3.153599970558642e+08
k3s_certificate_expiration_seconds{subject="CN=k3s-cloud-controller-manager",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=k3s-request-header-ca@1711483842",usages="CertSign"} 3.153599970558642e+08
k3s_certificate_expiration_seconds{subject="CN=k3s-server-1",usages="ServerAuth"} 3.15359980568481e+07
k3s_certificate_expiration_seconds{subject="CN=k3s-server-ca@1711483842",usages="CertSign"} 3.153599970558642e+08
k3s_certificate_expiration_seconds{subject="CN=kube-apiserver",usages="ServerAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:admin,O=system:masters",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:apiserver,O=system:masters",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:auth-proxy",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:k3s-controller",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:kube-controller-manager",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:kube-proxy",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:kube-scheduler",usages="ClientAuth"} 3.15359970568481e+07
k3s_certificate_expiration_seconds{subject="CN=system:node:k3s-server-1,O=system:nodes",usages="ClientAuth"} 3.15359980568481e+07

brandond@dev01:~$ kubectl get event
LAST SEEN   TYPE      REASON                           OBJECT                     MESSAGE
2m26s       Warning   CertificateExpirationWarning     node/k3s-server-1          Node certificates require attention - restart k3s on this node to trigger automatic rotation: admin/client-admin.crt: certificate CN=system:admin,O=system:masters will expire within 90 days at 2024-04-27T18:56:27Z, api-server/client-kube-apiserver.crt: certificate CN=system:apiserver,O=system:masters will expire within 90 days at 2024-04-27T18:56:27Z, api-server/serving-kube-apiserver.crt: certificate CN=kube-apiserver will expire within 90 days at 2024-04-27T18:56:27Z, auth-proxy/client-auth-proxy.crt: certificate CN=system:auth-proxy will expire within 90 days at 2024-04-27T18:56:27Z, cloud-controller/client-k3s-cloud-controller.crt: certificate CN=k3s-cloud-controller-manager will expire within 90 days at 2024-04-27T18:56:27Z, controller-manager/client-controller.crt: certificate CN=system:kube-controller-manager will expire within 90 days at 2024-04-27T18:56:27Z, etcd/client.crt: certificate CN=etcd-client will expire within 90 days at 2024-04-27T18:56:27Z, etcd/server-client.crt: certificate CN=etcd-server will expire within 90 days at 2024-04-27T18:56:27Z, etcd/peer-server-client.crt: certificate CN=etcd-peer will expire within 90 days at 2024-04-27T18:56:27Z, scheduler/client-scheduler.crt: certificate CN=system:kube-scheduler will expire within 90 days at 2024-04-27T18:56:27Z, kube-proxy/client-kube-proxy.crt: certificate CN=system:kube-proxy will expire within 90 days at 2024-04-27T18:56:27Z, kube-proxy/client-kube-proxy.crt: certificate CN=system:kube-proxy will expire within 90 days at 2024-04-27T18:56:27Z, kubelet/client-kubelet.crt: certificate CN=system:node:k3s-server-1,O=system:nodes will expire within 90 days at 2024-04-27T18:56:29Z, kubelet/serving-kubelet.crt: certificate CN=k3s-server-1 will expire within 90 days at 2024-04-27T18:56:28Z, k3s-controller/client-k3s-controller.crt: certificate CN=system:k3s-controller will expire within 90 days at 2024-04-27T18:56:27Z, k3s-controller/client-k3s-controller.crt: certificate CN=system:k3s-controller will expire within 90 days at 2024-04-27T18:56:27Z
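All certificates that need attention on a node are folded into a single Warning event with reason CertificateExpirationWarning, attached to the Node object. The sketch below builds only the message string in that shape; certWarning and nodeEventMessage are hypothetical names, and posting the event (for example through a client-go event recorder) is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// certWarning is a hypothetical record for one certificate needing attention.
type certWarning struct {
	Service string // e.g. "admin"
	File    string // e.g. "client-admin.crt"
	Detail  string // e.g. "certificate CN=... will expire within 90 days at ..."
}

// nodeEventMessage folds all warnings for a node into one message with the
// same prefix and "service/file: detail" layout as the event above. It
// returns "" when nothing needs attention, so the caller can skip the event.
func nodeEventMessage(ws []certWarning) string {
	if len(ws) == 0 {
		return ""
	}
	parts := make([]string, 0, len(ws))
	for _, w := range ws {
		parts = append(parts, fmt.Sprintf("%s/%s: %s", w.Service, w.File, w.Detail))
	}
	return "Node certificates require attention - restart k3s on this node to trigger automatic rotation: " +
		strings.Join(parts, ", ")
}

func main() {
	msg := nodeEventMessage([]certWarning{{
		Service: "admin",
		File:    "client-admin.crt",
		Detail:  "certificate CN=system:admin,O=system:masters will expire within 90 days at 2024-04-27T18:56:27Z",
	}})
	fmt.Println(msg)
}
```

One event per node keeps the event list short even when every certificate on the node is close to expiry, as in the example above.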

@brandond changed the title from "[WIP] Add certificate expiry check and warnings" to "Add certificate expiry check and warnings" Mar 26, 2024
@brandond marked this pull request as ready for review March 26, 2024 00:46
@brandond requested a review from a team as a code owner March 26, 2024 00:46

codecov bot commented Mar 26, 2024

Codecov Report

Attention: Patch coverage is 55.21739%, with 103 lines in your changes missing coverage. Please review.

Project coverage is 40.87%. Comparing base (8aecc26) to head (a49efe4).
Report is 2 commits behind head on master.

| File | Patch % | Lines |
|---|---|---|
| pkg/cli/cert/cert.go | 0.00% | 72 Missing ⚠️ |
| pkg/certmonitor/certmonitor.go | 70.49% | 10 Missing and 8 partials ⚠️ |
| pkg/util/services/services.go | 91.25% | 7 Missing ⚠️ |
| pkg/agent/run.go | 25.00% | 3 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #9772       +/-   ##
===========================================
- Coverage   52.94%   40.87%   -12.08%     
===========================================
  Files         154      153        -1     
  Lines       13601    13667       +66     
===========================================
- Hits         7201     5586     -1615     
- Misses       5038     6957     +1919     
+ Partials     1362     1124      -238     
| Flag | Coverage Δ |
|---|---|
| e2etests | 39.65% <55.21%> (-9.80%) ⬇️ |
| inttests | 35.47% <55.21%> (-3.90%) ⬇️ |
| unittests | ? |

Flags with carried forward coverage won't be shown.


@brandond force-pushed the cert-expire-warning branch 2 times, most recently from ee1769a to 0f026a2 March 26, 2024 03:31
vitorsavian previously approved these changes Mar 26, 2024
@brandond changed the title from "Add certificate expiry check and warnings" to "Add certificate expiry check, events, and metrics" Mar 26, 2024
@brandond requested review from vitorsavian and a team March 26, 2024 20:21
* Add ADR (architecture decision record).
* Add `k3s certificate check` command.
* Add periodic check and events when certs are about to expire.
* Add metrics for certificate validity remaining, labeled by cert subject.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

4 participants