[OSDOCS#10055]: Document manual rotation of etcd signer certificates #77406
🤖 Fri Jun 21 21:00:55 - Prow CI generated the docs preview:
/retest
$ oc get secret -n openshift-etcd etcd-signer -ojsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}'
----

. Recreate the signer, if the remaining lifetime is close to the current date, by deleting the signer and wait for the status pod rollout:
static pod
$ oc wait --for=condition=Progressing=False --timeout=15m clusteroperator/etcd
----

. After the CA rotates, you can switch the original CA with the new one.
the CA doesn't rotate (yet), etcd restarts :)
:-) Thanks Thomas!

:_mod-docs-content-type: PROCEDURE
[id="rotating-certificate-authority_{context}"]
=== Rotating the etcd certificate authority
looks good as a whole! we need to also mention the `metrics-signer-ca` rotation, which is equally important for metrics and alerting
.Procedure

. Verify the remaining lifetime of the new signer certificate:
@tmalove Try adding a `+` here and see if that helps.
@lahinson the problem was the file suffix...it is corrected now, thanks!
Force-pushed from e678190 to ca09778
/ok-to-test
/retest
Force-pushed from 4466e83 to 21a2784
/lgtm
/lgtm @geliu2016 ensure that you don't delete the signer in the openshift-config namespace; that would indeed create failures
Force-pushed from 21a2784 to c21641e
/label peer-review-needed
Left some suggestions but overall great addition!
/remove-label peer-review-in-progress
/remove-label peer-review-needed
/label peer-review-done

.Additional resources

* xref:../../security/certificate_types_descriptions/etcd-certificates.adoc#rotating-certificate-authority_cert-types-etcd-certificates[Rotating the etcd certificate].
Current:
* xref:../../security/certificate_types_descriptions/etcd-certificates.adoc#rotating-certificate-authority_cert-types-etcd-certificates[Rotating the etcd certificate].
Suggested:
* xref:../../security/certificate_types_descriptions/etcd-certificates.adoc#rotating-certificate-authority_cert-types-etcd-certificates[Rotating the etcd certificate]

.Procedure

. Verify the remaining lifetime of the new signer certificate:
Current:
. Verify the remaining lifetime of the new signer certificate:
Suggested:
. Verify the remaining lifetime of the new signer certificate by running the following command:

. Re-create the signer, if the remaining lifetime is close to the current date, by deleting the signer and wait for the static pod rollout:
+
[source,terminal]
----
$ oc delete secret -n openshift-etcd etcd-signer
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=15m clusteroperator/etcd
----
Current:
. Re-create the signer, if the remaining lifetime is close to the current date, by deleting the signer and wait for the static pod rollout:
+
[source,terminal]
----
$ oc delete secret -n openshift-etcd etcd-signer
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=15m clusteroperator/etcd
----
Suggested:
. If the remaining lifetime is close to the current date, re-create the signer by running the following commands:
.. Delete the signer:
+
[source,terminal]
----
$ oc delete secret -n openshift-etcd etcd-signer
----
.. Wait for the static pod rollout
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=15m clusteroperator/etcd
----
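The re-create step can be scripted along the lines of the following minimal sketch. It assumes a Bash shell with `oc` already logged in to the cluster. Only the two `oc` commands come from the procedure; the `run` helper and its `DRY_RUN=1` switch are hypothetical conveniences of this sketch (not `oc` features) that print the commands instead of executing them, so you can review them first.

```shell
#!/usr/bin/env bash
# Sketch only: re-create the etcd signer and wait for the static pod rollout.
# DRY_RUN=1 is a hypothetical switch of this script, not an oc option.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

recreate_etcd_signer() {
  # Delete only the signer in openshift-etcd. Never delete the copy in
  # openshift-config; as noted in the review, that creates failures.
  run oc delete secret -n openshift-etcd etcd-signer || return 1
  # Block until the etcd cluster Operator reports Progressing=False (max 15m).
  run oc wait --for=condition=Progressing=False --timeout=15m clusteroperator/etcd
}
```

For example, `DRY_RUN=1 recreate_etcd_signer` prints the two commands, and `recreate_etcd_signer` runs them. The function returns early if the delete fails, so it does not wait on a rollout that was never triggered.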
$ oc wait --for=condition=Progressing=False --timeout=15m clusteroperator/etcd
----

. After `etcd` restarts, you can switch the original certificate authority(CA) in the `openshift-config` namespace with the new, rotated one in `openshift-etcd`.
Current:
. After `etcd` restarts, you can switch the original certificate authority(CA) in the `openshift-config` namespace with the new, rotated one in `openshift-etcd`.
Suggested:
. After `etcd` restarts, switch the original CA in the `openshift-config` namespace with the new, rotated one in `openshift-etcd`.
. After `etcd` restarts, you can switch the original certificate authority(CA) in the `openshift-config` namespace with the new, rotated one in `openshift-etcd`.
+
[source,terminal]
----
$ oc get secret etcd-signer -n openshift-etcd -ojson | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' | oc apply -n openshift-config -f -
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=25m clusteroperator/etcd
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=25m clusteroperator/kube-apiserver
----

You can also use this single command to switch the CA:

[source,terminal]
----
$ oc adm wait-for-stable-cluster
----
We should document a single way to do this, unless there is a specific use case for a second method. In this instance, if the two methods are equivalent, I would lean towards the single-command step for simplicity.
Current:
. After `etcd` restarts, you can switch the original certificate authority(CA) in the `openshift-config` namespace with the new, rotated one in `openshift-etcd`.
+
[source,terminal]
----
$ oc get secret etcd-signer -n openshift-etcd -ojson | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' | oc apply -n openshift-config -f -
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=25m clusteroperator/etcd
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=25m clusteroperator/kube-apiserver
----
You can also use this single command to switch the CA:
[source,terminal]
----
$ oc adm wait-for-stable-cluster
----
Suggested:
. After `etcd` restarts, switch the original CA in the `openshift-config` namespace with the new, rotated one in `openshift-etcd` by running the following command:
+
[source,terminal]
----
$ oc adm wait-for-stable-cluster
----

If you need to use the multi-command version, it should be split into substeps (similar to my suggestion for step 2 above) so we have a single command per step.
The `oc adm wait-for-stable-cluster` cmd does the equivalent check as the previous two cmds (and more), but it waits until the required conditions are true for a period of 5 mins (`--minimum-stable-period 5m`), so in practice it might take longer to wait on that step than with the individual cmds.
$ oc adm wait-for-stable-cluster -h
Wait for all OCP v4 clusteroperators to report Available=true, Progressing=false, Degraded=false.
Examples:
# Wait for all clusteroperators to become stable
oc adm clusteroperator wait-for-stable-cluster
# Consider operators to be stable if they report as such for 5 minutes straight
oc adm clusteroperator wait-for-stable-cluster --minimum-stable-period 5m
Options:
--minimum-stable-period=5m0s:
minimum duration to consider a cluster stable. Defaults to 5 minutes.
--timeout=1h0m0s:
duration before the command times out. Defaults to 1 hour.
That may be a long time from the user's perspective, so maybe add a note here that `oc adm wait-for-stable-cluster` waits for a minimum of 5 mins by default.
@tjungblu Do you think we need to wait that long in practice, or can we shorten it, e.g. `oc adm wait-for-stable-cluster --minimum-stable-period 2m`?
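To see which operators would currently keep `oc adm wait-for-stable-cluster` waiting, a one-shot check of the three conditions quoted in its help text (Available=true, Progressing=false, Degraded=false) can be sketched as follows. This is an illustration under assumptions, not a replacement for the command: it does not wait and does not enforce `--minimum-stable-period`. The function names are hypothetical, and the `jsonpath` expression assumes each ClusterOperator reports `Available`, `Progressing`, and `Degraded` conditions with `True`/`False` status strings.

```shell
#!/usr/bin/env bash
# Sketch only: one-shot version of the stability conditions that
# `oc adm wait-for-stable-cluster` waits for. No waiting, no stable period.

# Reads "NAME AVAILABLE PROGRESSING DEGRADED" lines on stdin and prints the
# names of operators that do not report True/False/False. Blank lines are skipped.
unstable_operators() {
  awk 'NF && !($2 == "True" && $3 == "False" && $4 == "False") { print $1 }'
}

# Hypothetical helper that produces that input from the cluster.
clusteroperator_conditions() {
  oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Available")].status}{" "}{.status.conditions[?(@.type=="Progressing")].status}{" "}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}'
}
```

For example, `clusteroperator_conditions | unstable_operators` prints the operators that are not stable at that instant. An empty result says nothing about whether they stay stable afterwards, which is what the minimum stable period of the real command guards against; also check that the `oc` call itself succeeded, because a failed call produces no output either.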
2m is enough for sure
Sounds good, thanks for confirming. Then we can combine those into the following. Note we still need the step to switch out the CA first and then wait with the singular cmd for 2m.
Current:
. After `etcd` restarts, you can switch the original certificate authority(CA) in the `openshift-config` namespace with the new, rotated one in `openshift-etcd`.
+
[source,terminal]
----
$ oc get secret etcd-signer -n openshift-etcd -ojson | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' | oc apply -n openshift-config -f -
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=25m clusteroperator/etcd
----
+
[source,terminal]
----
$ oc wait --for=condition=Progressing=False --timeout=25m clusteroperator/kube-apiserver
----
You can also use this single command to switch the CA:
[source,terminal]
----
$ oc adm wait-for-stable-cluster
----
Suggested:
. After `etcd` restarts, you can switch the original certificate authority(CA) in the `openshift-config` namespace with the new, rotated one in `openshift-etcd`.
+
[source,terminal]
----
$ oc get secret etcd-signer -n openshift-etcd -ojson | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' | oc apply -n openshift-config -f -
----
+
You can then wait for the cluster operators to rollout and become stable:
[source,terminal]
----
$ oc adm wait-for-stable-cluster --minimum-stable-period 2m
----
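After the copy step, a quick sanity check is to compare the `certificate-not-after` annotation on both copies of `etcd-signer`. The sketch below assumes a Bash shell with `oc` access and reuses the `oc get ... -ojsonpath` query from the procedure; the function names are hypothetical, and the check is not part of the documented procedure. Because the `jq` filter only strips `namespace`, `creationTimestamp`, `resourceVersion`, `selfLink`, and `uid`, the annotations travel with the copied secret. Matching annotations suggest that the `oc apply` into `openshift-config` took effect, but they do not prove that the certificate bytes are identical.

```shell
#!/usr/bin/env bash
# Sketch only: confirm that openshift-config now holds the rotated signer by
# comparing the certificate-not-after annotation on both etcd-signer copies.

# $1: namespace that holds the etcd-signer secret
signer_not_after() {
  oc get secret -n "$1" etcd-signer \
    -ojsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}'
}

# Exit status 0 if both copies report the same non-empty expiry,
# 1 if they differ or the annotation is empty, 2 if an oc call fails.
signer_copies_match() {
  local etcd_value config_value
  etcd_value=$(signer_not_after openshift-etcd) || return 2
  config_value=$(signer_not_after openshift-config) || return 2
  [[ -n "$etcd_value" && "$etcd_value" == "$config_value" ]]
}
```

For example, run `signer_copies_match && echo 'openshift-config has the rotated signer'` right after the copy and before waiting for the cluster Operators.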
== etcd certificate rotation alerts and metrics signer certificates

Two alert types inform users about pending `etcd` certificate expiration:
[horizontal]
Do you know for sure if our tooling handles this well (looks good on the preview, but I haven't seen it before so wonder about the Portal 😅)
@jeana-redhat I did a quick look at it, but couldn't find an example online. I decided that because it was documented in our doc references, that it would not be an issue. :-) I will make a note to verify this list after it hits the portal! Thanks for that input!
Please validate that this looks ok on docs.redhat after GA.
You can rotate the certificate:

* When you receive an expiration alert
* When the private key is leaked
Current:
You can rotate the certificate:
* When you receive an expiration alert
* When the private key is leaked
Suggested:
You can rotate the certificate for the following reasons:
* You receive an expiration alert
* The private key is leaked
* When you receive an expiration alert
* When the private key is leaked

NOTE: When a private key is leaked, you must rotate all of the certificates.
Honestly wonder if this is more important than a note 🤔
Current:
NOTE: When a private key is leaked, you must rotate all of the certificates.
Suggested:
[NOTE]
====
When a private key is leaked, you must rotate all of the certificates.
====
@jeana-redhat I can't find it now where I saw this construct for an admonition documented (but I will!). However, I will make this change and also change it to 'IMPORTANT', because I agree that it is more impacting than having it as a note.
Force-pushed from c21641e to 5a1f3f1
two tiny nits otherwise LGTM
----

. Wait for the cluster Operators to rollout and stabilize by running the following command:
Missing a `+` here to attach the command to the step
+
$ oc get secret etcd-signer -n openshift-etcd -ojson | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' | oc apply -n openshift-config -f -
----

. Wait for the cluster Operators to rollout and stabilize by running the following command:
Current:
. Wait for the cluster Operators to rollout and stabilize by running the following command:
Suggested:
. Wait for the cluster Operators to roll out and stabilize by running the following command:
/remove-label peer-review-done
New changes are detected. LGTM label has been removed.
* You receive an expiration alert
* The private key is leaked
Per ISG: If the list items comprise only complete sentences, include a period after each sentence.
Current:
* You receive an expiration alert
* The private key is leaked
Suggested:
* You receive an expiration alert.
* The private key is leaked.

[IMPORTANT]
====
When a private key is leaked, you must rotate all of the certificates.
I assume a standard user would know when a key is leaked and what that means....
$ oc get secret -n openshift-etcd etcd-signer -ojsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}'
----

. Re-create the signer, if the remaining lifetime is close to the current date, by deleting the signer and wait for the static pod rollout by running the following commands:
I think that putting the lifetime phrase first ties in better with the previous step.
Current:
. Re-create the signer, if the remaining lifetime is close to the current date, by deleting the signer and wait for the static pod rollout by running the following commands:
Suggested:
. If the remaining lifetime is close to the current date, re-create the signer by deleting the signer and wait for the static pod rollout by running the following commands:
What should the user do if the remaining lifetime is not close to the current date?
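One way to make "close to the current date" concrete is to compare the annotation with a threshold that you choose. The following minimal sketch assumes that the `auth.openshift.io/certificate-not-after` annotation holds an RFC 3339 timestamp such as `2025-06-21T21:00:55Z` and that GNU `date` is available. The function names, and the 30-day threshold in the usage note, are hypothetical policy choices for illustration, not OpenShift defaults.

```shell
#!/usr/bin/env bash
# Sketch only: decide whether the etcd signer is close to expiry.
# Assumes GNU date and an RFC 3339 value in the
# auth.openshift.io/certificate-not-after annotation (assumption).

# days_remaining NOT_AFTER [NOW]
# Prints whole days from NOW (default: the current time) until NOT_AFTER.
# Partial days are truncated toward zero; a negative value means NOT_AFTER
# is already in the past.
days_remaining() {
  local not_after_s now_s
  not_after_s=$(date -u -d "$1" +%s) || return 2
  now_s=$(date -u -d "${2:-now}" +%s) || return 2
  echo $(( (not_after_s - now_s) / 86400 ))
}

# signer_needs_rotation NOT_AFTER THRESHOLD_DAYS [NOW]
# Exit status 0 if fewer than THRESHOLD_DAYS remain, 1 if not, 2 on a parse error.
# THRESHOLD_DAYS is a hypothetical policy value chosen by the administrator.
signer_needs_rotation() {
  local days
  days=$(days_remaining "$1" "${3:-now}") || return 2
  (( days < $2 ))
}
```

For example, store the output of `oc get secret -n openshift-etcd etcd-signer -ojsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}'` in `not_after`, then run `signer_needs_rotation "$not_after" 30 && echo 're-create the signer'`.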
@tmalove A few nits. Otherwise LGTM. Don't forget to squash!
Force-pushed from 17a7c13 to bb44c86
/label merge-review-needed

:_mod-docs-content-type: CONCEPT
[id="etcd-cert-alerts-metrics-signer_{context}"]
== etcd certificate rotation alerts and metrics signer certificates
Current:
== etcd certificate rotation alerts and metrics signer certificates
Suggested:
= etcd certificate rotation alerts and metrics signer certificates
The first heading in a module is always an H1. Adjust the leveloffset in the assembly, if necessary.

:_mod-docs-content-type: PROCEDURE
[id="rotating-certificate-authority_{context}"]
== Rotating the etcd certificate
Current:
== Rotating the etcd certificate
Suggested:
= Rotating the etcd certificate
The first heading in a module is always an H1. Adjust the leveloffset in the assembly, if necessary.
== Management

These certificates are only managed by the system and are automatically rotated.

== Services

etcd certificates are used for encrypted communication between etcd member peers, as well as encrypted client traffic. The following certificates are generated and used by etcd and other processes that communicate with etcd:
etcd certificates are used for encrypted communication between etcd member peers, and encrypted client traffic. The following certificates are generated and used by etcd and other processes that communicate with etcd:
Current:
etcd certificates are used for encrypted communication between etcd member peers, and encrypted client traffic. The following certificates are generated and used by etcd and other processes that communicate with etcd:
Suggested:
etcd certificates are used for encrypted communication between etcd member peers and encrypted client traffic. The following certificates are generated and used by etcd and other processes that communicate with etcd:

== etcd certificate rotation alerts and metrics signer certificates

Two alert types inform users about pending `etcd` certificate expiration:
[horizontal]
Please validate that this looks ok on docs.redhat after GA.
Force-pushed from 87820e1 to 2174329
@tmalove: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/cherrypick enterprise-4.16
@kalexand-rh: new pull request created: #77935 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
OSDOCS-10055
Note: This PR has been reviewed; however, this version includes updates from the initial peer review. Thanks!
Version(s):
4.16+
Link to docs preview: etcd certificates (Updated 6/21/2024)
QE review: