-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Implement rotation of service-account signing key (take two) #11204
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
633c721
to
3965d44
Compare
dc77313
to
db3fb2d
Compare
6894f49
to
893995e
Compare
One issue with this approach will be the effect on a downgrade of the kops version. Unless I take contortions with the on-disk Any recommendations @justinsb ? A related question is when do we want to drop support for downgrading kops to versions before 1.18? If we drop that support we can stop writing the legacy format files into the keystore. |
I believe this is most of what is needed. I still need to do some manual testing before removing the WIP. I should also add a release note. |
I like that with this, one can explicitly move the primary marker. Relying on the highest serial number causes that one issue with premature exporting of admin credentials. I see you also change the pool logic, so this bit looks fully compatible with my rotate command. One common use case for rotating CA is to deal with leavers that had access to the secret store. Since we preemptively provision next here, wouldn't people then have to do double rotation? A similar issue would be if one has to rotate when the CA expires. Both of these concerns are specific to CAs though. My PR creates "next" as part of the rotation command. I wonder if not that could be done here? It is nice to do this as part of a fitask, but still ... I didn't see the logic for deleting the older keys (docs says all keys but the primary will be removed), but based on your warning, I assume running the command twice in short succession would cause disruption. My PR mitigates that scenario by the command itself rotating the cluster, then moving the primary marker. |
To invalidate all credentials that existed at time T one would have to do three rotations. One to move next into primary, another to move primary to previous, and another to remove previous. It would be possible to have an alternative procedure: delete the "next" key, propagate out a newly created "next" key, wait, rotate, wait, delete the "previous" key, and propagate the deletion. Creating "next" as part of fitasks simplifies the code. After the rotation command one needs to apply_cluster, so there's little reason to duplicate the logic there. The older keys are deleted by the rotate command, by dint of not including them in the new I do not like the rotate command doing rolling updates, especially not before making further changes. As I had mentioned before, that risks failures that would then be hard to recover from. The canonical example would be if the machine running kops abruptly dies, for example during one of those rolling updates. |
If rolls are aborted mid-run, you have say 1 api server that trusts both new and old cert, 2 that only trust old CA. On the second rotate, it is similar, but user connects to with the new cert and can talk to all 3. I don't think being aborted causes unrecoverable failures with this setup. if you restart an aborted rotation command, it won't provision any new keys, and then start rotating again. Only inconvenience there is one has to rotate both time regardless. The first rotation could be dropped using your primary ID feature, as the new key would be promoted only after the first run, when we know all nodes trust it. |
/retest |
@johngmyers: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@olemarkus I've tried to put the reasons for my discomfort with an all-in-one-go rotation command in my review of #10516. I can see it could be challenging trying to converge on a UI with a discussion that is scattered over multiple PRs. We could try creating an issue covering rotation in general, but that reminds me of a certain comic strip concerning standards. I see the benefit of deferring the creation and staging of the new key until after the admin decides that rotation is needed. That would suggest separate commands or subcommands for staging the new key, changing the primary, and distrusting the old key. For something as simple as the service-account key, it's roughly the same amount of work to just do three rotations and this PR's approach of pre-staging the new key is simpler when having a regular rotation schedule. For the cluster CA, on the other hand, having separate stage, rotate, and distrust operations would only require distributing |
In short, I am happy with a command for distrusting a key. I am thinking a new flag on the keyset for that, then the builder/fitask just need to check that in addition when deciding if one should provision next or not. We document the full recipe for how to rotate the cluster and my full rotate command then just becomes copy pasting that entire recipe.
How does that sound? |
Originally I was thinking distrusting would be deleting it from the keystore (followed by an apply and rolling update) but if we're to eventually move to more versioned state we might want to keep the old keys in the store marked with a distrusted bit. I think the sequence would be something like:
|
I think that looks about right. I guess But yeah, think we are aligned on the approach and we are down to just naming things :) |
We could probably just go with |
@johngmyers: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Drops support for downgrading to kops versions before 1.18. Doing so will cause all keypairs to revert to the state they were prior to upgrading kops.
Downgrading to a kops version before 1.21 can cause a disruptive rotation of the service account token signing key.