-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User specified SSH keys to inject at instance create time #4764
Conversation
We want the names to match with the resource that it refers to Co-authored-by: Charlie Park <charlie@charliepark.org>
Co-authored-by: Benjamin Leonard <hello@benleonard.co.uk>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really looking forward to this feature! Thanks a bunch for taking the time to flesh it out so thoroughly
// If the keys are None, use the fallback behavior of assigning all the users keys | ||
None => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this not confuse users thinking that None
means add no keys? Perhaps we may want to revisit why the default behaviour is to add all keys.
There are a few reasons we may want to reconsider this default behaviour:
- From a security standpoint, people may have forgotten to remove keys from their user account that have been compromised.
- Additionally, users may not be aware that
None
means all, it doesn't feel particularly intuitive (often people don't read the docs!). They may end up adding keys they did not intend to add, and be unaware of it if the instance was not created through the console. - Do we expect our customers to particularly want to have all keys automatically added to the instances they create?
I think it'd also be good to have some input from @ahl and/or @askfongjojo on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if users have more than 10 keys saved in their user account, every attempt at creating an instance with the default behaviour of adding all keys would result in an error.
This means that someone with more than 10 keys would have to set which keys they want added every single time to avoid a failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, yeah, I've had a lot of conversations about this. @iliana and I discussed this in chat. This implementation was xer original suggestion.
There's a few things to consider here.
- Should this be a required field? If it's required then it's one extra thing folks have to specify when creating an instance. Originally I was worried about the repercussions of this decision (and others like it) on the CLI. That's mostly moot given that most args have to be provided via a json file. If it's not required, I think the only sane default is to send them all (as we do today). Having the default mean none get sent would lead to workflow situations where users forget and then lock themselves out (which is worse in my mind than having to go and remove some keys manually after the fact)
- Should we even have the option for users to send all keys to the instance? I feel like yes is probably the answer here because I suspect most users will have few keys. That's also another reason I've leaned towards this behavior as the default. If we didn't preserve this option then users would always have to specify keys by name/id in the API (lest they can't SSH into the instance) and that feels like a pretty big DX regression. That said, I would appreciate more feedback from folks on this.
So, yeah, an explicit enum having AllFromProfile | Only[keys] | NoKeys
is certainly an option I can implement in this PR if that's the consensus. I do see the value of it, especially around forcing users to make a choice. If the choice that they're forced to make is often the same as the default of this PR though, that feels not great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like breaking old clients. Requiring an explicit enum will be a flag day for any existing code. This is, I think, an extremely good candidate for being our first version bump on an API. We did give ourselves a number of backstops to be able to declare a new version of the API, but we haven't used any of them yet (as far as I'm aware?), so that sort of explodes the work needed to be done here.
If we want to land this in the immediate term, we should keep the implicit behavior to use all of the user's keys if the field is not present (i.e. None
). But ideally we have a clearer v2 either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments both! These are my thoughts:
Should this be a required field? If it's required then it's one extra thing folks have to specify when creating an instance. Originally I was worried about the repercussions of this decision (and others like it) on the CLI. That's mostly moot given that most args have to be provided via a json file. If it's not required, I think the only sane default is to send them all (as we do today). Having the default mean none get sent would lead to workflow situations where users forget and then lock themselves out (which is worse in my mind than having to go and remove some keys manually after the fact)
To be perfectly honest, I don't see this being a required field as a problem. IMO it's better to be explicit about what you're doing than get surprises down the line. Alternatively, if we don't want this to be a required field, we could also have a concept of a "default" SSH key which gets added to any instance where SSH keys haven't been defined. WDYT?
We also need to think of the use case where new users are migrating their workloads from their previous vendor. For example: a user that didn't have support for adding SSH keys at create time (cough VMware cough) will have had to either have a user data init script that creates authentication, or have an image with a predefined user, and add additional SSH keys later. In this scenario, users may want to initially conserve their current workflows as part of a big lift and shift operation. These migrations are no small thing. and we really need to think about how to make it as seamless as possible for our customers to migrate over without unintended consequences.
Automatically adding every SSH key the user has saved feels quite nuclear, and can lead to easily giving unintended access. Especially if you're in the type or role where your job is to set up infrastructure for others to use and have saved other people's public keys in your account.
FWIW, I don't see being locked out of your instance as a necessarily terrible thing. This is something the user would immediately notice and would only need to delete and recreate their instance.
Should we even have the option for users to send all keys to the instance? I feel like yes is probably the answer here because I suspect most users will have few keys. That's also another reason I've leaned towards this behavior as the default. If we didn't preserve this option then users would always have to specify keys by name/id in the API (lest they can't SSH into the instance) and that feels like a pretty big DX regression. That said, I would appreciate more feedback from folks on this.
I wouldn't necessarily say most users would have a small amount of keys, it really depends. Also, users that manage infrastructure will have ssh keys of others. So it's not just how many keys are saved, but also to whom those keys are giving access to. Frankly, at this point I don't think we have enough information about what most users will be doing. To me the big questions for now are:
- Does anyone really need to have every key they have saved added to their instance on create?
- What is more annoying? Adding a flag to define ssh keys on instance create, or having to go into an instance and remove SSH keys that you realised you really shouldn't have in there later down the track.
- Which repercussions are worse? Unintentionally giving access to keys that shouldn't have this access (could be compromised keys the user forgot to remove or people who shouldn't have access), or having to recreate an instance because you forgot to define a key and got locked out (this could be mitigated by having a "default" ssh key).
Additionally, by automatically adding all keys we'd be making it harder for our users to follow the principle of least privilege
I really don't like breaking old clients. Requiring an explicit enum will be a flag day for any existing code. This is, I think, an extremely good candidate for being our first version bump on an API. We did give ourselves a number of backstops to be able to declare a new version of the API, but we haven't used any of them yet (as far as I'm aware?), so that sort of explodes the work needed to be done here.
As for the concern about breaking APIs, I think the consensus is that now is the moment to be breaking the API to make sure we get it right. Posting a recent comment here -> #4694 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on breaking the API and including no keys if none are defined.
However making it required might be a poor user experience in the CLI. If the user doesn't want to include keys, they shouldn't need that argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate all the discussion here. @karencfv you bring up a lot of great points and I feel like there's just ultimately some more product discovery we need to do here.
I have higher priorities than this PR and I've already sank too much time into it. I'm just going to create an issue to encapsulate the context here and we can solve this in a follow up. The default behavior as implemented is the same as what we have today so it shouldn't be disruptive to include in this release while also giving us the time we need to figure out how we want this to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the follow up issue to tackle this work: #4919
|
||
// Gather the SSH public keys of the actor make the request so | ||
// that they may be injected into the new image via cloud-init. | ||
// TODO-security: this should be replaced with a lookup based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above. Can you open an issue to track this and reference it here so this isn't forgotten about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add this TODO, just copied it over from where it was originally. I don't actually know what the original intent is here and I somewhat assume it's out of date. @davepacheco, I think you originally wrote this. Can you give any more context? I'm happy to file an issue, but I'd just like to understand more what the replacement should be (if it's even still relevant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zowie, that was a while ago; thanks, @davepacheco for digging up that PR (#974). That TODO is mentioned in its description:
Right now, we collect only the keys belonging to the actor making the request. Once role assignments for Silo users are in place, we can ask for, e.g., all the keys belonging to all the users with an appropriate role (vms.instance.create, probably) on the instance.
I don't remember exactly where that desire came from, other than generally wanting to allow customers to assign permissions at the role/group level rather than for individual users (as the current lookup would require); but we didn't have any kind of roles at the time, so couldn't do it then. If it's still desirable and we now have something like Silo/IdP roles (I've been away from that layer of Omicron for a while, so don't really know), it could certainly be spun out into an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @plotnick! Do you mind opening up an issue for this?
/// Note that this list is a snapshot in time and will not reflect updates made after | ||
/// the instance is created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I like this clarification
* Entries are added here when an instance is created (with configured SSH keys) | ||
* and removed when the instance is destroyed. | ||
* | ||
* TODO: Should this have time created / time deleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just commenting here since the PR is set as ready to review. This is intended to be added in this PR yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially, but not necessarily. Currently I have this relationship configured to be hard deleted when the instance is (soft) deleted. This was a weird case being that it's both a join and not something a user can directly change. It's created at instance create and deleted at instance delete.
If I do need to add the deleted timestamp, I certainly will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha! nw, I just wanted to make sure this wasn't something that was intended to be included in this PR
Co-authored-by: Karen Cárcamo <karencfv@users.noreply.github.com>
Co-authored-by: Alex Plotnick <alex@oxidecomputer.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating follow up issues for product discovery for instance SSH keys. I'm OK with revising all of that at a later date, and merging this PR (once the tests pass) for now so we can introduce new functionality in this release.
Thanks again for taking the time to look into this!
|
||
// Gather the SSH public keys of the actor make the request so | ||
// that they may be injected into the new image via cloud-init. | ||
// TODO-security: this should be replaced with a lookup based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @plotnick! Do you mind opening up an issue for this?
lines | ||
.trim() | ||
.lines() | ||
.find(|line| line.contains(public_key_str.clone().as_str())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem correct. This section of code is trying to extract the host's SSH key from the console output, so we can authenticate the host when we try to connect to it. That is not the same key as the one we created and sent to the instance to authenticate our user.
If we are failing to extract a host SSH key from the console output, either we aren't waiting long enough or something else is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was extracting a key, but the key it was extracting was wrong. I thought that for some reason there were multiple.
Looking at the deploy test run for 88ddaa6, the failure does not implicate the host key parsing. The relevant snippet of stdout:
This implicates this code: omicron/end-to-end-tests/src/instance_launch.rs Lines 148 to 159 in 88ddaa6
Which implies the connection was established, the host key was authenticated, and what failed to occur was authenticating with our private key. If we extract the
However, that is the wrong JSON key. cloud-init expects
The changes to |
Well, I'm not finding any, so I might just have to blog about it... |
Fixes #3056
This PR adds an optional
ssh_keys
parameter to the instance create payload. If it's empty the same behavior as we have today is retained: all ssh keys from a user's profile are transferred over. If a vector of ssh key name or IDs are passed then only transfers over those given keys. If an empty vector is passed no ssh keys will be transferred.There's also a
/v1/instances/<instance>/ssh-public-keys
API that lists all public keys that were sent via cloud-init to the instance at instance create. I talked to @iliana a bit about the proper response to this API. Currently it returns the fullSshKey
view. We may want to alter that to transmit fewer details.Other changes:
Remaining Work
Followup Work
Tasks
ssh_key
tossh_public_key
#4866