-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable instance create saga to create a new disk #1873
Conversation
subsaga_builder: steno::DagBuilder, | ||
subsaga_dag: steno::Dag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On @andrewjstone's recommendation I updated this function to take a dag instead of a builder. It's easy enough to call the builder at the append callsite and this lets us reuse the disk create saga's make_saga_dag
function.
let (.., authz_disk, _db_disk) = LookupPath::new(&opctx, &datastore) | ||
.project_id(project_id) | ||
.disk_name(&disk_name) | ||
.fetch() | ||
.await | ||
.map_err(ActionError::action_failed)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really shouldn't be looking up disks by name here, but that's all we have. Technically I could do a special lookup for disks that were previously created and grab their IDs, but for disks that are only being attached all we have is what the API provides.
This is a bit of an aside to this issue, but I believe that all names should be resolved inside of http_entrypoints
. Once calls flow further down in the system the should be operating solely on IDs. Whenever we begin to implement RFD-322 we should probably consider that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 is that not worth doing here? If not, I wonder if we should keep the comment that was here before (the TODO-security TODO-correctness one that references omicron#1536).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can re-add the comment. I still generally believe that we shouldn't just handle this in a one-off way, but more as a holistic solution. I know that's a non-trivial amount of work, but I still believe it to be important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zephraph This looks great! It's really nice to see you creating just enough saga nodes necessary to create and attach disks, and not the MAX. I only had one nit and one question.
This is probably safe to merge, although, I'd like a few others to review. Presumably you tested this all locally and you can now create disks during instance creation :)
} | ||
if attached { | ||
datastore | ||
.instance_attach_disk( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, as I'm not super familiar with this code: Why change the call from using osagactx.nexus().instance_XXX
to datastore.instance_XXX
?
It seems you basically duplicated the methods from nexus/src/app/instance.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was a misunderstanding on my part. I'd mistakenly thought that functions off of osagactx.nexus()
spawned sagas when that's clearly just an implementation detail. I'll change it back to what it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually a correction to that: nexus().instance_attach_disk
isn't the best form. It takes names and does lookups which are resolved before calling the datastore method. You're right that we should reuse instance_attach_disk
in /nexus/src/app/instance
but ideally we'd have a different form that takes pre-resolved arguments like the datastore method.
Ideally we want to resolve as many names as possible at the very start of a saga (or better, before the saga is invoked).
Do folks think I should change it back or is this form acceptable until we're able to follow up with a better interface for attaching? cc @smklein, @davepacheco.
This sort of plays into my previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I'd personally err on the side of "doing less name lookups" - the boundary between nexus and datastore is really thin from the point-of-view of sagas. The datastore itself should provide necessary encapsulation for safe DB access, but forcing sagas to go through nexus isn't adding much IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- Yes, I think at some point we discussed having the Nexus-level functions accept
LookupPath
objects rather than "names" so that we could use them from places that had ids or names. - Yes, I think sagas should use the datastore freely when that's the right level of abstraction.
- At the same time, it might also be appropriate to use the Nexus-level functions when there's real application-level logic that has to happen. For example, I think it'd be better for sagas that need to update instance state to use the app-level instance_set_runtime(). In this specific case, it seems like the app-level wrapper doesn't do anything but the name resolution and then call the datastore function. So it seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked at the CI failures? I'm not sure if they're flakes, or related to this PR
} | ||
if attached { | ||
datastore | ||
.instance_attach_disk( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I'd personally err on the side of "doing less name lookups" - the boundary between nexus and datastore is really thin from the point-of-view of sagas. The datastore itself should provide necessary encapsulation for safe DB access, but forcing sagas to go through nexus isn't adding much IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Such a huge improvement, not just to what's here but the API and (I expect) the console, too.
I know @smklein is still working on examples for doing unit tests of sagas. But I think we could still have end-to-end tests that exercise a few key cases here:
- an instance create that creates a disk, attaches another disk, and succeeds
- an instance create that creates one disk, attaches another disk, and fails...we should at least verify after this that the created disk was destroyed and the other disk is detached. It'd be nice if we can make sure it fails in a way that did create the first disk and/or attach the other disk so that we're exercising the undo actions.
Even if we also had unit tests already I think we'd still want these basic end-to-end tests.
let (.., authz_disk, _db_disk) = LookupPath::new(&opctx, &datastore) | ||
.project_id(project_id) | ||
.disk_name(&disk_name) | ||
.fetch() | ||
.await | ||
.map_err(ActionError::action_failed)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 is that not worth doing here? If not, I wonder if we should keep the comment that was here before (the TODO-security TODO-correctness one that references omicron#1536).
} | ||
if attached { | ||
datastore | ||
.instance_attach_disk( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- Yes, I think at some point we discussed having the Nexus-level functions accept
LookupPath
objects rather than "names" so that we could use them from places that had ids or names. - Yes, I think sagas should use the datastore freely when that's the right level of abstraction.
- At the same time, it might also be appropriate to use the Nexus-level functions when there's real application-level logic that has to happen. For example, I think it'd be better for sagas that need to update instance state to use the app-level instance_set_runtime(). In this specific case, it seems like the app-level wrapper doesn't do anything but the name resolution and then call the datastore function. So it seems fine.
I've looked at them, yeah. A snapshot integration test is timing out. I'm trying to dig into what's going on there. |
@davepacheco I think the last test I added tests this case sufficiently. It pre-creates two disks, one of which is faulted. It creates the boot disk, attaches the first good disk, and then tries to attach the faulted disk which results in the saga failing. The test tries to ensure that the same number of disks exist in the expected state post cleanup. |
5ddb1ff
to
445cda7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - only a couple minor suggestions for the tests!
1e67336
to
5f618ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to have a comment on the tests saying what they're testing. Either way, this looks good!
let builder = | ||
RequestBuilder::new(client, http::Method::POST, &url_instances) | ||
.body(Some(&instance_params)) | ||
.expect_status(Some(http::StatusCode::CREATED)); | ||
|
||
let response = NexusRequest::new(builder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can replace most of this with let response = NexusRequest::objects_post(client, &url_instances, &instance_params))
(it's equivalent -- not better or worse aside from less code)
Fixes #812
Wires up the instance disk create flow by directly appending the disk create saga as a sub saga. I also corrected the behavior of the attach action by ensuring it doesn't spawn a saga in its action but rather just updates the relevant data in the datastore. There's a TODO to make that functionality more robust, but I covered everything that's handled currently by
instance attach
as defined in nexus... which admittedly isn't much.While I was in here I also removed some of the unnecessary behavior around looping through the max number of disks and passing the index down as apart of the saga params. Now the disk create or attach params themselves are passed down and we only loop as many times as is needed. This was an artifact of having static sagas previously.
There's still more cleanup to do for NICs/external IPs which I haven't tackled and of course more robust tests are required. I'm going to leave the latter up to @smklein as he's been working on that for disk create already.