Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable instance create saga to create a new disk #1873

Merged
merged 26 commits into from
Nov 2, 2022
Merged

Conversation

zephraph
Copy link
Contributor

@zephraph zephraph commented Oct 25, 2022

Fixes #812

Wires up the instance disk create flow by directly appending the disk create saga as a sub saga. I also corrected the behavior of the attach action by ensuring it doesn't spawn a saga in its action but rather just updates the relevant data in the datastore. There's a TODO to make that functionality more robust, but I covered everything that's handled currently by instance attach as defined in nexus... which admittedly isn't much.

While I was in here I also removed some of the unnecessary behavior around looping through the max number of disks and passing the index down as apart of the saga params. Now the disk create or attach params themselves are passed down and we only loop as many times as is needed. This was an artifact of having static sagas previously.

There's still more cleanup to do for NICs/external IPs which I haven't tackled and of course more robust tests are required. I'm going to leave the latter up to @smklein as he's been working on that for disk create already.

@zephraph zephraph marked this pull request as ready for review October 25, 2022 20:24
@zephraph zephraph requested a review from smklein October 25, 2022 20:24
nexus/src/app/sagas/instance_create.rs Outdated Show resolved Hide resolved
nexus/src/app/sagas/instance_create.rs Outdated Show resolved Hide resolved
nexus/src/app/sagas/instance_create.rs Outdated Show resolved Hide resolved
nexus/src/app/sagas/instance_create.rs Outdated Show resolved Hide resolved
Comment on lines -182 to +177
subsaga_builder: steno::DagBuilder,
subsaga_dag: steno::Dag,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On @andrewjstone's recommendation I updated this function to take a dag instead of a builder. It's easy enough to call the builder at the append callsite and this lets us reuse the disk create saga's make_saga_dag function.

@zephraph zephraph marked this pull request as draft October 27, 2022 03:15
@zephraph zephraph marked this pull request as ready for review October 27, 2022 17:46
Comment on lines +749 to +754
let (.., authz_disk, _db_disk) = LookupPath::new(&opctx, &datastore)
.project_id(project_id)
.disk_name(&disk_name)
.fetch()
.await
.map_err(ActionError::action_failed)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really shouldn't be looking up disks by name here, but that's all we have. Technically I could do a special lookup for disks that were previously created and grab their IDs, but for disks that are only being attached all we have is what the API provides.

This is a bit of an aside to this issue, but I believe that all names should be resolved inside of http_entrypoints. Once calls flow further down in the system the should be operating solely on IDs. Whenever we begin to implement RFD-322 we should probably consider that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 is that not worth doing here? If not, I wonder if we should keep the comment that was here before (the TODO-security TODO-correctness one that references omicron#1536).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can re-add the comment. I still generally believe that we shouldn't just handle this in a one-off way, but more as a holistic solution. I know that's a non-trivial amount of work, but I still believe it to be important.

@zephraph zephraph requested a review from smklein October 27, 2022 17:52
Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zephraph This looks great! It's really nice to see you creating just enough saga nodes necessary to create and attach disks, and not the MAX. I only had one nit and one question.

This is probably safe to merge, although, I'd like a few others to review. Presumably you tested this all locally and you can now create disks during instance creation :)

nexus/src/app/sagas/instance_create.rs Outdated Show resolved Hide resolved
}
if attached {
datastore
.instance_attach_disk(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, as I'm not super familiar with this code: Why change the call from using osagactx.nexus().instance_XXX to datastore.instance_XXX?

It seems you basically duplicated the methods from nexus/src/app/instance.rs

Copy link
Contributor Author

@zephraph zephraph Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was a misunderstanding on my part. I'd mistakenly thought that functions off of osagactx.nexus() spawned sagas when that's clearly just an implementation detail. I'll change it back to what it was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually a correction to that: nexus().instance_attach_disk isn't the best form. It takes names and does lookups which are resolved before calling the datastore method. You're right that we should reuse instance_attach_disk in /nexus/src/app/instance but ideally we'd have a different form that takes pre-resolved arguments like the datastore method.

Ideally we want to resolve as many names as possible at the very start of a saga (or better, before the saga is invoked).

Do folks think I should change it back or is this form acceptable until we're able to follow up with a better interface for attaching? cc @smklein, @davepacheco.

This sort of plays into my previous comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I'd personally err on the side of "doing less name lookups" - the boundary between nexus and datastore is really thin from the point-of-view of sagas. The datastore itself should provide necessary encapsulation for safe DB access, but forcing sagas to go through nexus isn't adding much IMO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • Yes, I think at some point we discussed having the Nexus-level functions accept LookupPath objects rather than "names" so that we could use them from places that had ids or names.
  • Yes, I think sagas should use the datastore freely when that's the right level of abstraction.
  • At the same time, it might also be appropriate to use the Nexus-level functions when there's real application-level logic that has to happen. For example, I think it'd be better for sagas that need to update instance state to use the app-level instance_set_runtime(). In this specific case, it seems like the app-level wrapper doesn't do anything but the name resolution and then call the datastore function. So it seems fine.

Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked at the CI failures? I'm not sure if they're flakes, or related to this PR

nexus/src/app/sagas/instance_create.rs Show resolved Hide resolved
}
if attached {
datastore
.instance_attach_disk(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I'd personally err on the side of "doing less name lookups" - the boundary between nexus and datastore is really thin from the point-of-view of sagas. The datastore itself should provide necessary encapsulation for safe DB access, but forcing sagas to go through nexus isn't adding much IMO.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Such a huge improvement, not just to what's here but the API and (I expect) the console, too.

I know @smklein is still working on examples for doing unit tests of sagas. But I think we could still have end-to-end tests that exercise a few key cases here:

  • an instance create that creates a disk, attaches another disk, and succeeds
  • an instance create that creates one disk, attaches another disk, and fails...we should at least verify after this that the created disk was destroyed and the other disk is detached. It'd be nice if we can make sure it fails in a way that did create the first disk and/or attach the other disk so that we're exercising the undo actions.

Even if we also had unit tests already I think we'd still want these basic end-to-end tests.

Comment on lines +749 to +754
let (.., authz_disk, _db_disk) = LookupPath::new(&opctx, &datastore)
.project_id(project_id)
.disk_name(&disk_name)
.fetch()
.await
.map_err(ActionError::action_failed)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 is that not worth doing here? If not, I wonder if we should keep the comment that was here before (the TODO-security TODO-correctness one that references omicron#1536).

}
if attached {
datastore
.instance_attach_disk(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • Yes, I think at some point we discussed having the Nexus-level functions accept LookupPath objects rather than "names" so that we could use them from places that had ids or names.
  • Yes, I think sagas should use the datastore freely when that's the right level of abstraction.
  • At the same time, it might also be appropriate to use the Nexus-level functions when there's real application-level logic that has to happen. For example, I think it'd be better for sagas that need to update instance state to use the app-level instance_set_runtime(). In this specific case, it seems like the app-level wrapper doesn't do anything but the name resolution and then call the datastore function. So it seems fine.

@zephraph
Copy link
Contributor Author

Have you looked at the CI failures? I'm not sure if they're flakes, or related to this PR

I've looked at them, yeah. A snapshot integration test is timing out. I'm trying to dig into what's going on there.

@zephraph
Copy link
Contributor Author

an instance create that creates one disk, attaches another disk, and fails...we should at least verify after this that the created disk was destroyed and the other disk is detached. It'd be nice if we can make sure it fails in a way that did create the first disk and/or attach the other disk so that we're exercising the undo actions.

@davepacheco I think the last test I added tests this case sufficiently. It pre-creates two disks, one of which is faulted. It creates the boot disk, attaches the first good disk, and then tries to attach the faulted disk which results in the saga failing. The test tries to ensure that the same number of disks exist in the expected state post cleanup.

@zephraph zephraph force-pushed the instance-disk-create branch from 5ddb1ff to 445cda7 Compare October 31, 2022 21:29
@zephraph zephraph requested a review from andrewjstone October 31, 2022 22:56
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - only a couple minor suggestions for the tests!

nexus/tests/integration_tests/instances.rs Outdated Show resolved Hide resolved
nexus/tests/integration_tests/instances.rs Outdated Show resolved Hide resolved
nexus/tests/integration_tests/instances.rs Outdated Show resolved Hide resolved
@zephraph zephraph force-pushed the instance-disk-create branch from 1e67336 to 5f618ef Compare November 1, 2022 19:51
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have a comment on the tests saying what they're testing. Either way, this looks good!

Comment on lines +1959 to +1964
let builder =
RequestBuilder::new(client, http::Method::POST, &url_instances)
.body(Some(&instance_params))
.expect_status(Some(http::StatusCode::CREATED));

let response = NexusRequest::new(builder)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can replace most of this with let response = NexusRequest::objects_post(client, &url_instances, &instance_params)) (it's equivalent -- not better or worse aside from less code)

@zephraph zephraph enabled auto-merge (squash) November 2, 2022 05:07
@zephraph zephraph merged commit 2b3a643 into main Nov 2, 2022
@zephraph zephraph deleted the instance-disk-create branch November 2, 2022 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement disk creation during instance creation
5 participants