Enable instance create saga to create a new disk #1873

Merged
merged 26 commits on Nov 2, 2022

Changes from 19 commits

Commits (26)
a3dd93c
Start wiring up disk-create in instance-create
zephraph Oct 25, 2022
09e6d87
Attach disk after its created
zephraph Oct 25, 2022
0b0466d
Flatten attach
zephraph Oct 25, 2022
0085424
Maybe fix create enum
zephraph Oct 25, 2022
4bbefcb
Merge branch 'main' into instance-disk-create
zephraph Oct 25, 2022
59a7dc6
Implement undo
zephraph Oct 25, 2022
db23c77
Merge branch 'main' into instance-disk-create
zephraph Oct 27, 2022
f8c0a5a
Update subsaga_append to take a dag instead of builder
zephraph Oct 27, 2022
09447c2
Wire up disk create subsaga
zephraph Oct 27, 2022
cf1fc5b
Fix lint error
zephraph Oct 27, 2022
c1bf2da
Update fn name to be singular to clarify it attaches 1 disk
zephraph Oct 27, 2022
9126223
Improve attach/detaching to not create a saga as a node
zephraph Oct 27, 2022
d66965c
Clean up attach iteration logic
zephraph Oct 27, 2022
5da7014
Pass direct references down instead of disk array
zephraph Oct 27, 2022
6dcaa27
Fix disk create subsaga params; clean up comments
zephraph Oct 27, 2022
8d0c3ce
Appease ye almighty clippy
zephraph Oct 27, 2022
3927efe
Update attach params shape to include serialized_athn; project_id
zephraph Oct 27, 2022
033f583
Address nit
zephraph Oct 28, 2022
be534b9
Add TODO on correctness of disk lookup by name
zephraph Oct 28, 2022
d21ce3e
instance_id isn't in the subsaga context, pass it down instead
zephraph Oct 31, 2022
d68e99b
Fix clippy failures
zephraph Oct 31, 2022
1a673a4
Add integration test to create and attach disks via instance create
zephraph Oct 31, 2022
d3bbae0
Add test for undos of creates/attaches for failed instance create saga
zephraph Oct 31, 2022
445cda7
Reduce disk sizes to hopefully resolve zpool failures
zephraph Oct 31, 2022
5f618ef
Address PR feedback on tests
zephraph Nov 1, 2022
3636fd1
Comment what test is validating
zephraph Nov 2, 2022
211 changes: 88 additions & 123 deletions nexus/src/app/sagas/instance_create.rs
@@ -3,6 +3,7 @@
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use super::{NexusActionContext, NexusSaga, SagaInitError, ACTION_GENERATE_ID};
use crate::app::sagas::disk_create::{self, SagaDiskCreate};
use crate::app::sagas::NexusAction;
use crate::app::{
MAX_DISKS_PER_INSTANCE, MAX_EXTERNAL_IPS_PER_INSTANCE,
@@ -17,6 +18,7 @@ use crate::{authn, authz, db};
use chrono::Utc;
use lazy_static::lazy_static;
use nexus_defaults::DEFAULT_PRIMARY_NIC_NAME;
use nexus_types::external_api::params::InstanceDiskAttachment;
use omicron_common::api::external::Error;
use omicron_common::api::external::Generation;
use omicron_common::api::external::IdentityMetadataCreateParams;
@@ -61,12 +63,11 @@ struct NetParams {
new_id: Uuid,
}

// The disk-related nodes get a similar treatment, but the data they need are
// different.
#[derive(Debug, Deserialize, Serialize)]
struct DiskParams {
saga_params: Params,
which: usize,
struct DiskAttachParams {
serialized_authn: authn::saga::Serialized,
project_id: Uuid,
attach_params: InstanceDiskAttachment,
}

// instance create saga: actions
Expand Down Expand Up @@ -103,15 +104,10 @@ lazy_static! {
sic_allocate_instance_external_ip,
sic_allocate_instance_external_ip_undo,
);
static ref CREATE_DISKS_FOR_INSTANCE: NexusAction = ActionFunc::new_action(
"instance-create.create-disks-for-instance",
sic_create_disks_for_instance,
sic_create_disks_for_instance_undo,
);
static ref ATTACH_DISKS_TO_INSTANCE: NexusAction = ActionFunc::new_action(
"instance-create.attach-disks-to-instance",
sic_attach_disks_to_instance,
sic_attach_disks_to_instance_undo,
sic_attach_disk_to_instance,
sic_attach_disk_to_instance_undo,
);
static ref INSTANCE_ENSURE: NexusAction = new_action_noop_undo(
"instance-create.instance-ensure",
@@ -134,7 +130,6 @@ impl NexusSaga for SagaInstanceCreate {
registry.register(Arc::clone(&*CREATE_NETWORK_INTERFACE));
registry.register(Arc::clone(&*CREATE_SNAT_IP));
registry.register(Arc::clone(&*CREATE_EXTERNAL_IP));
registry.register(Arc::clone(&*CREATE_DISKS_FOR_INSTANCE));
registry.register(Arc::clone(&*ATTACH_DISKS_TO_INSTANCE));
registry.register(Arc::clone(&*INSTANCE_ENSURE));
}
@@ -179,7 +174,7 @@
// Helper function for appending subsagas to our parent saga.
fn subsaga_append<S: Serialize>(
node_basename: &'static str,
subsaga_builder: steno::DagBuilder,
subsaga_dag: steno::Dag,
Comment on lines -182 to +178
zephraph (Contributor, Author):
On @andrewjstone's recommendation I updated this function to take a dag instead of a builder. It's easy enough to call the builder at the append callsite and this lets us reuse the disk create saga's make_saga_dag function.

parent_builder: &mut steno::DagBuilder,
params: S,
which: usize,
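
For reference, the two call shapes this signature change allows, condensed from the loops later in this diff (a reference sketch, not additional PR code): a hand-built subsaga is finished with .build()? at the call site, and a DAG produced by another saga's make_saga_dag can be passed straight through.

// Hand-built subsaga: the caller now calls .build() itself.
subsaga_append(
    "external_ip",
    subsaga_builder.build()?,
    &mut builder,
    repeat_params,
    i,
)?;

// Reusing another saga's DAG, e.g. the disk create saga.
subsaga_append(
    "create_disk",
    SagaDiskCreate::make_saga_dag(&params, subsaga_builder)?,
    &mut builder,
    params,
    i,
)?;
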
@@ -198,7 +193,7 @@ impl NexusSaga for SagaInstanceCreate {
let output_name = format!("{}{}", node_basename, which);
parent_builder.append(Node::subsaga(
output_name.as_str(),
subsaga_builder.build()?,
subsaga_dag,
params_node_name,
));
Ok(())
@@ -243,7 +238,7 @@ impl NexusSaga for SagaInstanceCreate {
));
subsaga_append(
"network_interface",
subsaga_builder,
subsaga_builder.build()?,
&mut builder,
repeat_params,
i,
@@ -281,36 +276,55 @@ impl NexusSaga for SagaInstanceCreate {
));
subsaga_append(
"external_ip",
subsaga_builder,
subsaga_builder.build()?,
&mut builder,
repeat_params,
i,
)?;
}

// See the comment above where we add nodes for creating NICs. We use
// the same pattern here.
for i in 0..(MAX_DISKS_PER_INSTANCE as usize) {
let disk_params =
DiskParams { saga_params: params.clone(), which: i };
// Appends the disk create saga as a subsaga directly to the instance create builder.
for (i, disk) in params.create_params.disks.iter().enumerate() {
if let InstanceDiskAttachment::Create(create_disk) = disk {
let subsaga_name =
SagaName::new(&format!("instance-create-disk-{i}"));
let subsaga_builder = DagBuilder::new(subsaga_name);
let params = disk_create::Params {
serialized_authn: params.serialized_authn.clone(),
project_id: params.project_id,
create_params: create_disk.clone(),
};
subsaga_append(
"create_disk",
SagaDiskCreate::make_saga_dag(&params, subsaga_builder)?,
&mut builder,
params,
i,
)?;
}
}

// Attaches all disks included in the instance create request, including those which were previously created
// by the disk create subsagas.
for (i, disk_attach) in params.create_params.disks.iter().enumerate() {
let subsaga_name =
SagaName::new(&format!("instance-create-disk{i}"));
SagaName::new(&format!("instance-attach-disk-{i}"));
let mut subsaga_builder = DagBuilder::new(subsaga_name);
subsaga_builder.append(Node::action(
"create_disk_output",
format!("CreateDisksForInstance-{i}").as_str(),
CREATE_DISKS_FOR_INSTANCE.as_ref(),
));
subsaga_builder.append(Node::action(
"attach_disk_output",
format!("AttachDisksToInstance-{i}").as_str(),
ATTACH_DISKS_TO_INSTANCE.as_ref(),
));
let params = DiskAttachParams {
serialized_authn: params.serialized_authn.clone(),
project_id: params.project_id,
attach_params: disk_attach.clone(),
};
subsaga_append(
"disk",
subsaga_builder,
"attach_disk",
subsaga_builder.build()?,
&mut builder,
disk_params,
params,
i,
)?;
}
@@ -704,50 +718,13 @@ async fn sic_allocate_instance_external_ip_undo(
Ok(())
}

/// Create disks during instance creation, and return a list of disk names
// TODO implement
async fn sic_create_disks_for_instance(
sagactx: NexusActionContext,
) -> Result<Option<String>, ActionError> {
let disk_params = sagactx.saga_params::<DiskParams>()?;
let saga_params = disk_params.saga_params;
let disk_index = disk_params.which;
let saga_disks = &saga_params.create_params.disks;

if disk_index >= saga_disks.len() {
return Ok(None);
}

let disk = &saga_disks[disk_index];

match disk {
params::InstanceDiskAttachment::Create(_create_params) => {
return Err(ActionError::action_failed(
"Creating disk during instance create unsupported!".to_string(),
));
}

_ => {}
}

Ok(None)
}

/// Undo disks created during instance creation
// TODO implement
async fn sic_create_disks_for_instance_undo(
_sagactx: NexusActionContext,
) -> Result<(), anyhow::Error> {
Ok(())
}

async fn sic_attach_disks_to_instance(
async fn sic_attach_disk_to_instance(
sagactx: NexusActionContext,
) -> Result<(), ActionError> {
ensure_instance_disk_attach_state(sagactx, true).await
}

async fn sic_attach_disks_to_instance_undo(
async fn sic_attach_disk_to_instance_undo(
sagactx: NexusActionContext,
) -> Result<(), anyhow::Error> {
Ok(ensure_instance_disk_attach_state(sagactx, false).await?)
@@ -758,64 +735,52 @@ async fn ensure_instance_disk_attach_state(
attached: bool,
) -> Result<(), ActionError> {
let osagactx = sagactx.user_data();
let disk_params = sagactx.saga_params::<DiskParams>()?;
let saga_params = disk_params.saga_params;
let disk_index = disk_params.which;
let opctx =
OpContext::for_saga_action(&sagactx, &saga_params.serialized_authn);

let saga_disks = &saga_params.create_params.disks;
let instance_name =
db::model::Name(saga_params.create_params.identity.name);
let params = sagactx.saga_params::<DiskAttachParams>()?;
let datastore = osagactx.datastore();
let opctx = OpContext::for_saga_action(&sagactx, &params.serialized_authn);
let instance_id = sagactx.lookup::<Uuid>("instance_id")?;
let project_id = params.project_id;

if disk_index >= saga_disks.len() {
return Ok(());
}
let disk_name = match params.attach_params {
InstanceDiskAttachment::Create(create_params) => {
db::model::Name(create_params.identity.name)
}
InstanceDiskAttachment::Attach(attach_params) => {
db::model::Name(attach_params.name)
}
};

let disk = &saga_disks[disk_index];
let (.., authz_instance, _db_instance) =
LookupPath::new(&opctx, &datastore)
.instance_id(instance_id)
.fetch()
.await
.map_err(ActionError::action_failed)?;

// TODO-correctness TODO-security It's not correct to re-resolve the
// organization and project names now. See oxidecomputer/omicron#1536.
let organization_name: db::model::Name =
saga_params.organization_name.clone().into();
let project_name: db::model::Name = saga_params.project_name.clone().into();

match disk {
params::InstanceDiskAttachment::Create(_) => {
// TODO grab disks created in sic_create_disks_for_instance
return Err(ActionError::action_failed(Error::invalid_request(
"creating disks while creating an instance not supported",
)));
}
params::InstanceDiskAttachment::Attach(instance_disk_attach) => {
let disk_name: db::model::Name =
instance_disk_attach.name.clone().into();

if attached {
osagactx
.nexus()
.instance_attach_disk(
&opctx,
&organization_name,
&project_name,
&instance_name,
&disk_name,
)
.await
} else {
osagactx
.nexus()
.instance_detach_disk(
&opctx,
&organization_name,
&project_name,
&instance_name,
&disk_name,
)
.await
}
// disk name now. See oxidecomputer/omicron#1536.
let (.., authz_disk, _db_disk) = LookupPath::new(&opctx, &datastore)
.project_id(project_id)
.disk_name(&disk_name)
.fetch()
.await
.map_err(ActionError::action_failed)?;
Comment on lines +764 to +769
zephraph (Contributor, Author):
We really shouldn't be looking up disks by name here, but that's all we have. Technically I could do a special lookup for disks that were previously created and grab their IDs, but for disks that are only being attached all we have is what the API provides.

This is a bit of an aside to this issue, but I believe that all names should be resolved inside of http_entrypoints. Once calls flow further down in the system they should be operating solely on IDs. Whenever we begin to implement RFD-322 we should probably consider that.

Collaborator:
💯 is that not worth doing here? If not, I wonder if we should keep the comment that was here before (the TODO-security TODO-correctness one that references omicron#1536).

zephraph (Contributor, Author):
I can re-add the comment. I still generally believe that we shouldn't just handle this in a one-off way, but more as a holistic solution. I know that's a non-trivial amount of work, but I still believe it to be important.
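
As a rough sketch of the alternative discussed above (hypothetical, not part of this PR): if the disk's ID were resolved before the saga started and carried in DiskAttachParams, the attach node could fetch by ID instead of by name. This assumes LookupPath exposes a by-ID disk lookup analogous to the instance_id lookup used earlier in this function.

// Hypothetical: `disk_id` resolved in the HTTP entrypoint and passed down
// in DiskAttachParams instead of a name.
let (.., authz_disk, _db_disk) = LookupPath::new(&opctx, &datastore)
    .disk_id(disk_id)
    .fetch()
    .await
    .map_err(ActionError::action_failed)?;
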


if attached {
datastore
.instance_attach_disk(

Contributor:
Just for my understanding, as I'm not super familiar with this code: Why change the call from using osagactx.nexus().instance_XXX to datastore.instance_XXX?

It seems you basically duplicated the methods from nexus/src/app/instance.rs

zephraph (Contributor, Author), Oct 28, 2022:
Yeah, this was a misunderstanding on my part. I'd mistakenly thought that functions off of osagactx.nexus() spawned sagas when that's clearly just an implementation detail. I'll change it back to what it was.

zephraph (Contributor, Author):
Actually a correction to that: nexus().instance_attach_disk isn't the best form. It takes names and does lookups which are resolved before calling the datastore method. You're right that we should reuse instance_attach_disk in /nexus/src/app/instance but ideally we'd have a different form that takes pre-resolved arguments like the datastore method.

Ideally we want to resolve as many names as possible at the very start of a saga (or better, before the saga is invoked).

Do folks think I should change it back or is this form acceptable until we're able to follow up with a better interface for attaching? cc @smklein, @davepacheco.

This sort of plays into my previous comment.

Collaborator:
At this point I'd personally err on the side of "doing less name lookups" - the boundary between nexus and datastore is really thin from the point-of-view of sagas. The datastore itself should provide necessary encapsulation for safe DB access, but forcing sagas to go through nexus isn't adding much IMO.

Collaborator:
A few things:

  • Yes, I think at some point we discussed having the Nexus-level functions accept LookupPath objects rather than "names" so that we could use them from places that had ids or names.
  • Yes, I think sagas should use the datastore freely when that's the right level of abstraction.
  • At the same time, it might also be appropriate to use the Nexus-level functions when there's real application-level logic that has to happen. For example, I think it'd be better for sagas that need to update instance state to use the app-level instance_set_runtime(). In this specific case, it seems like the app-level wrapper doesn't do anything but the name resolution and then call the datastore function. So it seems fine.

&opctx,
&authz_instance,
&authz_disk,
MAX_DISKS_PER_INSTANCE,
)
.await
.map_err(ActionError::action_failed)?;
} else {
datastore
.instance_detach_disk(&opctx, &authz_instance, &authz_disk)
.await
.map_err(ActionError::action_failed)?;
}
}

Ok(())
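
As a hypothetical sketch of the idea raised in the thread above (an app-level function that accepts pre-resolved lookup results instead of names), something along these lines could live in nexus/src/app/instance.rs. The method name, field access, and return type here are assumptions, not existing omicron API:

impl super::Nexus {
    /// Sketch only: attach a disk given pre-resolved authz objects, so callers
    /// holding either names or IDs can resolve once and pass the result down.
    pub async fn instance_attach_disk_resolved(
        &self,
        opctx: &OpContext,
        authz_instance: &authz::Instance,
        authz_disk: &authz::Disk,
    ) -> Result<(), Error> {
        // Any application-level checks would go here; the datastore call is
        // the same one the saga makes directly in this PR.
        self.db_datastore
            .instance_attach_disk(
                opctx,
                authz_instance,
                authz_disk,
                MAX_DISKS_PER_INSTANCE,
            )
            .await?;
        Ok(())
    }
}
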