Initial support for migrating VM to a new propolis instance. #447

Merged
merged 29 commits into main from live-migrate on Jan 25, 2022

Conversation

luqmana
Contributor

@luqmana luqmana commented Nov 29, 2021

Building on the live migration work in propolis (oxidecomputer/propolis#69), this adds the nexus endpoint to trigger a migration of an instance.

Collaborator

@smklein smklein left a comment

This is a pretty awesome PR - from HTTP endpoints, to sagas, to sled calls into the Propolis server, it's an impressively cross-cutting change, and it was ready way faster than I would have expected. Nice job getting this together!

// TODO correctness TODO robustness TODO design
// Should we lookup the instance again here?
// See comment in project_create_instance.
let instance = self.db_datastore.instance_fetch(&instance.id()).await?;
Collaborator

(not actionable)

So unlike project_create_instance, we do already have an instance object we fetched back on line 869.
We could plausibly save a copy of it in-memory, and then return it "as we expect it to be transformed" back to the user.

I dunno if that presents a different race condition compared to the one described in project_create -- there, it seemed like we wanted to return a version of the object "exactly as it was created" (which might race with concurrent ops immediately after creation, but before query), and here, we'd want to return a copy of the instance "precisely after it was moved" (which might also race with concurrent ops before or after the migration itself).

Dunno. Seems like we'll either need to lock the DB row somehow, or define the bounds of "what could be seen as a return value from this endpoint" more explicitly.

Contributor Author

Yea, I kinda went back and forth over whether I should even bother returning the Instance here. I think it'll become clearer once we have the actual system (i.e. not oxapi_demo) in place that will be using this endpoint.

Collaborator

This could plausibly have a similar solution to #447 (comment) - basically, identify that the instance is "migrating", prevent concurrent operations if that flag is set, and atomically "fetch + update" to complete the migration and grab the state of the instance afterwards.

Collaborator

I would expect this API would (eventually) be async (and this question would be moot) -- is that right?

Collaborator

@davepacheco as in, "the caller would have to issue a second call to fetch the instance, and we'd just return whatever currently exists in the DB to them", right?

nexus/src/sagas.rs (outdated, resolved)
nexus/src/sagas.rs (outdated, resolved)
nexus/src/sagas.rs (outdated, resolved)
sled-agent/src/instance.rs (outdated, resolved)
sled-agent/src/instance.rs (outdated, resolved)
sled-agent/src/instance.rs (outdated, resolved)
sled-agent/src/instance.rs (outdated, resolved)
sled-agent/src/instance_manager.rs (outdated, resolved)
Comment on lines +399 to +467
let dst_sa = osagactx
.sled_client(&dst_sled_uuid)
.await
.map_err(ActionError::action_failed)?;
Collaborator

To be clear, the order of operations in this saga step is:

  1. Fetch the instance from DB
  2. Send "ensure + migrate" request to the destination instance, using the metadata for the instance we got in (1).
  3. Update the "runtime state" in the DB

Is it possible we run into a TOCTTOU issue here?

What if someone modifies the sled (perhaps sends a different ensure request, gets a response) in between our "fetch" and "ensure" requests? Wouldn't we send an out-of-date sled request?

I think this problem could be avoided with some combination of:

  1. Durable state changes (explicitly modifying the DB, indicating that the instance is in the process of migrating, and preventing concurrent modification until this flag is removed).
  2. Using a generation number to only perform an operation if we explicitly ensure we haven't raced with anyone else.

https://rfd.shared.oxide.computer/rfd/0192#_conditional_updates seems related
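
Not part of the PR, but for illustration: a minimal, self-contained Rust sketch of the generation-number idea. All names here (InstanceRow, conditional_update, the generation field) are hypothetical, not Nexus's actual datastore types; the point is just that a write only takes effect if the caller's observed generation still matches, so a racing writer forces a re-fetch instead of being silently clobbered.

#[derive(Debug, PartialEq)]
enum InstanceState {
    Running,
    Migrating,
}

#[derive(Debug)]
struct InstanceRow {
    state: InstanceState,
    // The DB analogue would be a generation column checked in the UPDATE's
    // WHERE clause and bumped on every successful write.
    generation: u64,
}

/// Apply `new_state` only if the row's generation still matches what the
/// caller observed when it fetched the instance. Returns true on success.
fn conditional_update(
    row: &mut InstanceRow,
    observed_generation: u64,
    new_state: InstanceState,
) -> bool {
    if row.generation != observed_generation {
        // Someone else wrote the row between our fetch and this update; the
        // saga step should re-fetch and decide whether to retry or fail.
        return false;
    }
    row.state = new_state;
    row.generation += 1;
    true
}

fn main() {
    let mut row = InstanceRow { state: InstanceState::Running, generation: 4 };

    // This saga step fetches the instance and remembers the generation it saw.
    let observed = row.generation;

    // A concurrent actor who fetched the same generation wins the race...
    assert!(conditional_update(&mut row, observed, InstanceState::Migrating));

    // ...so our now-stale write is rejected instead of clobbering theirs.
    assert!(!conditional_update(&mut row, observed, InstanceState::Running));
    assert_eq!(row.state, InstanceState::Migrating);
}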

Collaborator

I think this problem still exists, right?

Collaborator

Seems like this was resolved by moving "migration prep" out to an earlier saga step, to set the "migration" state and block other operations which depend on the state.

Contributor Author

Yup, we ask sled agent to idempotently update the state to migrating (via Nexus) as part of an earlier action. Only if that action succeeds will we move on to the subsequent actions which do the actual migration.

@luqmana luqmana force-pushed the live-migrate branch 2 times, most recently from 64963a6 to 7f61111 on December 1, 2021 06:01
// the same ID, that would inadvertently remove our newly created
// instance instead.
// TODO: cleanup source instance properly
std::mem::forget(old_instance);
Collaborator

Sorry for backpedaling, but I think I'm seeing your perspective from #447 (comment) here... maybe the "null ticket" wouldn't be so bad.

I have a minor beef with this call to forget - we intend to drop the ticket, but forgetting the entire instance may also ignore other drop functions that could need to execute. I don't think those exist today, but there isn't anything preventing them from existing.

I think that justifies moving it out of the old instance, rather than creating a new ticket.

Something like:

// Ensure that the instance inserted into the `instances` mapping has a tracking
// membership ticket, so it can be safely removed when it goes out-of-scope.
let ticket = {
  if let Some(mut old_instance) = old_instance {
    // If we migrate an instance within the sled, we should keep it alive in the
    // `instances` mapping. Although the Propolis ID changes, the user-visible
    // instance ID remains the same, and uses the same key.
    assert!(migrate.is_some());
    // Move the ticket out of the old instance rather than forgetting it wholesale.
    std::mem::replace(&mut old_instance.running_state.ticket, InstanceTicket::null())
  } else {
    InstanceTicket::new(instance_id, self.inner.clone())
  }
};

Contributor Author

> I have a minor beef with this call to forget - we intend to drop the ticket, but forgetting the entire instance may also ignore other drop functions that could need to execute. I don't think those exist today, but there isn't anything preventing them from existing.

Well, long-term I don't plan to just forget here :P Probably need to keep it around in case we need to roll back from a recoverable failure or just for cleaning up once the actual migration is done.

Collaborator

Bumping this for visibility - it looks like we're still calling forget here; wasn't sure if we wanted to patch prior to merging.

Contributor Author

Yea, this still needs to be addressed but I planned on doing it in a followup PR. It will change slightly as I plumb through more of the propolis-side migration work, which relies on both instances being available.

nexus/src/sagas.rs (outdated, resolved)
@luqmana luqmana force-pushed the live-migrate branch 2 times, most recently from 368821c to 472ccf1 on December 1, 2021 21:46
@ahl
Contributor

ahl commented Dec 2, 2021

if you either git checkout Cargo.lock or cargo update -p dropshot you should be all good... at least vis-a-vis this openapi issue (sorry!)

Collaborator

@davepacheco davepacheco left a comment

It's really cool to see this come together!

// TODO correctness TODO robustness TODO design
// Should we lookup the instance again here?
// See comment in project_create_instance.
let instance = self.db_datastore.instance_fetch(&instance.id()).await?;
Collaborator

I would expect this API would (eventually) be async (and this question would be moot) -- is that right?

nexus/src/sagas.rs (outdated, resolved)
@luqmana luqmana force-pushed the live-migrate branch 2 times, most recently from c103b07 to c64b209 on January 6, 2022 23:02
Collaborator

@smklein smklein left a comment

So my comments are mostly focused on:

  • Concurrency safety
  • Idempotency safety

I don't wanna drag the lifetime of this PR out too far, but I do think we should try to track these issues before submitting. If you wanna push back on any of my comments, I'd be more okay with a reduced set of functionality (e.g., "no attaching / detaching devices during a migration!") than features which are potentially buggy (e.g., "You can attach/detach devices during a migration, but it may result in race conditions").

nexus/src/external_api/http_entrypoints.rs (outdated, resolved)
// TODO correctness TODO robustness TODO design
// Should we lookup the instance again here?
// See comment in project_create_instance.
let instance = self.db_datastore.instance_fetch(&instance.id()).await?;
Collaborator

@davepacheco as in, "the caller would have to issue a second call to fetch the instance, and we'd just return whatever currently exists in the DB to them", right?

nexus/src/sagas.rs (outdated, resolved)
nexus/src/sagas.rs (outdated, resolved)
nexus/src/sagas.rs (outdated, resolved)
}

// Update the Instance's runtime state to indicate it's currently migrating.
// This also acts as a lock somewhat to prevent any further state changes
Collaborator

How is this so? E.g., we aren't prevented from adding/removing disks/nics during this time, are we?

Contributor Author

Nexus currently has a check based on the current instance state to decide whether certain operations are allowed (like rebooting it). I extended that check to also disallow such changes during migration. But you're right, those checks don't currently exist for modifying disks or NICs. That said, last I checked those operations don't actually modify the instance yet (e.g. SledAgent::disk_ensure is just a todo!).
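
For what it's worth, a rough sketch of that kind of gate (hypothetical names, not the actual Nexus check): while the instance reports a migrating state, any state-changing request is refused up front.

#[derive(Debug)]
enum InstanceState {
    Stopped,
    Running,
    Migrating,
}

#[derive(Debug)]
enum InstanceAction {
    Reboot,
    Stop,
    Migrate,
}

/// Refuse `action` if the instance's current `state` doesn't allow it.
fn check_action_allowed(
    state: &InstanceState,
    action: &InstanceAction,
) -> Result<(), String> {
    match (state, action) {
        // While a migration is in flight, reject anything that would change
        // the instance's runtime state out from under it.
        (InstanceState::Migrating, _) => {
            Err(format!("instance is migrating; {:?} not allowed", action))
        }
        // A running instance can be rebooted, stopped, or migrated.
        (InstanceState::Running, _) => Ok(()),
        // Everything else (e.g. rebooting a stopped instance) is refused.
        (s, a) => Err(format!("cannot {:?} an instance in state {:?}", a, s)),
    }
}

fn main() {
    assert!(check_action_allowed(&InstanceState::Running, &InstanceAction::Reboot).is_ok());
    assert!(check_action_allowed(&InstanceState::Migrating, &InstanceAction::Stop).is_err());
    assert!(check_action_allowed(&InstanceState::Stopped, &InstanceAction::Migrate).is_err());
}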

Comment on lines +399 to +467
let dst_sa = osagactx
.sled_client(&dst_sled_uuid)
.await
.map_err(ActionError::action_failed)?;
Collaborator

I think this problem still exists, right?

Comment on lines +102 to +119
// Instance does not exist or one does but we're performing
// an intra-sled migration. Either way - create an instance
Collaborator

What if "ensure" has been called repeatedly for this migration? I'm trying to eyeball how this function can still be idempotent if the following behavior occurs:

  1. "Ensure" called, requesting a migration. Instance now exists.
  2. "Ensure" called again, while the migration is in-progress, because a saga step is being called repeatedly.

It seems like this results in creating a brand new propolis instance, and terminating the old instance, no? Couldn't this potentially interrupt migration in progress, by basically "throwing away the destination" instance and creating it again?

(Maybe this is intended behavior, but I figured I'd check - kinda seems like we'd want to leave the destination alone, or potentially just alter attached devices, if it is already communicating with the source instance)

Contributor Author

I guess I'd probably also need to pass along both the src & dst UUIDs in the migrate request here so that we can determine whether the migration already happened? Admittedly the idempotency story is a little ill-defined so far and is definitely something I need to audit more carefully.

Collaborator

Yeah, this seems possible to infer a handful of ways:

  • If the "source_propolis" UUID == "destination_propolis" UUID
  • If the migration UUID was seen before

I do still think it's important to handle, though; otherwise, tossing the old instance would disrupt the migration.

Contributor Author

Ok, updated it such that if the current instance's propolis UUID matches the target propolis UUID, we treat it as an ongoing migration and don't spin up a new instance.
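
For posterity, a tiny sketch of that check (illustrative names only, not the actual sled-agent code): the ensure handler compares the Propolis ID it's already running against the migration target before deciding to tear anything down.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct PropolisUuid(u128);

#[derive(Debug)]
enum EnsureOutcome {
    /// The requested target Propolis already exists; this is a retried ensure
    /// for an in-progress migration, so leave the destination alone.
    AlreadyMigrating,
    /// No matching Propolis; create a fresh one and kick off the migration.
    StartMigration,
}

fn ensure_for_migration(
    current_propolis: Option<PropolisUuid>,
    target_propolis: PropolisUuid,
) -> EnsureOutcome {
    match current_propolis {
        // The destination we were asked to ensure is already running: a
        // previous (possibly retried) saga step got here before us.
        Some(current) if current == target_propolis => EnsureOutcome::AlreadyMigrating,
        // Either no instance exists yet, or an older Propolis is running
        // (e.g. the source of an intra-sled migration): set up the new one.
        _ => EnsureOutcome::StartMigration,
    }
}

fn main() {
    let target = PropolisUuid(42);
    assert!(matches!(ensure_for_migration(None, target), EnsureOutcome::StartMigration));
    assert!(matches!(
        ensure_for_migration(Some(target), target),
        EnsureOutcome::AlreadyMigrating
    ));
}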

nexus/src/sagas.rs (outdated, resolved)
Collaborator

@smklein smklein left a comment

Looks good, modulo one last minor idempotency thing

nexus/src/nexus.rs (outdated, resolved)
nexus/src/nexus.rs (outdated, resolved)
nexus/src/nexus.rs (resolved)
sled-agent/src/common/instance.rs (outdated, resolved)
Collaborator

@smklein smklein left a comment

Thanks for considering all these cases!

@luqmana luqmana merged commit 4d12a4d into main Jan 25, 2022
@luqmana luqmana deleted the live-migrate branch January 25, 2022 14:42