
Live attach/detach of external IPs #4694

Merged
@FelixMcFelix merged 63 commits into main on Jan 24, 2024
Conversation

@FelixMcFelix (Contributor) commented Dec 14, 2023:

This PR adds new endpoints to attach and detach external IPs to/from an individual instance at runtime, when instances are either stopped or started. These new endpoints are:

  • POST /v1/floating-ips/{floating_ip}/attach
  • POST /v1/floating-ips/{floating_ip}/detach
  • POST /v1/instances/{instance}/external-ips/ephemeral
  • DELETE /v1/instances/{instance}/external-ips/ephemeral

These follow and enforce the same rules as external IPs registered during instance creation: at most one ephemeral IP, and at most 32 external IPs total.

/v1/floating-ips/{floating_ip}/attach includes a kind field to account for future API resources which a FIP may be bound to -- such as internet gateways, load balancers, and services.
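
For illustration, a minimal sketch of what the attach body could look like, assuming serde-tagged types; the field and variant names here are illustrative, not a statement of the final API shape:

```rust
use omicron_common::api::external::NameOrId;
use serde::Deserialize;

/// Hypothetical sketch of the attach request body, e.g.
/// `{ "kind": "instance", "parent": "my-instance" }`.
#[derive(Deserialize)]
pub struct FloatingIpAttach {
    /// The class of resource the floating IP is being bound to.
    pub kind: FloatingIpParentKind,
    /// Name or ID of the parent resource.
    pub parent: NameOrId,
}

#[derive(Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum FloatingIpParentKind {
    Instance,
    // Future parents could include internet gateways, load balancers,
    // and services, per the note above.
}
```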

Interaction with other instance lifecycle changes and sagas

Both external IP modify sagas begin with an atomic update to external IP attach state, conditioned on $\mathit{state} \in \{\mathit{started}, \mathit{stopped}\}$. As a result, an external IP saga can only ever begin before any other instance state change occurs. We then only need to consider how these other sagas/events must behave when called during an attach/detach, keeping in mind that these are worst-case orderings: attach/detach are likely to complete quickly.
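
As a rough sketch of that guard (`external_ip_set_attaching_if_instance_in` is a hypothetical stand-in for the real datastore query):

```rust
// The move to `Attaching` and the instance-state check happen in one
// conditional UPDATE, so no other lifecycle saga can slip in between.
async fn begin_attach(
    datastore: &DataStore,
    ip_id: Uuid,
    instance_id: Uuid,
) -> Result<(), Error> {
    // Allowed parent states correspond to "started"/"stopped" above.
    let updated = datastore
        .external_ip_set_attaching_if_instance_in(
            ip_id,
            instance_id,
            &[InstanceState::Running, InstanceState::Stopped],
        )
        .await?;
    if !updated {
        // Zero rows changed: the instance is mid-transition, so report
        // a retryable 503 rather than racing the other saga.
        return Err(Error::unavail(
            "instance is changing state; attach is safe to retry",
        ));
    }
    Ok(())
}
```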

Instance start & migrate

Both of these sagas alter an instance's functional sled ID, which controls whether NAT entry insertion and OPTE port state updates are performed. If an IP attach/detach is incomplete when either saga reaches instance_ensure_dpd_config or instance_ensure_registered (i.e., any IP associated with the target instance is in the attaching/detaching state), the start/migrate will unwind with an HTTP 503.

Generally, neither should unwind in practice, since IP attach/detach are fast operations -- particularly when an instance is already stopped. This check exists solely to guarantee that only one saga is accessing a given external IP at a time, and that the update target remains unchanged.
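
In sketch form, the guard amounts to the check below (simplified; the real version is the `must_all_be_attached` logic quoted later in this review):

```rust
// Before pushing NAT/OPTE state for a start or migrate, require every
// external IP on the instance to be fully `Attached`; an IP that is
// mid-attach/detach is owned by another saga.
if must_all_be_attached
    && ips.iter().any(|ip| ip.state != IpAttachState::Attached)
{
    // A 503 tells the caller this is transient and safe to retry.
    return Err(Error::unavail(
        "external IP attach/detach in progress; retry instance start",
    ));
}
```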

Instance stop & delete

These operations are either not sagaized (stop) or cannot unwind (delete), and so we cannot block them using IP attach state. IP attach/detach will unwind if a given sled-agent is no longer responsible for an instance. Instance delete will force-detach IP addresses bound to an instance; if the attach saga observes this, it will deliberately unwind to potentially clean up NAT state. OPTE/DPD undo operations are best-effort in such a case to prevent stuck sagas.

Instance stop and IP attach may interleave such that the latter adds additional NAT entries after other network state is cleared. Because we cannot unwind in this case, instance_ensure_dpd_config will now attempt to remove leftover conflicting RPW entries if they are detected, since we know they are a deviation from intended state.
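
A sketch of that recovery path, with hypothetical helper names standing in for the actual datastore calls:

```rust
// Ensure the desired NAT entry; if the insert reports a conflict
// (possible after a stop/attach interleaving), treat the existing row
// as stale -- the desired state is authoritative -- then clear the
// conflicting entries and retry the insert once.
if let Err(e) = datastore.ensure_nat_entry(&desired).await {
    if is_conflict(&e) {
        datastore.remove_conflicting_nat_entries(&desired).await?;
        datastore.ensure_nat_entry(&desired).await?;
    } else {
        return Err(e);
    }
}
```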

Additional/supporting changes

  • Pool/floating IP specifiers in instance create now take NameOrId, parameter names changed to match.
  • External IP create/bind in instance create no longer double-resolves name on saga unwind.
  • views::ExternalIp can now contain a FloatingIp body.
  • DPD NAT insert/remove functions now perform a single rule update keyed on ID instead of an index into the EIP list -- index-based updates were unstable under live addition/removal (see the sketch after this list).
  • NAT RPW ensure is now more authoritative, and will remove conflicting entries if an initial insert fails.
  • Pool NameOrId resolution for floating IP allocation pulled up from Datastore into Nexus.
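
To illustrate the ID-keyed change in the DPD NAT functions (hypothetical signatures, not the actual function names):

```rust
// Before: positional addressing. The index shifts whenever an IP is
// attached or detached live, so an update could land on the wrong entry.
async fn dpd_nat_remove_by_index(
    instance_id: Uuid,
    eip_index: usize,
) -> Result<(), Error> {
    /* ... */
    Ok(())
}

// After: stable addressing. Each update targets exactly one external
// IP by its own ID, no matter how the instance's EIP list changes.
async fn dpd_nat_remove_by_id(
    instance_id: Uuid,
    external_ip_id: Uuid,
) -> Result<(), Error> {
    /* ... */
    Ok(())
}
```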

Closes #4630 and #4628.

Still to solve: migration blocks, handling unexpected runstate change, preventing concurrent attach/detach of a given EIP.

Takes the time to refactor dpd_ensure and nat_removal so that we can target a single IP address on a given device.
@FelixMcFelix self-assigned this Dec 14, 2023
...and now the cleanup + locking begins
a) needs further testing
b) instance stop/delete need to be made state-aware
Still need to fix up start/stop/delete to block on in-progress external IPs, but then we'll be sound!
Instance stop seems to do nothing, so that's fine -- we probably need to make the attach/detach undo pass over failures to communicate with the sled, since we can't block it directly as-is.

Delete is a bit trickier, need to see what Disk does.
Next up: putting into action my thoughts on improved idempotency.
This required that we drop the non-null parent constraint on ephemeral
IPs, but I think that's worth it in the name of consistency.
@gjcolombo (Contributor) left a comment:


I took a look at most of the lifecycle interactions here. I think the interlocks around not proceeding with certain tasks if an IP is attaching/detaching mostly work, but in one case I think it's a very close shave, and I think we might need some extra checks for this logic to work correctly with live migration.

nexus/db-queries/src/db/datastore/external_ip.rs (review thread, resolved):
// This is performed so that an IP attach/detach will block the
// instance_start saga. Return service unavailable to indicate
// the request is retryable.
if must_all_be_attached
@gjcolombo (Contributor):


I haven't looked closely at how the NAT RPW actually works, but I'm a little curious as to why we need this behavior. My mental model of that saga is that it maintains a view of the NAT entries that Dendrite should know about and ensures the various dpd daemons in the system are told when that view changes. If I have an IP that's attached and an IP that's attaching at this point, could/would/should the attach saga just end up adding the entry for the IP that's still attaching?

@FelixMcFelix (Contributor, Author):


Re-ensuring an existing NAT entry for the RPW is a no-op, and should only cost a row lookup. So there's nothing wrong with always pushing all Attached IPs apart from wasted effort.

We do still require an override for at least one other entry in an arbitrary state, because of saga unwind of attach/detach: an IP might be Detaching, unwind when informing OPTE, and then need to be specifically readded to NAT. Similarly an Attaching IP might unwind at the same step, and need to be removed from NAT. As a result, we don't want other executing sagas to try to do anything with IPs in a state they don't expect or IPs they don't 'own'. We'd also need to ensure that instance_start sees "all attached", and that attach/detach don't care about other IPs' states because they can safely run concurrently for different IP objects on the same instance.

Unrelated: Looking over this again I've also seen that ensure_nat_entry is needlessly duplicated per IP, which is again a no-op. This is also indirectly duplicated today in main (instance_start: for switch in boundary_switches {... instance_ensure_dpd_config ...} -> for ip in ips { ... ensure_nat_entry ... }), but we can fix this here now.
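
The duplication described has roughly this shape (a simplified sketch, not the exact code):

```rust
// The outer loop runs once per boundary switch, and the inner ensure
// runs once per IP. Each NAT entry therefore gets re-ensured once per
// switch -- a harmless no-op after the first pass, but wasted work.
for switch in &boundary_switches {
    instance_ensure_dpd_config(switch, &external_ips).await?;
    // ...which internally does:
    //     for ip in &external_ips { ensure_nat_entry(ip).await?; }
}
```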

Comment on lines 253 to 256
warn!(
log,
"siia_complete_attach: external IP was deleted or call was idempotent"
)
@gjcolombo (Contributor):


We're perilously close to needing to unwind for this in at least the following case:

  • instance is running
  • attach saga starts, gets through siia_get_instance_state, and then stops
  • instance stops and gets deleted; instance delete saga detaches all the external IPs
  • attach saga goes through siia_nat and siia_update_opte, then reaches this step
  • instance_ip_move_state returns false because the IP was already detached, but the changes pushed by the previous two steps don't get undone

I think that, in practice, we very narrowly avoid this because an instance can't be deleted unless it has no active VMM, and sled agent ensures that when a VMM stops, it (sled agent) removes the relevant instance from its instance manager before notifying Nexus that the VMM is gone. This means that in the case above, siia_update_opte will fail and cause the saga to unwind.

Similarly, I think this just barely works in the following case:

  • instance is running
  • attach saga starts, gets all the way through siia_update_opte
  • instance stops: this calls instance_delete_dpd_config, which (fortunately for us) deletes all NAT entries for the IPs of interest even if they're not fully Attached
  • now it's safe for the saga not to unwind since instance stop already took care of all the interesting bits

I would consider at least leaving a comment in here about why we think it's safe to ignore the "IP was Detached when we went to move it to Attached" case, but I would also strongly consider making this case unwind if possible--this is all on a knife edge and a seemingly-safe change to, say, the order of operations in sled agent's VMM cleanup path could end up breaking this.

@FelixMcFelix (Contributor, Author) commented Jan 10, 2024:


I had intentionally designed for Case 2, but had partly overlooked Case 1 (assuming that both are stop->delete) -- thanks a lot for catching that. I agree on that unwind pathway; although I'd (maybe incorrectly) assumed that the instance would be guaranteed to be removed from InstanceManager before moving from Stopping->Stopped, since any OPTE endpoint on sled-agent will then give us Error::NoSuchInstance.

I'd also be happier to unwind here rather than to push the limits of what we can get away with. Examining it a bit more closely however, there is another issue with the Case 1 race. The NAT entry will not be removed on unwind because of instance delete -- it's no longer attached to the instance, so instance_delete_dpd_config will not see it. We need a tweak there to handle this and not wipe out a potential followup attach on another instance, whether we unwind or not.

@gjcolombo (Contributor):


> I agree on that unwind pathway; although I'd (maybe incorrectly) assumed that the instance would be guaranteed to be removed from InstanceManager before moving from Stopping->Stopped, since any OPTE endpoint on sled-agent will then give us Error::NoSuchInstance.

Your assumption is correct--this does happen today when the VMM goes from Stopped -> Destroyed (a transition that also moves its Instance from Running -> Stopped). But sled agent didn't previously guarantee this, so my fear is that someone (read: me) will inadvertently change this on the sled agent side in six months' time without considering the connection between these paths...

@FelixMcFelix (Contributor, Author):


Okay, I've resolved this by unwinding if we observe a delete at the end of the attach saga. I've put in some extra work to clean up straggler NAT entries more precisely (needed in the delete-unwind case) and to make instance_ensure_dpd_config more aggressive in pushing routes where an old conflict may exist. I think doing both of these means more useful buffers between us and catastrophe.
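
In sketch form (helper names hypothetical; `ActionError` is the saga framework's error type), the end of the attach saga now looks something like:

```rust
// Final step: conditionally move the IP from `Attaching` to `Attached`.
// If the move fails because instance delete already force-detached or
// deleted the IP, fail the action so the saga unwinds and the NAT/OPTE
// work from the earlier steps gets cleaned up.
let moved = datastore
    .external_ip_complete_attach(ip_id)
    .await
    .map_err(ActionError::action_failed)?;
if !moved {
    return Err(ActionError::action_failed(
        "external IP deleted or detached mid-attach; unwinding".to_string(),
    ));
}
```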

Resolved review threads on nexus/src/app/instance.rs and nexus/src/app/sagas/instance_common.rs.
We still need to figure out the best possible `unwind` semantics at the end of the unwind path, then hopefully this is buttoned up.
This also includes some extra work designed to make NAT RPW rule management a little more robust in case the IP attach sagas leave behind any mess in the event of concurrent stop or delete (sorry!).
@FelixMcFelix (Contributor, Author) commented Jan 12, 2024:

@ahl @david-crespo On the API design front, I've written up some thoughts after the call on Tuesday into a strawman we can pick apart: https://gist.github.com/FelixMcFelix/18d20262a918ccf691a325a8d948379d. I'm less sure of the ephemeral side of things; if that needs more time in the oven, we can leave it out for now until we arrive at a good conclusion there.

@gjcolombo (Contributor) left a comment:


Thanks for adding the new checks here--LGTM for the instance lifecycle pieces (I haven't looked at the external API bits at all).

if collection.runtime_state.migration_id.is_some() {
return Err(Error::unavail(&format!(
"tried to attach {kind} IP while instance was migrating: \
detach will be safe to retry once migrate completes"
@gjcolombo (Contributor):


s/detach/attach?

@gjcolombo (Contributor):


Love the new doc comments in here!

@@ -746,26 +702,6 @@ async fn test_external_ip_attach_detach_fail_if_in_use_by_other(
.parsed_body()
.unwrap();
assert_eq!(error.message, "floating IP cannot be attached to one instance while still attached to another".to_string());

// Detach in-use FIP from *other* instance should fail.
@FelixMcFelix (Contributor, Author) commented Jan 19, 2024:


This case shouldn't be hit by users anymore now that detach is targeted on the FIP itself rather than the pair (fip, instance).

@FelixMcFelix (Contributor, Author) commented:
Attach and detach APIs for floating/ephemeral have been retooled according to our last meeting -- split by ephemeral/floating, and floating IP control now lives on the FIP itself to support future network API developments. If there are naming tweaks needed then they shouldn't be too onerous to make. :)

@FelixMcFelix FelixMcFelix requested a review from ahl January 19, 2024 20:16
@david-crespo (Contributor) left a comment:


Didn't review implementation closely, but the APIs look good!

@FelixMcFelix merged commit cc64304 into main on Jan 24, 2024
21 checks passed
@FelixMcFelix deleted the felixmcfelix/floating-ip-live branch January 24, 2024 21:05