Add docs about backwards compat design #7085

andrewjstone · 2024-11-15T22:36:18Z

No description provided.

davepacheco

Thanks for writing this. I agree with all the content here. I wonder if we can organize it more usefully so that it's easier to skim to find the areas of relevance to the reader in that moment. Honestly, I'm just afraid people won't really read the wall of text start-to-finish. As an example, the first two paragraphs cover stuff we explicitly don't expect people to need to think about. One idea is just to put some section headers in, like "Network APIs made with Progenitor/Dropshot", "Network APIs with custom protocols over TCP", "Database state", "Other ad hoc persistent state", etc. Another idea would be to lead with the specific guidelines and relegate the rest to a "Background" section that people could read if they want to better understand how to apply them or where they came from.

That said, having this as-is is better than not having it so if that all seems annoying then I'd say just go ahead and land this. I think section headers would be good bang-for-the-buck.

davepacheco · 2024-11-22T21:44:19Z

docs/control-plane-architecture.adoc

@@ -205,6 +205,24 @@ It's essential that components provide visibility into what they're doing for de
 * Sagas are potentially long-lived.  Without needing any per-saga work, the saga log provides detailed information about which steps have run, which steps are in-progress, and the results of each step that completed.
 * Background tasks are continuous processes.  They can provide whatever detailed status they want to, including things like: activity counters, error counters, ringbuffers of recent events, data produced by the task, etc.  These can be viewed with `omdb`.

+=== Backwards Compatibility
+
+In general, backwards compatibility between services will be provided at the API level as described in <<rfd421>>. Most internal control plane service APIs are Dropshot based and therfore can utilize the same strategy. Some other services, such as trust quroum and crucible operate over TCP with custom protocols, and have their own mechanisms for backwards compatibility. The introduction of new services of this type should be largely unnecessary for the foreseeable future.


Suggested change

In general, backwards compatibility between services will be provided at the API level as described in <<rfd421>>. Most internal control plane service APIs are Dropshot based and therfore can utilize the same strategy. Some other services, such as trust quroum and crucible operate over TCP with custom protocols, and have their own mechanisms for backwards compatibility. The introduction of new services of this type should be largely unnecessary for the foreseeable future.

In general, backwards compatibility between services will be provided at the API level as described in <<rfd421>>. Most internal control plane service APIs are Dropshot based and therefore can utilize the same strategy. Some other services, such as trust quroum and Crucible, operate over TCP with custom protocols and have their own mechanisms for backwards compatibility. The introduction of new services of this type should be largely unnecessary for the foreseeable future.

davepacheco · 2024-11-22T21:58:15Z

docs/control-plane-architecture.adoc

+guidelines for how to do this consistently and safely in the future.
+
+1. Ensure the code to perfom an upgrade / backfill is in one location. This makes it easier to find and remove once it is no longer needed. It also makes it easier to test in isolation, and to understand the complete change.
+2. We currently operate in a mupdate driven world. For the time being we should not perform arbitrary backfilling during normal operation of the system. Instead, after a mupdate/schema migration  we should force a blueprint step that performs the backfilling before doing anything else. Once that is done, all other nexus code can operate as if there never was any backfilling. This keeps the bulk of the code streamlined and allows the update code to be self maintained. It's possible we may want to make this a first class part of reconfigurator for when we enter the online update world.


I think we can make this a more general rule. This is basically what I did in #4466, where I created a new ledger for the new format. On startup, sled agent would read the old one and write the new one and then from then on it would only ever look at the new one. This is the same principle you're describing here -- for all the same reasons and benefits -- but for the Ledger data you mentioned a few paragraphs above.

I think the general principle is something like: when doing a migration from old-format stuff to new-format stuff, prefer to do it up front during some kind of startup operation so that the rest of the system can operate only in the new world. This may require being pretty careful that the conversion won't fail, since if it does, the system won't come up. I was pretty paranoid about that with #4466 and took a lot of steps to make sure that wouldn't happen: I fetched data from every system I could get my hands on, ran it through the conversion, and added a lot of it to the test suite too.
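
To make the "convert once at startup, then only read the new format" idea concrete, here is a rough sketch. It is not the code from #4466. The `OldLedger` and `NewLedger` types, the line-based file formats, and the `load_ledger_at_startup` function are all hypothetical and exist only to illustrate the shape of the approach.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical old on-disk format: one service name per line.
#[derive(Debug, PartialEq)]
struct OldLedger {
    services: Vec<String>,
}

/// Hypothetical new format: a generation line followed by service names.
#[derive(Debug, PartialEq)]
struct NewLedger {
    generation: u64,
    services: Vec<String>,
}

/// The whole old-to-new conversion lives in one pure function, so it is
/// easy to find, test in isolation, and delete once it is no longer needed.
fn convert(old: OldLedger) -> NewLedger {
    NewLedger { generation: 1, services: old.services }
}

fn parse_old(text: &str) -> OldLedger {
    let services = text
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(str::to_string)
        .collect();
    OldLedger { services }
}

fn serialize_new(ledger: &NewLedger) -> String {
    let mut out = format!("generation={}\n", ledger.generation);
    for service in &ledger.services {
        out.push_str(service);
        out.push('\n');
    }
    out
}

fn parse_new(text: &str) -> io::Result<NewLedger> {
    let mut lines = text.lines();
    let generation = lines
        .next()
        .and_then(|line| line.strip_prefix("generation="))
        .and_then(|value| value.parse::<u64>().ok())
        .ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, "bad generation line")
        })?;
    Ok(NewLedger { generation, services: lines.map(str::to_string).collect() })
}

/// Run once at startup, before anything else touches the ledger.
/// If only the old file exists, convert it and write the new file.
/// Either way, the returned value comes from the new file alone.
/// If neither file exists, this returns the underlying NotFound error.
fn load_ledger_at_startup(
    old_path: &Path,
    new_path: &Path,
) -> io::Result<NewLedger> {
    if !new_path.exists() && old_path.exists() {
        let old = parse_old(&fs::read_to_string(old_path)?);
        let new = convert(old);
        // Write to a temporary file and rename it into place so that a
        // crash mid-write does not leave a partial new-format ledger.
        let tmp_path = new_path.with_extension("tmp");
        fs::write(&tmp_path, serialize_new(&new))?;
        fs::rename(&tmp_path, new_path)?;
    }
    parse_new(&fs::read_to_string(new_path)?)
}
```

Because `convert` is a pure function, captured real-world inputs can be fed through it directly in tests, which is the kind of paranoia described above. Once the new file exists, the old file is never consulted again, so the rest of the code only has to understand `NewLedger`.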

andrewjstone · 2024-11-26T00:13:56Z

@davepacheco I took a stab at your suggestions in 219fab9

Add docs about backwards compat design

213da73

andrewjstone requested review from davepacheco and jgallagher November 15, 2024 22:36

davepacheco approved these changes Nov 22, 2024


review comments

219fab9

andrewjstone enabled auto-merge (squash) November 26, 2024 00:15

andrewjstone merged commit 8674bfc into main Nov 26, 2024
16 checks passed

andrewjstone deleted the docs-backwards-compat branch November 26, 2024 02:01
