Add docs about backwards compat design #7085

Merged: andrewjstone merged 2 commits into main from docs-backwards-compat on Nov 26, 2024

andrewjstone
Contributor

No description provided.

Collaborator

@davepacheco davepacheco left a comment

Thanks for writing this. I agree with all the content here. I wonder if we can organize it more usefully so that it's easier to skim to find the areas of relevance to the reader in that moment. Honestly, I'm just afraid people won't really read the wall of text start-to-finish. As an example, the first two paragraphs cover stuff we explicitly don't expect people to need to think about. One idea is just to put some section headers in, like "Network APIs made with Progenitor/Dropshot", "Network APIs with custom protocols over TCP", "Database state", "Other ad hoc persistent state", etc. Another idea would be to lead with the specific guidelines and relegate the rest to a "Background" section that people could read if they want to better understand how to apply them or where they came from.

That said, having this as-is is better than not having it so if that all seems annoying then I'd say just go ahead and land this. I think section headers would be good bang-for-the-buck.

@@ -205,6 +205,24 @@ It's essential that components provide visibility into what they're doing for de
* Sagas are potentially long-lived. Without needing any per-saga work, the saga log provides detailed information about which steps have run, which steps are in-progress, and the results of each step that completed.
* Background tasks are continuous processes. They can provide whatever detailed status they want to, including things like: activity counters, error counters, ringbuffers of recent events, data produced by the task, etc. These can be viewed with `omdb`.

=== Backwards Compatibility

In general, backwards compatibility between services will be provided at the API level as described in <<rfd421>>. Most internal control plane service APIs are Dropshot based and therfore can utilize the same strategy. Some other services, such as trust quroum and crucible operate over TCP with custom protocols, and have their own mechanisms for backwards compatibility. The introduction of new services of this type should be largely unnecessary for the foreseeable future.
Collaborator

Suggested change
- In general, backwards compatibility between services will be provided at the API level as described in <<rfd421>>. Most internal control plane service APIs are Dropshot based and therfore can utilize the same strategy. Some other services, such as trust quroum and crucible operate over TCP with custom protocols, and have their own mechanisms for backwards compatibility. The introduction of new services of this type should be largely unnecessary for the foreseeable future.
+ In general, backwards compatibility between services will be provided at the API level as described in <<rfd421>>. Most internal control plane service APIs are Dropshot based and therefore can utilize the same strategy. Some other services, such as trust quorum and Crucible, operate over TCP with custom protocols and have their own mechanisms for backwards compatibility. The introduction of new services of this type should be largely unnecessary for the foreseeable future.

guidelines for how to do this consistently and safely in the future.

1. Ensure the code to perform an upgrade / backfill is in one location. This makes it easier to find and remove once it is no longer needed. It also makes it easier to test in isolation, and to understand the complete change.
2. We currently operate in a mupdate-driven world. For the time being we should not perform arbitrary backfilling during normal operation of the system. Instead, after a mupdate/schema migration, we should force a blueprint step that performs the backfilling before doing anything else. Once that is done, all other Nexus code can operate as if there never was any backfilling. This keeps the bulk of the code streamlined and allows the update code to be self-maintained. It's possible we may want to make this a first-class part of Reconfigurator for when we enter the online update world.
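
As a rough illustration of the two guidelines above, the sketch below keeps all backfill logic in one place and gates normal work behind it. Every name here (`Row`, `Step`, `backfill_needed`, `run_backfill`, `next_step`) is hypothetical and chosen only for this example; none of it is Omicron's actual blueprint or Reconfigurator API.

```rust
/// Hypothetical record whose `owner` field was added by a schema
/// migration, so rows written before the migration have no value yet.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: u32,
    owner: Option<String>,
}

/// Hypothetical planner output: what the system should do next.
#[derive(Debug, PartialEq)]
enum Step {
    Backfill,
    NormalPlanning,
}

// ---- All backfill logic lives in the two functions below, so it can be
// ---- tested in isolation and deleted in one piece once it is obsolete.

/// Returns true if any row still lacks the new field.
fn backfill_needed(rows: &[Row]) -> bool {
    rows.iter().any(|r| r.owner.is_none())
}

/// Fills in the new field for old rows and returns how many rows changed.
fn run_backfill(rows: &mut [Row], default_owner: &str) -> usize {
    let mut updated = 0;
    for row in rows.iter_mut().filter(|r| r.owner.is_none()) {
        row.owner = Some(default_owner.to_string());
        updated += 1;
    }
    updated
}

/// The backfill step is forced before anything else. Code that runs under
/// `Step::NormalPlanning` can assume every row already has an owner.
fn next_step(rows: &[Row]) -> Step {
    if backfill_needed(rows) {
        Step::Backfill
    } else {
        Step::NormalPlanning
    }
}
```

Because `run_backfill` only touches rows that still lack the field, running it a second time changes nothing, which makes it safe to retry if the step is interrupted.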
Collaborator

I think we can make this a more general rule. This is basically what I did in #4466, where I created a new ledger for the new format. On startup, sled agent would read the old one and write the new one and then from then on it would only ever look at the new one. This is the same principle you're describing here -- for all the same reasons and benefits -- but for the Ledger data you mentioned a few paragraphs above.

I think the general principle is something like: when doing a migration from old-format stuff to new-format stuff, prefer to do it up front during some kind of startup operation so that the rest of the system can operate only in the new world. (This may require being pretty careful that the conversion won't fail, since if it does, the system won't come up. I was pretty paranoid about that with #4466 and took a lot of steps to make sure that wouldn't happen (e.g., fetched data from every system I could get my hands on and ran it through the conversion and added a lot of it to the test suite too).)
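
A minimal sketch of that principle, assuming an in-memory stand-in for persistent storage: convert once at startup, persist the new format, and have everything afterwards read only the new format. The types and functions here (`LedgerV1`, `LedgerV2`, `Storage`, `convert`, `load_ledger`) are invented for illustration and are not the actual sled agent ledger code from #4466.

```rust
use std::collections::BTreeMap;

/// Hypothetical old format: a flat list of zone names.
#[derive(Debug, Clone, PartialEq)]
struct LedgerV1 {
    zones: Vec<String>,
}

/// Hypothetical new format: zones keyed by name, plus a generation number.
#[derive(Debug, Clone, PartialEq)]
struct LedgerV2 {
    generation: u64,
    zones: BTreeMap<String, ZoneConfig>,
}

#[derive(Debug, Clone, PartialEq)]
struct ZoneConfig {
    enabled: bool,
}

/// In-memory stand-in for the two on-disk ledgers (an assumption made so
/// that the example is self-contained).
#[derive(Debug, Default)]
struct Storage {
    v1: Option<LedgerV1>,
    v2: Option<LedgerV2>,
}

#[derive(Debug, PartialEq)]
enum MigrateError {
    DuplicateZone(String),
}

/// Pure conversion function. Keeping it free of I/O makes it easy to run
/// real-world samples of old data through it in the test suite.
fn convert(old: &LedgerV1) -> Result<LedgerV2, MigrateError> {
    let mut zones = BTreeMap::new();
    for name in &old.zones {
        if zones.insert(name.clone(), ZoneConfig { enabled: true }).is_some() {
            return Err(MigrateError::DuplicateZone(name.clone()));
        }
    }
    Ok(LedgerV2 { generation: 1, zones })
}

/// Called once at startup. Callers only ever see `LedgerV2`.
fn load_ledger(storage: &mut Storage) -> Result<LedgerV2, MigrateError> {
    // If the new ledger already exists, the old one is never consulted.
    if let Some(new) = &storage.v2 {
        return Ok(new.clone());
    }
    // Otherwise convert the old ledger (or start empty) and persist the
    // result before returning. A conversion error aborts startup here.
    let new = match &storage.v1 {
        Some(old) => convert(old)?,
        None => LedgerV2 { generation: 1, zones: BTreeMap::new() },
    };
    storage.v2 = Some(new.clone());
    Ok(new)
}
```

The sketch leaves the old ledger in place after conversion; whether and when to delete it is a separate decision. Note that a failed conversion returns an error before anything is written, which is exactly the case where the system would not come up, so the conversion function is the part worth testing most heavily.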

@andrewjstone
Contributor Author

@davepacheco I took a stab at your suggestions in 219fab9

@andrewjstone andrewjstone enabled auto-merge (squash) November 26, 2024 00:15
@andrewjstone andrewjstone merged commit 8674bfc into main Nov 26, 2024
16 checks passed
@andrewjstone andrewjstone deleted the docs-backwards-compat branch November 26, 2024 02:01