[ereport] index ereport ingestion by SP type/slot #7903
Draft: hawkw wants to merge 21 commits into main from eliza/ereport-sp-api
Currently, the initial ereport ingestion API I added in #7833 proposed a single dropshot API that would be implemented by both sled-agent and MGS. This was possible because the initial design would have indexed all ereport producers (reporters) by a UUID. However, per recent conversations with @cbiffle and @jgallagher, we've determined that Nexus will instead request ereports from service processors indexed by SP physical topology (e.g. type and slot), like the rest of the MGS HTTP API. Therefore, we can no longer have a single HTTP API for ereporters that's implemented by both MGS and sled-agents, and instead, SP ereport ingestion should be a new endpoint on the MGS API. This commit does that, moving the ereport query params into `ereport-types`, eliminating the separate `ereport-api` and `ereport-client` crates, and adding an ereport-ingestion-by-SP-location endpoint to the management gateway API.
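To make the "indexed by SP physical topology" idea concrete, here is a minimal sketch of addressing an SP by type and slot rather than by UUID. Everything in it is an assumption for illustration: the `SpType` variants, the `SpLocation` struct, the `u16` slot width, and the `/sp/{type}/{slot}/ereports` path shape are hypothetical and are not taken from this PR or from the actual MGS API.

```rust
use std::fmt;
use std::str::FromStr;

/// Kind of service processor. The variant set is an assumption for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SpType {
    Sled,
    Switch,
    Power,
}

impl FromStr for SpType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "sled" => Ok(SpType::Sled),
            "switch" => Ok(SpType::Switch),
            "power" => Ok(SpType::Power),
            other => Err(format!("unknown SP type: {other}")),
        }
    }
}

impl fmt::Display for SpType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            SpType::Sled => "sled",
            SpType::Switch => "switch",
            SpType::Power => "power",
        };
        f.write_str(name)
    }
}

/// Physical location of an SP: its type plus its slot number.
/// (Hypothetical type; the slot width is an assumption.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SpLocation {
    pub typ: SpType,
    pub slot: u16,
}

impl SpLocation {
    /// Build a hypothetical ereport-ingestion path for this SP.
    pub fn ereports_path(&self) -> String {
        format!("/sp/{}/{}/ereports", self.typ, self.slot)
    }

    /// Parse the same hypothetical path shape back into a location.
    pub fn from_path(path: &str) -> Result<Self, String> {
        let parts: Vec<&str> = path.trim_matches('/').split('/').collect();
        match parts.as_slice() {
            ["sp", typ, slot, "ereports"] => {
                let typ = typ.parse::<SpType>()?;
                let slot = slot
                    .parse::<u16>()
                    .map_err(|e| format!("bad slot {slot:?}: {e}"))?;
                Ok(SpLocation { typ, slot })
            }
            _ => Err(format!("unrecognized path: {path}")),
        }
    }
}
```

With this shape, a caller names the reporter by where it physically sits, e.g. `SpLocation { typ: SpType::Sled, slot: 7 }.ereports_path()`, instead of looking up a UUID first.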
I'm not 100% sure what our disposition on merging this ought to be, as it does add an API to the MGS
I'm turning this into a draft as I'm going to keep using this branch to hack up the ereport protocol types a bit more.
this should make it easier for MGS to blindly turn CBOR into JSON. we can enforce more structure when deserializing.
hawkw added a commit that referenced this pull request on Apr 4, 2025
It turns out that our Git dependency on the oxidecomputer/management-gateway-service repo hasn't been updated in... a while. We're currently on a commit from September of last year, oxidecomputer/management-gateway-service@9bbac47. This branch updates it to the current HEAD commit, oxidecomputer/management-gateway-service@f9566e6.

The only changes in MGS that required code changes in Omicron are:

- oxidecomputer/management-gateway-service#291, where I added a new `MeasurementKind` for AMD CPU T<sub>ctl</sub> values (which are not temperatures in degrees Celsius, but a secret third thing).
- oxidecomputer/management-gateway-service#316 by @mkeeter, adding the interface to read SP task dumps over the network. Since this adds methods to the `sp_impl::SpHandler` trait, the SP simulator implementations need to be updated, or else they will no longer compile. For now, I've just made these `unimplemented!()`, as we're not currently actually _using_ them.

In my PR #7903 implementing ereport ingestion from SPs, I had to make these changes as part of changing the MGS dependency to pull in the new `gateway-sp-comms` code for ereports. Since this isn't actually related, and is just necessary to update the Git dep, I figured I'd pull that commit (49973ae) into its own PR.
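As a rough illustration of why the simulators had to change: when a trait gains a new required method, every implementor must provide it or the crate stops compiling, and `unimplemented!()` is the quickest way to satisfy the compiler for a method nobody calls yet. The trait and method names below (`Handler`, `read_sensor`, `read_task_dump`, `SimulatedSp`) are hypothetical stand-ins, not the real `sp_impl::SpHandler` interface.

```rust
/// Hypothetical stand-in for a handler trait defined in an upstream crate.
pub trait Handler {
    /// A method that existed before the dependency update.
    fn read_sensor(&mut self, id: u32) -> Result<f32, String>;

    /// A required method added by the dependency update. Every implementor
    /// must now define it, or compilation fails.
    fn read_task_dump(&mut self, index: u32) -> Result<Vec<u8>, String>;
}

/// Hypothetical simulator that returns a fixed sensor reading.
pub struct SimulatedSp {
    pub reading: f32,
}

impl Handler for SimulatedSp {
    fn read_sensor(&mut self, _id: u32) -> Result<f32, String> {
        Ok(self.reading)
    }

    fn read_task_dump(&mut self, _index: u32) -> Result<Vec<u8>, String> {
        // Satisfies the trait so the simulator compiles; panics if called.
        unimplemented!("the simulator does not support task dumps yet")
    }
}
```

The trade-off is that the stub type-checks but panics at runtime, so it is only reasonable while no caller actually exercises the new method.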
Currently, the initial ereport ingestion API I added in #7833 proposed
a single dropshot API that would be implemented by both sled-agent and
MGS. This was possible because the initial design would have indexed all
ereport producers (reporters) by a UUID. However, per recent
conversations with @cbiffle and @jgallagher, we've determined that Nexus
will instead request ereports from service processors indexed by SP
physical topology (e.g. type and slot), like the rest of the MGS HTTP
API. Therefore, we can no longer have a single HTTP API for ereporters
that's implemented by both MGS and sled-agents, and instead, SP ereport
ingestion should be a new endpoint on the MGS API.
This commit does that, moving the ereport query params into
`ereport-types`, eliminating the separate `ereport-api` and
`ereport-client` crates, and adding an ereport-ingestion-by-SP-location
endpoint to the management gateway API.
Furthermore, there are some terminology changes. The ereport
protocol has a value which we've variously referred to as an "instance
ID", a "generation ID", and a "restart nonce", all of which have
unfortunate name collisions that are potentially confusing or just
unpleasant. We've agreed to refer to this value everywhere as a
"restart ID", so this commit also changes that.
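One way to keep the single agreed-upon name from drifting again is to give the value its own newtype. The sketch below is purely illustrative and rests on assumptions the commit message does not state: that the restart ID changes whenever a reporter restarts, that ereports carry sequence numbers which start over after a restart, and that a 128-bit integer is wide enough. `RestartId` and `Cursor` are hypothetical names.

```rust
/// Hypothetical newtype for the value now called a "restart ID".
/// The u128 representation is an assumption for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RestartId(pub u128);

/// Hypothetical consumer-side cursor for one reporter.
pub struct Cursor {
    restart_id: RestartId,
    next_seq: u64,
}

impl Cursor {
    pub fn new(restart_id: RestartId) -> Self {
        Cursor { restart_id, next_seq: 0 }
    }

    pub fn restart_id(&self) -> RestartId {
        self.restart_id
    }

    /// Record an ereport with sequence number `seq` from a reporter whose
    /// current restart ID is `reported`. Returns `true` if the ereport is
    /// new, `false` if it was already seen.
    ///
    /// Assumption: a changed restart ID means the reporter restarted and its
    /// sequence numbers began again, so the cursor resets.
    pub fn observe(&mut self, reported: RestartId, seq: u64) -> bool {
        if reported != self.restart_id {
            self.restart_id = reported;
            self.next_seq = 0;
        }
        if seq < self.next_seq {
            return false;
        }
        self.next_seq = seq + 1;
        true
    }
}
```

Because `RestartId` is a distinct type, it cannot be passed by accident where some other ID or generation number is expected, which is the kind of collision the rename is meant to avoid.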