-
Notifications
You must be signed in to change notification settings - Fork 3
Add ComponentDetails::LastPostCode
#451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// possible types contained in a component details message. Each TLV-encoded | ||
| /// struct corresponds to one of these cases. | ||
| /// | ||
| /// As such, it is not part of the explicit message versioning scheme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this; I was planning to go through things to confirm but see you already did :)
hawkw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me! i wonder somewhat if there is anything we ought to do to explicitly indicate that this is a POST code from a SP5 CPU, but i suppose the rest of the inventory data for the SP5 will indicate that...
| info!( | ||
| self.log, | ||
| "ignoring unexpected data in LastPostCode TLV entry" | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turbo nit, take it or leave it: should we perhaps log the leftover bytes here? i suppose that could be a lot...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷🏻♂️ I was matching previous code here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be kinda nice (though certainly not a blocker) to get the faux-mgs text output format to print the POST code in hex...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I added a helper struct that does a custom impl Display
|
|
||
| /// Most recent POST code seen by the sequencer FPGA | ||
| #[derive(Copy, Clone, Debug, Serialize, Deserialize, SerializedSize)] | ||
| pub struct LastPostCode(pub u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, another thing that occurred to me just now: i wonder if this message ought to include some kind of timestamp? we could either just include the total Hubris uptime in milliseconds (which would be very easy to add) or perhaps a time since the system was commanded to A0 (which could be more annoying)...or even a relative timestamp since the POST code last changed? this could be a bunch of extra work, but perhaps we might want to be able to look at how long we've been in a particular boot state?
it's fine to punt on that in this PR --- I know we have also discussed the idea of a ringbuf of the last n POST codes, so if we go back and implement that, we could just expose timestamps in that interface...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should punt on this – the SP doesn't record the POST codes itself, so I'd lean towards doing timestamps at the same time as we expose a ringbuf (which requires FPGA changes).
Yeah – we have to ask for a specific component's details, so the caller knows it's for the |
10473b1 to
17381b2
Compare
Fixes #2244 We would like to be able to read the `LAST_POST_CODE` register from the FPGA. This PR adds it as a `ComponentDetail` to the `SP5_HOST_CPU` component. The MGS side is oxidecomputer/management-gateway-service#451, so this is marked as a draft (but can still be reviewed). There are two interesting changes: - `Inventory::num_component_details` now returns 1 for the `SP5_HOST_CPU` component - `Inventory::component_details` now accepts a closure to handle any components in `OUR_DEVICES` The rest of the PR is mostly plumbing, e.g. adding modules to the generated FMC code (because otherwise we have name collisions between `espi` and `sequencer` registers). I tested this on the bench with a Grapefruit, and it "works": ``` ➜ management-gateway-service jj:(zq) cargo run -pfaux-mgs -- --interface=en10 component-details sp5-host-cpu Compiling gateway-messages v0.1.0 (/Users/mjk/oxide/management-gateway-service/gateway-messages) Compiling gateway-sp-comms v0.1.1 (/Users/mjk/oxide/management-gateway-service/gateway-sp-comms) Compiling faux-mgs v0.1.1 (/Users/mjk/oxide/management-gateway-service/faux-mgs) Finished `dev` profile [unoptimized + debuginfo] target(s) in 7.01s Running `target/debug/faux-mgs --interface=en10 component-details sp5-host-cpu` Sep 25 15:41:12.950 INFO creating SP handle on interface en10, component: faux-mgs Sep 25 15:41:12.954 INFO initial discovery complete, addr: [fe80::c1d:8cff:fec0:e208%26]:11111, interface: en10, socket: control-plane-agent, component: faux-mgs LastPostCode(LastPostCode(0)) ``` I've also tested the `Sequencer.last_post_code` function on a Cosmo through `humility hiffy`. I would like to do an end-to-end test, but Cosmos with network connectivity are in high demand (see oxidecomputer/meta#761)
In particular, this picks up oxidecomputer/management-gateway-service#451 and oxidecomputer/management-gateway-service#452, which we'd like for debugging hung sled issues. To make this easier to backport to the R17 release branch, it makes minimal changes: requesting these new types of component details from MGS proper will return a 400 (albeit one that probably contains all the details we actually want in the error message!), because returning a 200 would require revving the MGS OpenAPI. The control plane has no logic to collect this information yet, so I think this is fine - it'll be available to support via `faux-mgs` as we'd like, and I'll file an issue for adding proper MGS support at our convenience.
In particular, this picks up oxidecomputer/management-gateway-service#451 and oxidecomputer/management-gateway-service#452, which we'd like for debugging hung sled issues. To make this easier to backport to the R17 release branch, it makes minimal changes: requesting these new types of component details from MGS proper will return a 400 (albeit one that probably contains all the details we actually want in the error message!), because returning a 200 would require revving the MGS OpenAPI. The control plane has no logic to collect this information yet, so I think this is fine - it'll be available to support via `faux-mgs` as we'd like, and I'll file an issue for adding proper MGS support at our convenience. --------- Co-authored-by: iliana etaoin <iliana@oxide.computer>
Building block for oxidecomputer/hubris#2244
This does not require a version bump, because the TLV-encoded values are serialized as a blob and deserialized on a best-effort basis by looking at tags.