Skip to content

Conversation

@mkeeter
Copy link
Contributor

@mkeeter mkeeter commented Sep 25, 2025

Building block for oxidecomputer/hubris#2244

This does not require a version bump, because the TLV-encoded values are serialized as a blob and deserialized on a best-effort basis by looking at tags.

/// possible types contained in a component details message. Each TLV-encoded
/// struct corresponds to one of these cases.
///
/// As such, it is not part of the explicit message versioning scheme
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this; I was planning to go through things to confirm but see you already did :)

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me! i wonder somewhat if there is anything we ought to do to explicitly indicate that this is a POST code from a SP5 CPU, but i suppose the rest of the inventory data for the SP5 will indicate that...

Comment on lines +1635 to +1638
info!(
self.log,
"ignoring unexpected data in LastPostCode TLV entry"
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turbo nit, take it or leave it: should we perhaps log the leftover bytes here? i suppose that could be a lot...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷🏻‍♂️ I was matching previous code here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be kinda nice (though certainly not a blocker) to get the faux-mgs text output format to print the POST code in hex...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I added a helper struct that does a custom impl Display


/// Most recent POST code seen by the sequencer FPGA
#[derive(Copy, Clone, Debug, Serialize, Deserialize, SerializedSize)]
pub struct LastPostCode(pub u32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, another thing that occurred to me just now: i wonder if this message ought to include some kind of timestamp? we could either just include the total Hubris uptime in milliseconds (which would be very easy to add) or perhaps a time since the system was commanded to A0 (which could be more annoying)...or even a relative timestamp since the POST code last changed? this could be a bunch of extra work, but perhaps we might want to be able to look at how long we've been in a particular boot state?

it's fine to punt on that in this PR --- I know we have also discussed the idea of a ringbuf of the last n POST codes, so if we go back and implement that, we could just expose timestamps in that interface...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should punt on this – the SP doesn't record the POST codes itself, so I'd lean towards doing timestamps at the same time as we expose a ringbuf (which requires FPGA changes).

@mkeeter
Copy link
Contributor Author

mkeeter commented Sep 29, 2025

i wonder somewhat if there is anything we ought to do to explicitly indicate that this is a POST code from a SP5 CPU, but i suppose the rest of the inventory data for the SP5 will indicate that...

Yeah – we have to ask for a specific component's details, so the caller knows it's for the sp5-host-cpu component.

@mkeeter mkeeter force-pushed the mkeeter/last-post-code branch from 10473b1 to 17381b2 Compare September 29, 2025 19:51
@mkeeter mkeeter merged commit 64f5741 into main Sep 30, 2025
12 checks passed
@mkeeter mkeeter deleted the mkeeter/last-post-code branch September 30, 2025 14:02
mkeeter added a commit to oxidecomputer/hubris that referenced this pull request Oct 1, 2025
Fixes #2244 

We would like to be able to read the `LAST_POST_CODE` register from the
FPGA. This PR adds it as a `ComponentDetail` to the `SP5_HOST_CPU`
component.

The MGS side is
oxidecomputer/management-gateway-service#451, so
this is marked as a draft (but can still be reviewed).

There are two interesting changes:
- `Inventory::num_component_details` now returns 1 for the
`SP5_HOST_CPU` component
- `Inventory::component_details` now accepts a closure to handle any
components in `OUR_DEVICES`

The rest of the PR is mostly plumbing, e.g. adding modules to the
generated FMC code (because otherwise we have name collisions between
`espi` and `sequencer` registers).

I tested this on the bench with a Grapefruit, and it "works":
```
➜  management-gateway-service jj:(zq) cargo run -pfaux-mgs -- --interface=en10 component-details sp5-host-cpu
   Compiling gateway-messages v0.1.0 (/Users/mjk/oxide/management-gateway-service/gateway-messages)
   Compiling gateway-sp-comms v0.1.1 (/Users/mjk/oxide/management-gateway-service/gateway-sp-comms)
   Compiling faux-mgs v0.1.1 (/Users/mjk/oxide/management-gateway-service/faux-mgs)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 7.01s
     Running `target/debug/faux-mgs --interface=en10 component-details sp5-host-cpu`
Sep 25 15:41:12.950 INFO creating SP handle on interface en10, component: faux-mgs
Sep 25 15:41:12.954 INFO initial discovery complete, addr: [fe80::c1d:8cff:fec0:e208%26]:11111, interface: en10, socket: control-plane-agent, component: faux-mgs
LastPostCode(LastPostCode(0))
```

I've also tested the `Sequencer.last_post_code` function on a Cosmo
through `humility hiffy`. I would like to do an end-to-end test, but
Cosmos with network connectivity are in high demand (see
oxidecomputer/meta#761)
jgallagher added a commit to oxidecomputer/omicron that referenced this pull request Nov 6, 2025
In particular, this picks up
oxidecomputer/management-gateway-service#451 and
oxidecomputer/management-gateway-service#452,
which we'd like for debugging hung sled issues.

To make this easier to backport to the R17 release branch, it makes
minimal changes: requesting these new types of component details from
MGS proper will return a 400 (albeit one that probably contains all the
details we actually want in the error message!), because returning a 200
would require revving the MGS OpenAPI. The control plane has no logic to
collect this information yet, so I think this is fine - it'll be
available to support via `faux-mgs` as we'd like, and I'll file an issue
for adding proper MGS support at our convenience.
jgallagher added a commit to oxidecomputer/omicron that referenced this pull request Nov 6, 2025
In particular, this picks up
oxidecomputer/management-gateway-service#451 and
oxidecomputer/management-gateway-service#452,
which we'd like for debugging hung sled issues.

To make this easier to backport to the R17 release branch, it makes
minimal changes: requesting these new types of component details from
MGS proper will return a 400 (albeit one that probably contains all the
details we actually want in the error message!), because returning a 200
would require revving the MGS OpenAPI. The control plane has no logic to
collect this information yet, so I think this is fine - it'll be
available to support via `faux-mgs` as we'd like, and I'll file an issue
for adding proper MGS support at our convenience.

---------

Co-authored-by: iliana etaoin <iliana@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants