Skip to content

Conversation

@hawkw
Copy link
Member

@hawkw hawkw commented Nov 7, 2025

Currently, ereports ingested from service processors and from the host OS' fmd
are stored in separate CRDB tables (sp_ereport and host_ereport,
respectively). The data in these tables is mostly the same, but the SP
ereports contain fields for the SP type and slot used to request its ereports
from MGS, while the host OS ereports contain the UUID of the sled used to find
its sled-agent address.

Splitting this data across two tables allows us to represent these differing
metadata fields nicely, but it makes it difficult to query or paginate over
ereports regardless of their source. Diagnosis engines will need to be able to
do this. Thus, it's desirable to change this so that a single table can be
queried to retrieve any ereport. This will also make life easier if we add new
reporter kinds in the future, such as ereports from Omicron zones or synthesized
by the control plane's fault manager in response to health endpoint checks.

There are a variety of ways we could achieve this. We could normalize the
shared data into one ereport table and put only the host-specific metadata
fields in two separate tables, JOINing on the ereport's restart ID and ENA.
Or, we could use a SQL view to have a view that abstracts over querying the two
separate tables. Some of these options were discussed in
https://github.com/oxidecomputer/rfd/pull/942#discussion_r2430786698. However,
this branch takes the simplest option, and just puts everything in one big
table. While the metadata fields are now all nullable, we use a CHECK
constraint to ensure that SP ereports always have non-NULL sp_type and
sp_slot, while host OS ereports always have non-NULL sled_ids. This does
mean that the Rust database model code has to handle the possibility of invalid
data, but we can just expect or otherwise assert it will be shaped correctly
according to the CHECK constraint. On the other hand, just having one big
table gives us much more predictable and understandable query plans than a view.

I've also reworked the way that ereports are constructed, by adding some new
domain types in nexus_types::fm::ereport. Previously, the database queries to
insert new ereports took a single reporter identity that was used to determine
the highest ENA inserted for that reporter, but also took a list of
db::model::Ereports to insert, which also repeated the reporter identity.
Therefore, it was possible to insert a batch of ereports from a different
reporter than the one whose highest ENA is queried as part of the insert
operation, which I never loved. Now, we have a separate EreportData type that
doesn't include this reporter identity, and the insert queries construct the
db::model::Ereport from this data and the single provided reporter identity,
which I hate a lot less.

@hawkw hawkw requested review from davepacheco and smklein November 7, 2025 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants