[nexus] migrate ereports to a single table #9373

hawkw · 2025-11-07T21:14:52Z

Currently, ereports ingested from service processors and from the host OS' fmd
are stored in separate CRDB tables (sp_ereport and host_ereport,
respectively). The data in these tables is mostly the same, but the SP
ereports contain fields for the SP type and slot used to request its ereports
from MGS, while the host OS ereports contain the UUID of the sled used to find
its sled-agent address.

Splitting this data across two tables allows us to represent these differing
metadata fields nicely, but it makes it difficult to query or paginate over
ereports regardless of their source. Diagnosis engines will need to be able to
do this. Thus, it's desirable to change this so that a single table can be
queried to retrieve any ereport. This will also make life easier if we add new
reporter kinds in the future, such as ereports from Omicron zones or synthesized
by the control plane's fault manager in response to health endpoint checks.

There are a variety of ways we could achieve this. We could normalize the
shared data into one ereport table and put only the host-specific metadata
fields in two separate tables, JOINing on the ereport's restart ID and ENA.
Or, we could use a SQL view to have a view that abstracts over querying the two
separate tables. Some of these options were discussed in
https://github.com/oxidecomputer/rfd/pull/942#discussion_r2430786698. However,
this branch takes the simplest option, and just puts everything in one big
table. While the metadata fields are now all nullable, we use a CHECK
constraint to ensure that SP ereports always have non-NULL sp_type and
sp_slot, while host OS ereports always have non-NULL sled_ids. This does
mean that the Rust database model code has to handle the possibility of invalid
data, but we can just expect or otherwise assert it will be shaped correctly
according to the CHECK constraint. On the other hand, just having one big
table gives us much more predictable and understandable query plans than a view.

I've also reworked the way that ereports are constructed, by adding some new
domain types in nexus_types::fm::ereport. Previously, the database queries to
insert new ereports took a single reporter identity that was used to determine
the highest ENA inserted for that reporter, but also took a list of
db::model::Ereports to insert, which also repeated the reporter identity.
Therefore, it was possible to insert a batch of ereports from a different
reporter than the one whose highest ENA is queried as part of the insert
operation, which I never loved. Now, we have a separate EreportData type that
doesn't include this reporter identity, and the insert queries construct the
db::model::Ereport from this data and the single provided reporter identity,
which I hate a lot less.

…-table

hawkw added 4 commits November 6, 2025 16:33

[wip] start refactoring ereport tables

7cdf83d

finish doing that

2b13209

a bunch of struggling to destroy full table scans

1a603b4

add migrations

506a948

hawkw requested review from davepacheco and smklein November 7, 2025 21:14

hawkw added 4 commits November 7, 2025 13:16

Merge remote-tracking branch 'origin/main' into eliza/one-big-ereport…

9159f3b

…-table

nicer support bundle collector type conversions

e5c0020

blarg

de631a6

rustfmt

1db3dbc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[nexus] migrate ereports to a single table #9373

[nexus] migrate ereports to a single table #9373

Uh oh!

hawkw commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[nexus] migrate ereports to a single table #9373

Are you sure you want to change the base?

[nexus] migrate ereports to a single table #9373

Uh oh!

Conversation

hawkw commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants