[nexus] migrate ereports to a single table #9373
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently, ereports ingested from service processors and from the host OS'
fmdare stored in separate CRDB tables (
sp_ereportandhost_ereport,respectively). The data in these tables is mostly the same, but the SP
ereports contain fields for the SP type and slot used to request its ereports
from MGS, while the host OS ereports contain the UUID of the sled used to find
its sled-agent address.
Splitting this data across two tables allows us to represent these differing
metadata fields nicely, but it makes it difficult to query or paginate over
ereports regardless of their source. Diagnosis engines will need to be able to
do this. Thus, it's desirable to change this so that a single table can be
queried to retrieve any ereport. This will also make life easier if we add new
reporter kinds in the future, such as ereports from Omicron zones or synthesized
by the control plane's fault manager in response to health endpoint checks.
There are a variety of ways we could achieve this. We could normalize the
shared data into one
ereporttable and put only the host-specific metadatafields in two separate tables,
JOINing on the ereport's restart ID and ENA.Or, we could use a SQL view to have a view that abstracts over querying the two
separate tables. Some of these options were discussed in
https://github.com/oxidecomputer/rfd/pull/942#discussion_r2430786698. However,
this branch takes the simplest option, and just puts everything in one big
table. While the metadata fields are now all nullable, we use a
CHECKconstraint to ensure that SP ereports always have non-
NULLsp_typeandsp_slot, while host OS ereports always have non-NULLsled_ids. This doesmean that the Rust database model code has to handle the possibility of invalid
data, but we can just
expector otherwise assert it will be shaped correctlyaccording to the
CHECKconstraint. On the other hand, just having one bigtable gives us much more predictable and understandable query plans than a view.
I've also reworked the way that ereports are constructed, by adding some new
domain types in
nexus_types::fm::ereport. Previously, the database queries toinsert new ereports took a single reporter identity that was used to determine
the highest ENA inserted for that reporter, but also took a list of
db::model::Ereports to insert, which also repeated the reporter identity.Therefore, it was possible to insert a batch of ereports from a different
reporter than the one whose highest ENA is queried as part of the insert
operation, which I never loved. Now, we have a separate
EreportDatatype thatdoesn't include this reporter identity, and the insert queries construct the
db::model::Ereportfrom this data and the single provided reporter identity,which I hate a lot less.