Conversation

@hawkw
Member

@hawkw hawkw commented Nov 5, 2025

No description provided.

Base automatically changed from eliza/sitrep-gc to main November 10, 2025 21:51
@hawkw hawkw force-pushed the eliza/rectifier-pulls branch from 81868f1 to 90112ab Compare November 11, 2025 17:37
@hawkw hawkw changed the base branch from main to eliza/one-big-ereport-table November 11, 2025 17:39
@AlejandroME AlejandroME added the fault-management Everything related to the fault-management initiative (RFD480 and others) label Nov 11, 2025
@hawkw hawkw force-pushed the eliza/rectifier-pulls branch from cb1849a to fc3eb68 Compare November 14, 2025 19:47
Base automatically changed from eliza/one-big-ereport-table to main November 19, 2025 21:00
@hawkw hawkw force-pushed the eliza/rectifier-pulls branch from abfcbec to 6af4dd4 Compare November 19, 2025 21:47
@hawkw
Member Author

hawkw commented Nov 26, 2025

WHOA CHECK THIS OUT! I have a test:

#[test]
fn test_remove_insert_pwr_good() {
    let FmTest { logctx, mut reporters, system_builder, sitrep_rng } =
        FmTest::new("test_remove_insert_pwr_good");
    let mut reporter = reporters
        .reporter(Reporter::Sp { sp_type: SpType::Power, slot: 0 });
    let (example_system, _) = system_builder.nsleds(2).build();
    let mut sitrep = SitrepBuilder::new_with_rng(
        &logctx.log,
        &example_system.collection,
        None,
        sitrep_rng,
    );
    // It's the beginning of time!
    let t0 = DateTime::<Utc>::MIN_UTC;
    let mut de = PowerShelfDiagnosis::new(&logctx.log);
    de.analyze_ereport(
        &mut sitrep,
        &Arc::new(
            reporter.parse_ereport(t0, ereport_test::PSU_REMOVE_JSON),
        ),
    )
    .expect("analyzing ereport 1 should succeed");
    de.analyze_ereport(
        &mut sitrep,
        &Arc::new(reporter.parse_ereport(
            t0 + Duration::from_secs(1),
            ereport_test::PSU_INSERT_JSON,
        )),
    )
    .expect("analyzing ereport 2 should succeed");
    de.analyze_ereport(
        &mut sitrep,
        &Arc::new(reporter.parse_ereport(
            t0 + Duration::from_secs(2),
            ereport_test::PSU_PWR_GOOD_JSON,
        )),
    )
    .expect("analyzing ereport 3 should succeed");
    de.finish(&mut sitrep).expect("finish should return Ok");
    let sitrep = sitrep.build(OmicronZoneUuid::nil());
    // TODO(eliza) ACTUALLY MAKE SOME ASSERTIONS ABOUT THE SITREP
    eprintln!("{sitrep:#?}");
    let case0 = {
        let mut cases = sitrep.cases.iter();
        let case0 = cases.next().expect("sitrep should have a case");
        assert_eq!(
            cases.next(),
            None,
            "sitrep should have exactly one case"
        );
        case0
    };
    let mut insert_alert = None;
    let mut remove_alert = None;
    for alert in &case0.alerts_requested {
        match alert.class {
            AlertClass::PsuInserted if insert_alert.is_none() => {
                insert_alert = Some(alert);
            }
            AlertClass::PsuInserted => {
                panic!(
                    "expected only one PSU inserted alert, saw multiple:\n\
                     1: {insert_alert:#?}\n\n2: {alert:#?}"
                );
            }
            AlertClass::PsuRemoved if remove_alert.is_none() => {
                remove_alert = Some(alert);
            }
            AlertClass::PsuRemoved => {
                panic!(
                    "expected only one PSU removed alert, saw multiple:\n\
                     1: {remove_alert:#?}\n\n2: {alert:#?}"
                );
            }
        }
    }
    assert!(insert_alert.is_some(), "no PSU inserted alert was requested!");
    assert!(remove_alert.is_some(), "no PSU removed alert was requested!");
    assert!(
        !case0.is_open(),
        "case should have been closed since everything is okay"
    );
    logctx.cleanup_successful();
}

and it kinda vaguely works 👍

  stdout ───

    running 1 test
    test diagnosis::power_shelf::test::test_remove_insert_pwr_good ... ok

    test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.00s

  stderr ───
    log file: /tmp/nexus_fm-e900aaf8d6694728-test_remove_insert_pwr_good.713316.0.log
    note: configured to log to "/tmp/nexus_fm-e900aaf8d6694728-test_remove_insert_pwr_good.713316.0.log"
    TODO ELIZA PwrGood { pmbus_status: PmbusStatus { word: Some(0), input: Some(0), iout: Some(0), vout: Some(0), temp: Some(0), cml: Some(0), mfr: Some(0) } }
    Sitrep {
        metadata: SitrepMetadata {
            id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
            parent_sitrep_id: None,
            inv_collection_id: a5539102-d9ad-470c-bae5-57d372d90f3c (collection),
            creator_id: 00000000-0000-0000-0000-000000000000 (omicron_zone),
            comment: "",
            time_created: 2025-11-26T19:42:19.150326079Z,
        },
        cases: {
            8d30dbdb-66d1-4836-989d-20bdad7add82 (case): Case {
                id: 8d30dbdb-66d1-4836-989d-20bdad7add82 (case),
                created_sitrep_id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                time_created: 2025-11-26T19:42:19.150049747Z,
                closed_sitrep_id: Some(
                    d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                ),
                time_closed: Some(
                    2025-11-26T19:42:19.150324547Z,
                ),
                de: PowerShelf,
                ereports: {
                    EreportId {
                        restart_id: 2013de5a-bc17-4199-a5d4-ae9e9ddb6443 (ereporter_restart),
                        ena: Ena(0x2),
                    }: CaseEreport {
                        ereport: Ereport {
                            data: EreportData {
                                id: EreportId {
                                    restart_id: 2013de5a-bc17-4199-a5d4-ae9e9ddb6443 (ereporter_restart),
                                    ena: Ena(0x2),
                                },
                                time_collected: -262143-01-01T00:00:00Z,
                                collector_id: bcd6ef38-fa67-4216-a2ef-fd7e9ddd8c69 (omicron_zone),
                                serial_number: None,
                                part_number: None,
                                class: Some(
                                    "hw.remove.psu",
                                ),
                                report: Object {
                                    "baseboard_part_number": String("913-0000003"),
                                    "baseboard_rev": Number(8),
                                    "baseboard_serial_number": String("BRM45220004"),
                                    "ereport_message_version": Number(0),
                                    "fruid": Object {
                                        "fw_rev": String("0701"),
                                        "mfr": String("Murata-PS"),
                                        "mpn": String("MWOCP68-3600-D-RM"),
                                        "serial": String("LL2216RB003Z"),
                                    },
                                    "hubris_archive_id": String("qSm4IUtvQe0"),
                                    "hubris_task_gen": Number(0),
                                    "hubris_task_name": String("sequencer"),
                                    "hubris_uptime_ms": Number(1197337481),
                                    "k": String("hw.remove.psu"),
                                    "rail": String("V54_PSU4"),
                                    "refdes": String("PSU4"),
                                    "slot": Number(4),
                                    "v": Number(0),
                                },
                            },
                            reporter: Sp {
                                sp_type: Power,
                                slot: 0,
                            },
                        },
                        assigned_sitrep_id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                        comment: "PSU 4 was removed",
                    },
                    EreportId {
                        restart_id: 2013de5a-bc17-4199-a5d4-ae9e9ddb6443 (ereporter_restart),
                        ena: Ena(0x3),
                    }: CaseEreport {
                        ereport: Ereport {
                            data: EreportData {
                                id: EreportId {
                                    restart_id: 2013de5a-bc17-4199-a5d4-ae9e9ddb6443 (ereporter_restart),
                                    ena: Ena(0x3),
                                },
                                time_collected: -262143-01-01T00:00:01Z,
                                collector_id: bcd6ef38-fa67-4216-a2ef-fd7e9ddd8c69 (omicron_zone),
                                serial_number: None,
                                part_number: None,
                                class: Some(
                                    "hw.insert.psu",
                                ),
                                report: Object {
                                    "baseboard_part_number": String("913-0000003"),
                                    "baseboard_rev": Number(8),
                                    "baseboard_serial_number": String("BRM45220004"),
                                    "ereport_message_version": Number(0),
                                    "fruid": Object {
                                        "fw_rev": String("0701"),
                                        "mfr": String("Murata-PS"),
                                        "mpn": String("MWOCP68-3600-D-RM"),
                                        "serial": String("LL2216RB003Z"),
                                    },
                                    "hubris_archive_id": String("qSm4IUtvQe0"),
                                    "hubris_task_gen": Number(0),
                                    "hubris_task_name": String("sequencer"),
                                    "hubris_uptime_ms": Number(1197337481),
                                    "k": String("hw.insert.psu"),
                                    "rail": String("V54_PSU4"),
                                    "refdes": String("PSU4"),
                                    "slot": Number(4),
                                    "v": Number(0),
                                },
                            },
                            reporter: Sp {
                                sp_type: Power,
                                slot: 0,
                            },
                        },
                        assigned_sitrep_id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                        comment: "PSU 4 was inserted",
                    },
                    EreportId {
                        restart_id: 2013de5a-bc17-4199-a5d4-ae9e9ddb6443 (ereporter_restart),
                        ena: Ena(0x4),
                    }: CaseEreport {
                        ereport: Ereport {
                            data: EreportData {
                                id: EreportId {
                                    restart_id: 2013de5a-bc17-4199-a5d4-ae9e9ddb6443 (ereporter_restart),
                                    ena: Ena(0x4),
                                },
                                time_collected: -262143-01-01T00:00:02Z,
                                collector_id: bcd6ef38-fa67-4216-a2ef-fd7e9ddd8c69 (omicron_zone),
                                serial_number: None,
                                part_number: None,
                                class: Some(
                                    "hw.pwr.pwr_good.good",
                                ),
                                report: Object {
                                    "baseboard_part_number": String("913-0000003"),
                                    "baseboard_rev": Number(8),
                                    "baseboard_serial_number": String("BRM45220004"),
                                    "ereport_message_version": Number(0),
                                    "fruid": Object {
                                        "fw_rev": String("0701"),
                                        "mfr": String("Murata-PS"),
                                        "mpn": String("MWOCP68-3600-D-RM"),
                                        "serial": String("LL2216RB003Z"),
                                    },
                                    "hubris_archive_id": String("qSm4IUtvQe0"),
                                    "hubris_task_gen": Number(0),
                                    "hubris_task_name": String("sequencer"),
                                    "hubris_uptime_ms": Number(1197408580),
                                    "k": String("hw.pwr.pwr_good.good"),
                                    "pmbus_status": Object {
                                        "cml": Number(0),
                                        "input": Number(0),
                                        "iout": Number(0),
                                        "mfr": Number(0),
                                        "temp": Number(0),
                                        "vout": Number(0),
                                        "word": Number(0),
                                    },
                                    "rail": String("V54_PSU4"),
                                    "refdes": String("PSU4"),
                                    "slot": Number(4),
                                    "v": Number(0),
                                },
                            },
                            reporter: Sp {
                                sp_type: Power,
                                slot: 0,
                            },
                        },
                        assigned_sitrep_id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                        comment: "PSU 4 asserted PWR_GOOD",
                    },
                },
                alerts_requested: {
                    eaf8e0d9-e706-4211-9b24-e61ce187ab65 (alert): AlertRequest {
                        id: eaf8e0d9-e706-4211-9b24-e61ce187ab65 (alert),
                        class: PsuInserted,
                        payload: Object {
                            "psc_id": Null,
                            "psc_slot": Number(0),
                            "psu_id": Object {
                                "firmware_revision": String("0701"),
                                "manufacturer": String("Murata-PS"),
                                "part_number": String("MWOCP68-3600-D-RM"),
                                "serial_number": String("LL2216RB003Z"),
                            },
                            "psu_slot": Number(4),
                            "time": String("-262143-01-01T00:00:01Z"),
                            "version": String("v0"),
                        },
                        requested_sitrep_id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                    },
                    fa4407a8-608d-40f0-8ccc-65fdedc950e0 (alert): AlertRequest {
                        id: fa4407a8-608d-40f0-8ccc-65fdedc950e0 (alert),
                        class: PsuRemoved,
                        payload: Object {
                            "psc_id": Null,
                            "psc_slot": Number(0),
                            "psu_id": Object {
                                "firmware_revision": String("0701"),
                                "manufacturer": String("Murata-PS"),
                                "part_number": String("MWOCP68-3600-D-RM"),
                                "serial_number": String("LL2216RB003Z"),
                            },
                            "psu_slot": Number(4),
                            "time": String("-262143-01-01T00:00:00Z"),
                            "version": String("v0"),
                        },
                        requested_sitrep_id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                    },
                },
                impacted_locations: {
                    (
                        Power,
                        0,
                    ): ImpactedLocation {
                        sp_type: Power,
                        slot: 0,
                        created_sitrep_id: d82641e6-ba8e-4829-bb64-c9a53698faf7 (sitrep),
                        comment: "this is the power shelf where the PSU event occurred",
                    },
                },
                comment: "opened when power shelf 0 PSU 4 was removed",
            },
        },
    }
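
The "exactly one alert of each class" loop in the test above could probably be pulled into a small helper once there are more tests like this. A rough sketch, not code from this branch: `exactly_one` is a hypothetical name, and `AlertClass` / `AlertRequest` here are stripped-down stand-ins for the real types.

```rust
use std::fmt::Debug;

// Simplified stand-ins for the real alert types (hypothetical; the real
// `AlertClass` and `AlertRequest` carry more data than this).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AlertClass {
    PsuInserted,
    PsuRemoved,
}

#[derive(Debug)]
struct AlertRequest {
    class: AlertClass,
    psu_slot: u8,
}

/// Returns the single item matching `pred`, or an error message if there
/// were no matches or more than one match. `exactly_one` is a hypothetical
/// helper name, not an existing API.
fn exactly_one<'a, T: Debug + 'a>(
    items: impl IntoIterator<Item = &'a T>,
    what: &str,
    pred: impl Fn(&T) -> bool,
) -> Result<&'a T, String> {
    let mut found: Option<&'a T> = None;
    for item in items {
        if !pred(item) {
            continue;
        }
        // A second match means the "only one" expectation is violated.
        if let Some(first) = found {
            return Err(format!(
                "expected only one {what}, saw multiple:\n\
                 1: {first:#?}\n\n2: {item:#?}"
            ));
        }
        found = Some(item);
    }
    found.ok_or_else(|| format!("no {what} was requested!"))
}
```

In the test, the `match` loop would then shrink to two calls along the lines of `exactly_one(&case0.alerts_requested, "PSU inserted alert", |a| matches!(a.class, AlertClass::PsuInserted)).unwrap()`, assuming iterating `alerts_requested` by reference yields `&AlertRequest` as it does in the loop above.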

hawkw added 3 commits December 5, 2025 11:46
This commit moves the code that processes raw JSON representations of SP
ereports from the `ereport_ingester` background task to
`nexus_types::fm::ereport`. While a production system will only perform
this processing when an ereport is received from a service processor via
MGS in the `ereport_ingester` task, I'd like to be able to also invoke
the same logic with hard-coded example JSON ereports for test purposes.
Currently, the test code for diagnosis engines in #9346 implements its
own version of this, which is slightly different from the
`ereport_ingester` version, so I've moved it here so that we can use
the same logic when processing test inputs.
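
The shape of that refactor, roughly: one processing function that both the ingester and the test fixtures call, so the two paths cannot drift apart. This is only an illustrative sketch. Every name in it is hypothetical, the report is modeled as an already-decoded string map rather than real JSON, and the assumption that the class is taken from the report's `k` field is inferred from the debug output earlier in this thread, not from the actual code in `nexus_types::fm::ereport`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, stripped-down stand-in for the processed ereport data.
#[derive(Debug, PartialEq, Eq)]
struct ProcessedEreport {
    class: Option<String>,
}

/// The single shared processing step. Assumption: the class comes from the
/// report's "k" field; a report without that field has no class.
fn process_report(fields: &BTreeMap<String, String>) -> ProcessedEreport {
    ProcessedEreport { class: fields.get("k").cloned() }
}

/// Production-style caller (stand-in for the ingester background task).
fn ingest_from_sp(fields: &BTreeMap<String, String>) -> ProcessedEreport {
    process_report(fields)
}

/// Test-style caller (stand-in for loading a hard-coded example report).
fn parse_test_fixture(pairs: &[(&str, &str)]) -> ProcessedEreport {
    let fields: BTreeMap<String, String> = pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    // Same function as the production path, so results cannot diverge.
    process_report(&fields)
}
```

Because both callers bottom out in `process_report`, a test fixture and a report received from an SP with the same fields always produce the same processed value.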
@hawkw hawkw force-pushed the eliza/rectifier-pulls branch from 47b491b to 270d432 Compare December 5, 2025 20:36
hawkw added a commit that referenced this pull request Dec 5, 2025
hawkw added a commit that referenced this pull request Dec 6, 2025
This commit moves the code that processes raw JSON representations of SP
ereports from the `ereport_ingester` background task to
`nexus_types::fm::ereport`. While a production system will only perform
this processing when an ereport is received from a service processor via
MGS in the `ereport_ingester` task, I'd like to be able to also invoke
the same logic with hard-coded example JSON ereports for test purposes.
Currently, the test code for diagnosis engines in #9346 implements its
own version of this, which is slightly different from the
`ereport_ingester` version, so I've moved it here so that we can use
the same logic when processing test inputs.

Note that this change was cherry-picked from #9346, where I had actually
written tests that needed to use this logic. I figured it was a small
and uncontroversial enough refactor (that makes no functional change
whatsoever) to pull it out to merge now, in an attempt to keep that
branch from getting even bigger and touching even more files.
Labels

fault-management Everything related to the fault-management initiative (RFD480 and others)

3 participants