
Add fault correlation/muting capability to FaultManager to handle "expected cascades" of faults. #105

@mfaferek93

Description


Summary

Add fault correlation/muting capability to FaultManager to handle "expected cascades" of faults.

Use cases (from this discussion: https://discordapp.com/channels/1451323858547118216/1459576596733104158):

  • E-stop cascades: when the e-stop triggers, the downstream motor/comm faults that follow are expected noise
  • OTA/restart transitions: updates cause temporary bursts of timeouts and comm errors
  • Degradation modes: subsystems that were intentionally disabled shouldn't keep surfacing known faults

Currently, all faults are treated equally, which makes it hard to identify the actual root cause when multiple related faults fire simultaneously.


Proposed solution

Root Cause and Symptoms mapping

Instead of simple muting, define relationships between faults:

```yaml
# 1. Define fault patterns (regex matching on diagnostic name/message)
fault_patterns:
  motor_low_power:
    name: "Motor.*"
    message: "Low Voltage|Power Loss"

  motor_comm_timeout:
    name: "Motor.*"
    message: "Timeout|No Response"

# 2. Define root causes with expected symptoms
root_causes:
  estop_pressed:
    name: "E-Stop Pressed"
    symptoms:
      - motor_low_power
      - motor_comm_timeout
```
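A minimal sketch of how the `fault_patterns` section might be evaluated against an incoming fault, assuming the YAML has already been loaded into a plain dict. The `Fault` type and `matching_patterns` function are hypothetical names for illustration, not existing FaultManager APIs. It also assumes that `name` must match the whole fault name (`re.fullmatch`), while `message` only needs to match somewhere in the message text (`re.search`); the proposal does not pin down either choice.

```python
import re
from dataclasses import dataclass

# Hypothetical in-memory mirror of the `fault_patterns` YAML section.
FAULT_PATTERNS = {
    "motor_low_power": {"name": r"Motor.*", "message": r"Low Voltage|Power Loss"},
    "motor_comm_timeout": {"name": r"Motor.*", "message": r"Timeout|No Response"},
}


@dataclass(frozen=True)
class Fault:
    name: str     # diagnostic name, e.g. "MotorLeft"
    message: str  # diagnostic message text


def matching_patterns(fault: Fault, patterns: dict = FAULT_PATTERNS) -> list:
    """Return the IDs of every pattern whose name AND message regexes match.

    Assumption: the name regex must cover the full fault name (fullmatch),
    the message regex may match any substring of the message (search).
    """
    matches = []
    for pattern_id, spec in patterns.items():
        name_ok = re.fullmatch(spec["name"], fault.name) is not None
        message_ok = re.search(spec["message"], fault.message) is not None
        if name_ok and message_ok:
            matches.append(pattern_id)
    return matches
```

For example, `matching_patterns(Fault("MotorLeft", "CAN Timeout after 50 ms"))` returns `["motor_comm_timeout"]`, while a fault named `"Lidar"` matches nothing because its name fails the `Motor.*` regex.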

Benefits over simple muting:

  • Faults aren't hidden; they're contextualized as expected symptoms of an active root cause
  • Unexpected faults (not matching any symptom) stand out as potentially real issues
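To make the "contextualized, not hidden" behavior concrete, here is a sketch that partitions a burst of faults into those explained by an active root cause and those that are unexpected. It assumes each fault has already been tagged upstream with the IDs of the `fault_patterns` it matched, and that the FaultManager already knows which root causes are currently active; `TaggedFault`, `Correlation`, and `correlate` are hypothetical names. When several active root causes could explain a fault, the first one in sorted order claims it, which is an arbitrary tie-break chosen for determinism.

```python
from dataclasses import dataclass, field

# Hypothetical mirror of the `root_causes` YAML section (symptoms as a set).
ROOT_CAUSES = {
    "estop_pressed": {
        "name": "E-Stop Pressed",
        "symptoms": {"motor_low_power", "motor_comm_timeout"},
    },
}


@dataclass(frozen=True)
class TaggedFault:
    name: str
    pattern_ids: frozenset  # fault_patterns this fault matched (computed upstream)


@dataclass
class Correlation:
    explained: dict = field(default_factory=dict)   # root cause ID -> list of faults
    unexpected: list = field(default_factory=list)  # faults no active root cause explains


def correlate(faults, active_root_causes, root_causes=ROOT_CAUSES):
    """Group faults under an active root cause; keep the rest as unexpected."""
    result = Correlation()
    for fault in faults:
        # First active root cause (sorted) whose symptoms overlap the fault's patterns.
        cause = next(
            (rc for rc in sorted(active_root_causes)
             if root_causes[rc]["symptoms"] & fault.pattern_ids),
            None,
        )
        if cause is None:
            result.unexpected.append(fault)  # surfaced as a potentially real issue
        else:
            result.explained.setdefault(cause, []).append(fault)  # kept, not dropped
    return result
```

With `estop_pressed` active, motor faults tagged `motor_low_power` or `motor_comm_timeout` land under `explained["estop_pressed"]`, while an untagged `Lidar` fault lands in `unexpected`. With no active root causes, every fault is unexpected, which matches today's behavior.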

Implementation location: FaultManager, since it is centralized and has full system context.



Labels

enhancement (New feature or request)