Need prototype reporting of persistent sled faults #1366

@bnaecker

During investigation of #1364, Josh brought up the general point of fault reporting. See this comment thread for context. This issue tracks adding some prototype or preliminary reporting of persistent faults on a sled. In that particular issue, a failure to delete an OPTE port means that the sled cannot be used further, at least for hosting that particular guest instance. We'd like a simple way to track that fact, ideally in CockroachDB, and use that knowledge in Nexus to direct instances (or Oxide services, potentially) to other sleds.
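As a rough illustration of the idea, here is a minimal sketch of how recorded faults might be used to steer placement away from affected sleds. Everything in it is an assumption made for illustration: `SledId`, `SledFault`, and `eligible_sleds` are hypothetical names, not existing Nexus or sled-agent APIs, and the in-memory slice of faults stands in for whatever would actually be read from CockroachDB.

```rust
use std::collections::HashSet;

/// Hypothetical sled identifier; a real system would likely use a UUID.
type SledId = u32;

/// Hypothetical record of a persistent fault, roughly what one database
/// row might carry.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SledFault {
    sled_id: SledId,
    /// Free-form description, e.g. "failed to delete OPTE port".
    reason: String,
}

/// Return the candidate sleds that have no recorded persistent fault,
/// preserving the original candidate order.
fn eligible_sleds(candidates: &[SledId], faults: &[SledFault]) -> Vec<SledId> {
    // Collect the IDs of all faulted sleds for constant-time lookup.
    let faulted: HashSet<SledId> = faults.iter().map(|f| f.sled_id).collect();
    candidates
        .iter()
        .copied()
        .filter(|id| !faulted.contains(id))
        .collect()
}
```

This sketch treats a fault as disqualifying the whole sled. If a fault should only block a particular guest instance, the record would presumably also need an instance identifier, and the filter would have to take it into account.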

cc @jclulow
