Gracefully handle errors/recover from halting in chain indexer #461

@SupernaviX

Description

Right now, if a ChainIndex implementation returns an error from handle_rollback (or from handle_tx, if we do not then see a rollback to a point before the failure), we halt that index until the server restarts. When the server does restart, we reset the index to its default start point and replay from there. This works, but in production it means extremely long recovery times even when the error was transient.

We want to recover from transient errors without this reset.

To do that, we need to:

  • track the last K points processed by each index
  • track the last transaction processed by each index (we currently only track the block)
  • at startup, if an index was previously halted, try rolling forward (or rolling back) from the last point it successfully processed
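
As a rough illustration of the state described above, here is a minimal sketch of a per-index progress record. `Point`, `IndexProgress`, and every field and method name are hypothetical and not taken from the existing codebase; the sketch also assumes a point is a slot plus a 32-byte block hash, and that a transaction is identified by its position within a block.

```rust
use std::collections::VecDeque;

/// Hypothetical chain point: a slot number plus a block hash.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Point {
    pub slot: u64,
    pub hash: [u8; 32],
}

/// Hypothetical per-index progress record.
#[derive(Debug)]
pub struct IndexProgress {
    k: usize,
    /// Last K points processed, oldest at the front, tip at the back.
    recent_points: VecDeque<Point>,
    /// Last transaction processed: its point and its position in that block.
    last_tx: Option<(Point, usize)>,
    /// Whether the index halted on an error before the last shutdown.
    halted: bool,
}

impl IndexProgress {
    pub fn new(k: usize) -> Self {
        // Always keep at least one point so the tip is never evicted.
        let k = k.max(1);
        Self { k, recent_points: VecDeque::with_capacity(k), last_tx: None, halted: false }
    }

    /// Record a successfully processed transaction.
    pub fn record_tx(&mut self, point: Point, tx_index: usize) {
        // Only push a new point when the block changes.
        if self.recent_points.back() != Some(&point) {
            if self.recent_points.len() == self.k {
                self.recent_points.pop_front();
            }
            self.recent_points.push_back(point.clone());
        }
        self.last_tx = Some((point, tx_index));
    }

    pub fn mark_halted(&mut self) {
        self.halted = true;
    }

    pub fn is_halted(&self) -> bool {
        self.halted
    }

    pub fn tip(&self) -> Option<&Point> {
        self.recent_points.back()
    }

    pub fn oldest(&self) -> Option<&Point> {
        self.recent_points.front()
    }

    pub fn last_tx(&self) -> Option<&(Point, usize)> {
        self.last_tx.as_ref()
    }
}
```

To survive a restart, this record would have to be persisted alongside the index's own data; how that is stored is left open here.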

At startup, the ChainIndexer will find the index with the earliest tip and request a FindIntersect from that index's tip (or the K points before that tip). It will then process new transactions or rollbacks as normal.
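
A sketch of how the startup step might pick its intersection candidates, under some assumptions: each index exposes its recent points as a slice ordered oldest first, "earliest tip" means the lowest tip slot, and an index with no recorded points counts as the earliest. The function name `intersect_candidates` and the `Point` type are hypothetical.

```rust
/// Hypothetical chain point: a slot number plus a block hash.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Point {
    pub slot: u64,
    pub hash: [u8; 32],
}

/// Given each index's recent points (oldest first, tip last), return the
/// points to offer in a FindIntersect request, newest first: the tip of the
/// index with the earliest tip, followed by the points before that tip.
///
/// An empty result means some index has no history (or there are no indexes),
/// so the caller would fall back to the default start point.
pub fn intersect_candidates(indexes: &[Vec<Point>]) -> Vec<Point> {
    let earliest = indexes
        .iter()
        // `None` sorts before `Some(_)`, so an index with no points wins.
        .min_by_key(|points| points.last().map(|p| p.slot));
    match earliest {
        Some(points) => points.iter().rev().cloned().collect(),
        None => Vec::new(),
    }
}
```

Offering the older points as well as the tip gives the node something to intersect with if the tip itself was on a fork that has since been rolled back.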

When an index sees a transaction from a point before its tip, it will check if that point is in the last K points it processed.

  • If the point is older than the index's oldest slot, it will ignore the point.
  • If the point is equal to the index's oldest slot and the hash does not match, it will halt (this means we are on a fork more than K blocks long).
  • If the point is newer than the index's oldest slot and the hash does not match, it will roll back to the point before the mismatch, and then process the new transaction.
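
The three rules above can be sketched as a single classification function. This is an illustration under stated assumptions, not the project's implementation: `Point`, `Action`, and `classify` are hypothetical names; "the hash does not match" is taken to mean that no tracked point has both the same slot and the same hash; and the `KnownPoint` and `Process` variants cover cases the rules do not address (a point already in the history, and a point past the tip or an index with no history).

```rust
/// Hypothetical chain point: a slot number plus a block hash.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Point {
    pub slot: u64,
    pub hash: [u8; 32],
}

/// What an index should do with a transaction at a given point.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Point is past the tip (or there is no history): handle as normal.
    Process,
    /// Point is older than anything tracked: skip it.
    Ignore,
    /// Point is already in the tracked history (assumption: caller decides).
    KnownPoint,
    /// Fork is deeper than the tracked history: halt the index.
    Halt,
    /// Roll back to this tracked point, then process the transaction.
    RollbackThenProcess(Point),
}

/// `recent` holds the last K points processed, oldest first, tip last.
pub fn classify(recent: &[Point], incoming: &Point) -> Action {
    let (oldest, tip) = match (recent.first(), recent.last()) {
        (Some(oldest), Some(tip)) => (oldest, tip),
        _ => return Action::Process,
    };
    if incoming.slot > tip.slot {
        return Action::Process;
    }
    if incoming.slot < oldest.slot {
        return Action::Ignore;
    }
    if recent.iter().any(|p| p == incoming) {
        return Action::KnownPoint;
    }
    if incoming.slot == oldest.slot {
        // Mismatch at the oldest tracked slot: fork longer than K blocks.
        return Action::Halt;
    }
    // Mismatch at a newer slot: roll back to the latest tracked point
    // strictly before the incoming slot (the oldest point always qualifies).
    match recent.iter().rev().find(|p| p.slot < incoming.slot) {
        Some(target) => Action::RollbackThenProcess(target.clone()),
        None => Action::Halt,
    }
}
```

For example, with tracked slots 10, 11, and 12, a transaction at slot 12 with an unknown hash yields `RollbackThenProcess` targeting the point at slot 11, while an unknown hash at slot 10 yields `Halt`.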

NB: we are not adding this to Milestone 1, but are treating it as a fast-follow.
