@georgeee georgeee commented Nov 14, 2025

Problem

Frontier persistence is fundamentally broken.

The persistent frontier is managed by an on-disk database coupled with a file containing the latest root hash. The root file is updated immediately after the in-memory frontier switches from one root to another, but the transitions are stored on disk (in the transitions database) with a delay of up to 9 blocks.

This setup allows for better efficiency, but if the Mina node starts and detects that the root in the file differs from the root in the database, it treats the whole on-disk persistence as corrupt and removes the frontier entirely. This discrepancy between the root file and the transitions database can occur easily, e.g. when the Mina node exits abruptly (it is unclear whether it also happens when the node is stopped with a stop command sent via the client).
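To make the failure mode concrete, here is a toy model in Python (not the node's OCaml code). The class, its method names, and the flush trigger are all hypothetical; only the eager root file, the lagging database, and the 9-block figure come from the description above.

```python
MAX_LAG = 9  # assumption: taken from the "up to 9 blocks" figure above


class PersistedFrontierModel:
    """Toy model: a root file plus a lagging transitions database."""

    def __init__(self, genesis_hash):
        self.root_file = genesis_hash  # latest root hash, written eagerly
        self.db_root = genesis_hash    # root as recorded in the database
        self.pending = []              # root moves not yet written to the database

    def move_root(self, new_root_hash):
        # The root file is updated right away; the database write is deferred.
        self.root_file = new_root_hash
        self.pending.append(new_root_hash)
        if len(self.pending) > MAX_LAG:  # hypothetical flush trigger
            self.flush()

    def flush(self):
        # Bring the database root up to the most recent root move.
        if self.pending:
            self.db_root = self.pending[-1]
            self.pending.clear()

    def consistent_on_startup(self):
        # The startup check described above: both roots must agree.
        return self.root_file == self.db_root


frontier = PersistedFrontierModel("b0")
frontier.move_root("b1")  # root file now says b1, database still says b0
crashed_state_ok = frontier.consistent_on_startup()
frontier.flush()
synced_state_ok = frontier.consistent_on_startup()
print(crashed_state_ok, synced_state_ok)  # prints: False True
```

In this model, a node that exits between `move_root` and `flush` leaves the two roots out of sync, which is the state the startup check rejects.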

Solution

The tool introduced in this PR fixes the state of the persistent frontier's database by removing transitions from the database until the root read from the file becomes the root of the database as well. Removal is performed by executing part of the regular root-moving routine.

In the future we must consider embedding this recovery mechanism into the Mina node itself, but as a first step it's good to be able to deliver it wrapped in a CLI tool.
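The net effect of the repair can be sketched as follows. This is a simplified Python illustration, not the tool's actual OCaml implementation: the function name and the child-to-parent dictionary are hypothetical, and the sketch prunes everything in one pass, whereas the tool reuses part of the regular root-moving routine.

```python
def fix_persistent_frontier(parents, db_root, target_root):
    """Prune a toy transitions database so that target_root becomes its root.

    parents maps each stored transition hash to its parent hash (assumed acyclic).
    Returns (pruned_parents, removed_hashes).
    """
    # Check that the root from the file descends from the database root.
    cursor = target_root
    while cursor != db_root:
        cursor = parents.get(cursor)
        if cursor is None:
            raise ValueError("target root is not a descendant of the database root")

    # Index children so the subtree under the target root can be collected.
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)

    # Keep the target root and all of its descendants.
    keep = set()
    stack = [target_root]
    while stack:
        current = stack.pop()
        keep.add(current)
        stack.extend(children.get(current, []))

    # Everything else (old roots and abandoned forks) is removed.
    removed = (set(parents) | {db_root}) - keep
    pruned = {c: p for c, p in parents.items() if c in keep and c != target_root}
    return pruned, removed


# The database root is b0, the root file says b2, and x1 is a fork off b0.
stored = {"b1": "b0", "b2": "b1", "b3": "b2", "x1": "b0"}
pruned, removed = fix_persistent_frontier(stored, "b0", "b2")
print(sorted(removed), pruned)  # prints: ['b0', 'b1', 'x1'] {'b3': 'b2'}
```

If the root from the file is not reachable from the database root, the sketch raises instead of guessing; how the real tool handles that case is not stated here.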

Commit structure

The PR contains a few straightforward commits exposing some functions across the codebase:

  • Expose stable's header in validated_block.ml
  • Add copy_dir function to stdlib
  • Expose get_root_hash from persistent frontier
  • Expose stable's transition fun from root data
  • Expose protocol_states_for_root_scan_state in transition frontier

The final commit introduces the new tool:

  • Add mina advanced fix-persistent-frontier

Testing

Explain how you tested your changes:

  • Test the tool on a frontier with a 1-block discrepancy
  • Test the tool on a frontier with a discrepancy of more than 1 block
    • Confirm the node is able to start after loading such a frontier

Checklist

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? None

@georgeee georgeee requested a review from a team as a code owner November 14, 2025 20:28
@georgeee georgeee requested a review from glyh November 14, 2025 20:30
@georgeee georgeee added the oom label Nov 16, 2025
@georgeee georgeee changed the base branch from georgeee/fix-frontier-root-base to compatible November 17, 2025 13:47
@georgeee georgeee force-pushed the georgeee/fix-frontier-root branch from b548588 to aa625e2 Compare November 17, 2025 13:47