Shutdown and compaction race #449

Closed
martinsumner opened this issue Sep 3, 2024 · 0 comments
@martinsumner
Owner

If an inker is shut down while a clerk is completing a compaction (i.e. post scoring), the following may occur:

  1. The inker starts the close process and waits for the iclerk to respond to leveled_iclerk:clerk_stop(State#state.clerk).
  2. The iclerk responds once it has cast leveled_inker:ink_clerkcomplete(State#state.inker, ManifestSlice, FilesToDelete) to the inker.
  3. The inker receives the clerk_complete message (prior to the maybe_defer_shutdown), updates the manifest and casts a request to leveled_iclerk:clerk_promptdeletions(State#state.clerk, NewManifestSQN, FilesToDelete) ... but this cast will never be received, as the clerk has shut down.
  4. The inker now receives maybe_defer_shutdown and then complete_shutdown, and shuts down all files within the manifest ... but this doesn't include the files which were to be prompted for deletion.
  5. The replaced CDB files are now neither shut down as members of the manifest, nor shut down by the delete_pending state timeout, so they are left as hanging PIDs.
  6. When node termination eventually closes the PIDs, they will not clear the files from disk, as they were never moved to delete_pending.
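
The ordering in the steps above can be illustrated with a toy model. This is a minimal sketch in Python, not leveled code: the function name, the file names (`a.cdb`, `b.cdb`, `c.cdb`) and the state labels `open`/`closed` are hypothetical, and the only thing modelled is the order in which the inker processes its mailbox and the fact that a cast to a stopped process is silently lost.

```python
from collections import deque


def simulate_shutdown_race():
    """Toy model of the message ordering in the steps above (not leveled code)."""
    # Hypothetical journal files: compaction replaces a.cdb and b.cdb with c.cdb.
    file_state = {"a.cdb": "open", "b.cdb": "open"}
    manifest = ["a.cdb", "b.cdb"]

    inker_mailbox = deque()
    clerk_mailbox = deque()
    clerk_alive = True
    dropped_casts = []

    def cast_to_clerk(msg):
        # Modelled on Erlang cast semantics: no error if the target has stopped.
        if clerk_alive:
            clerk_mailbox.append(msg)
        else:
            dropped_casts.append(msg)

    # Steps 1-2: the inker calls clerk_stop; the clerk first casts its
    # completion message to the inker, then replies and stops.
    file_state["c.cdb"] = "open"
    inker_mailbox.append(("clerk_complete", ["c.cdb"], ["a.cdb", "b.cdb"]))
    clerk_alive = False

    # The inker's own shutdown messages queue up behind clerk_complete.
    inker_mailbox.append(("maybe_defer_shutdown",))
    inker_mailbox.append(("complete_shutdown",))

    while inker_mailbox:
        msg = inker_mailbox.popleft()
        if msg[0] == "clerk_complete":
            # Step 3: update the manifest, then prompt deletions (cast is lost).
            _, new_files, files_to_delete = msg
            manifest = [f for f in manifest if f not in files_to_delete]
            manifest += new_files
            cast_to_clerk(("clerk_promptdeletions", files_to_delete))
        elif msg[0] == "complete_shutdown":
            # Step 4: only files still in the manifest are closed.
            for name in manifest:
                file_state[name] = "closed"

    # Steps 5-6: the replaced files were never closed or marked for deletion.
    return file_state, dropped_casts


if __name__ == "__main__":
    states, dropped = simulate_shutdown_race()
    print(states)
    print(dropped)
```

In this model the replaced files end up still `open` and the `clerk_promptdeletions` message appears only in the dropped list, which corresponds to the hanging PIDs and undeleted files described in steps 5 and 6.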

This is (rarely) detected by leveled_statemeqc.
