Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taking snapshot and committing log are not atomic operations for each other #300

Open
yfei-z opened this issue Aug 20, 2024 · 1 comment

Comments

@yfei-z
Copy link
Contributor

yfei-z commented Aug 20, 2024

Snapshot requires the latest state of the state machine and the commit index, if taking the snapshot in a separated thread, although RAFT.snapshot() is synchronized but RAFT.commitLogTo(long, boolean) is not, if snapshot has the latest state but commit index is not updated, then wrong commit index will be saved in log as the first appended, and duplicated logs will be applied when state machine initial from the snapshot.
I think making RAFT.commitLogTo(long, boolean) synchronized could solve the problem and don't need to make the methods of state machine to be thread safe.

@jabolina
Copy link
Member

I've noticed the snapshot() is exposed in the RaftHandle public API. Are you invoking it from there? Otherwise, the snapshot method is only ever invoked from a single thread, the same as the commitLogTo method. They are in the handling loop which runs on a single thread.

Also, I am handwaving that snapshot() is also exposed as a JMX operation. The concurrency issue here would happen by invoking snapshot() directly through the RaftHandle API while the algorithm handles requests. We could submit a snapshot() operation to the internal queue to synchronize everything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants