rethink about log compaction #7162
It seems interesting and important. logcabin configures its trigger based on both the size and the number of not-yet-snapshotted entries (although a snapshot is taken only when all of the conditions are satisfied): https://github.com/logcabin/logcabin/blob/master/Server/StateMachine.cc#L593 I also think that even when the number of entries is small, replaying them on a revived follower can take long if they contain many puts. Because of the parallelism-unfriendly (nondeterminism-intolerant) nature of state machine replication, even replay cannot exploit multiple cores. Maybe such a combined trigger would be helpful for stable operation of an etcd cluster and would increase its availability?
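For concreteness, here is a minimal sketch of a logcabin-style combined trigger, where a snapshot fires only when all thresholds are exceeded. The type, field, and threshold names are hypothetical, not logcabin's or etcd's actual API:

```go
package snapshot

const (
	snapMinEntries = 10000            // hypothetical: entries applied since last snapshot
	snapMinBytes   = 64 * 1024 * 1024 // hypothetical: bytes applied since last snapshot
)

// tracker counts what has been applied since the last snapshot.
type tracker struct {
	entries uint64
	bytes   uint64
}

// record accounts for one applied entry of the given encoded size.
func (t *tracker) record(size uint64) {
	t.entries++
	t.bytes += size
}

// shouldSnapshot triggers only when ALL thresholds are exceeded,
// mirroring logcabin's "all conditions satisfied" behavior.
func (t *tracker) shouldSnapshot() bool {
	return t.entries >= snapMinEntries && t.bytes >= snapMinBytes
}
```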
@mitake I assigned this to both you and me. I assume you are interested in this one :)
@xiang90 sure, of course. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Currently we compact the raft log every 100,000 entries, so we keep at most 100,000 entries in memory.
Keeping more entries in memory is good for fast follower recovery: if a follower dies and restarts while lagging the leader by fewer than 100,000 entries, the leader can send it plain entries without sending a snapshot, and sending a snapshot is usually more expensive than sending entries.
However, a fixed count of 100,000 can be dangerous and cause OOM. We assume each entry is around 1KB, so 100,000 entries take only 100MB. But the maximum entry size is 1MB, and in that case 100,000 entries cost 100GB.
I propose that we also take entry size into consideration when deciding when to compact.
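A minimal sketch of that idea, assuming a hypothetical byte budget (`compactMaxBytes`) alongside the existing entry count; the names and thresholds are illustrative, not etcd's actual configuration:

```go
package compaction

const (
	compactMaxEntries = 100000            // the current fixed trigger
	compactMaxBytes   = 512 * 1024 * 1024 // hypothetical byte budget
)

// entry stands in for a raft log entry; only its encoded size matters here.
type entry struct {
	data []byte
}

// shouldCompact fires once EITHER the count or the cumulative size of the
// retained in-memory entries exceeds its limit, so a burst of large entries
// triggers compaction long before 100,000 of them accumulate.
func shouldCompact(log []entry) bool {
	if len(log) >= compactMaxEntries {
		return true
	}
	var total int
	for _, e := range log {
		total += len(e.data)
	}
	return total >= compactMaxBytes
}
```

With an OR of the two conditions, a run of maximum-size 1MB entries would hit the assumed 512MB budget after about 512 entries instead of accumulating 100GB.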