Fix follower assertions on attach snapshot races #15049
Changelog entry
Fixed a rare assertion (process crash) when followers attached to leaders with an inconsistent snapshot. Fixes #15042.
Changelog category
Description for reviewers
Followers periodically crashed in production, complaining about log reordering, with an error message suggesting they had tried to apply a duplicate redo log entry (which shouldn't have been possible). It turns out that snapshots created within read-only transactions that used `QueueScan` (e.g. ReadTable and ScanQuery) persisted an incorrect `Serial` field (a monotonically increasing change number) that was equal to that of the next transaction. When a follower attached at just the right time, it could bootstrap from such a snapshot and discover that the next commit had the same `Serial`, indicating a duplicate or reordered change.

Thankfully this didn't affect leaders, since they apply pre-snapshot and post-snapshot redo log entries together, and only use the snapshot serial as a hint of previously compacted changes. So even though the snapshot technically had an inconsistent value, it was self-healing and couldn't produce any externally visible inconsistencies.
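
To make the race easier to picture, here is a minimal, self-contained sketch of the failure mode. It is a toy model and not the actual code: `Leader`, `Follower`, `Snapshot`, `Commit`, `SnapshotBuggy` and `SnapshotFixed` are all hypothetical names, and the "fixed" variant simply assumes the snapshot should carry the serial of the last committed change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a persisted snapshot and a redo log commit.
struct Snapshot { uint64_t Serial; };
struct Commit   { uint64_t Serial; };

// Toy leader: hands out monotonically increasing serials, starting at 1.
class Leader {
public:
    Commit CommitNext() { return Commit{NextSerial++}; }

    // Models the bug: the snapshot records the serial that the NEXT
    // transaction is going to use.
    Snapshot SnapshotBuggy() const { return Snapshot{NextSerial}; }

    // Assumed correct behavior: record the last serial already handed out.
    Snapshot SnapshotFixed() const {
        assert(NextSerial >= 1); // serials start at 1, so no underflow
        return Snapshot{NextSerial - 1};
    }

private:
    uint64_t NextSerial = 1;
};

// Toy follower: bootstraps from a snapshot, then requires every later
// commit to have a strictly greater serial.
class Follower {
public:
    explicit Follower(Snapshot s) : LastSerial(s.Serial) {}

    // Returns false where a real follower would hit the assertion.
    bool Apply(const Commit& c) {
        if (c.Serial <= LastSerial) {
            return false; // looks like a duplicate or reordered change
        }
        LastSerial = c.Serial;
        return true;
    }

private:
    uint64_t LastSerial;
};
```

In this model, a follower that bootstraps from `SnapshotBuggy()` rejects the very next commit because both carry the same serial, while one that bootstraps from `SnapshotFixed()` accepts it.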