@snaury
Member

@snaury snaury commented Feb 26, 2025

Changelog entry

Fixed a rare assertion failure (process crash) that occurred when followers attached to leaders with an inconsistent snapshot. Fixes #15042.

Changelog category

  • Bugfix

Description for reviewers

Followers periodically crashed in production, complaining about log reordering. The error message suggested they had tried to apply a duplicate redo log entry, which shouldn't have been possible. It turns out that snapshots created within read-only transactions that used QueueScan (e.g. ReadTable and ScanQuery) persisted an incorrect Serial field (a monotonically increasing change number): it was equal to the Serial of the next transaction. When a follower attached at just the right time, it could bootstrap from such a snapshot and then discover that the next commit had the same Serial, which indicates a duplicate or reordered change.
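To make the failure mode concrete, here is a minimal sketch of the check described above. It is an illustrative model only, not the actual tablet executor code: the `Follower` class, `LogReorderingError`, `snapshot_serial`, and the assumption that Serial grows by exactly one per commit are all hypothetical.

```python
from dataclasses import dataclass, field


class LogReorderingError(Exception):
    """Raised when a redo log entry does not advance the serial."""


@dataclass
class Follower:
    # Hypothetical model: serial of the last change this follower knows about.
    serial: int
    applied: list = field(default_factory=list)

    @classmethod
    def bootstrap(cls, snapshot_serial: int) -> "Follower":
        # A follower that attaches starts from the serial stored in the snapshot.
        return cls(serial=snapshot_serial)

    def apply(self, entry_serial: int) -> None:
        # Serials must strictly increase; an equal or smaller serial looks
        # like a duplicate or reordered redo log entry.
        if entry_serial <= self.serial:
            raise LogReorderingError(
                f"entry serial {entry_serial} does not advance {self.serial}"
            )
        self.serial = entry_serial
        self.applied.append(entry_serial)


def snapshot_serial(last_committed: int, buggy: bool) -> int:
    # Assumption for illustration: serials grow by one per commit, so the
    # buggy snapshot records the serial of the *next* transaction.
    return last_committed + 1 if buggy else last_committed
```

In this model, a follower bootstrapped with `snapshot_serial(10, buggy=False)` accepts the next commit with serial 11, while one bootstrapped with `snapshot_serial(10, buggy=True)` already holds serial 11 and rejects that same commit as a duplicate.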

Thankfully, this didn't affect leaders, since they apply pre-snapshot and post-snapshot redo log entries together and use the snapshot Serial only as a hint about previously compacted changes. So even though the snapshot technically had an inconsistent value, it was self-healing and couldn't produce any externally visible inconsistencies.

@snaury snaury self-assigned this Feb 26, 2025
@github-actions

github-actions bot commented Feb 26, 2025

2025-02-26 14:23:41 UTC Pre-commit check linux-x86_64-release-asan for 006f160 has started.
2025-02-26 14:23:56 UTC Artifacts will be uploaded here
2025-02-26 14:26:56 UTC ya make is running...
🟡 2025-02-26 15:40:55 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Details

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 12213 | 12135 | 0 | 28 | 11 | 39 |

2025-02-26 15:42:17 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-02-26 16:03:37 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Details

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 123 (only retried tests) | 85 | 0 | 1 | 0 | 37 |

2025-02-26 16:03:47 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-02-26 16:23:42 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 69 (only retried tests) | 34 | 0 | 0 | 0 | 35 |

🟢 2025-02-26 16:23:49 UTC Build successful.
🟢 2025-02-26 16:24:18 UTC ydbd size 3.6 GiB changed* by +56.7 KiB, which is < 100.0 KiB vs stable-25-1: OK

| ydbd size dash | stable-25-1: d408be7 | merge: 006f160 | diff | diff % |
|---|---|---|---|---|
| ydbd size | 3 899 117 872 Bytes | 3 899 175 952 Bytes | +56.7 KiB | +0.001% |
| ydbd stripped size | 1 364 752 048 Bytes | 1 364 773 232 Bytes | +20.7 KiB | +0.002% |

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions

github-actions bot commented Feb 26, 2025

2025-02-26 14:24:26 UTC Pre-commit check linux-x86_64-relwithdebinfo for 006f160 has started.
2025-02-26 14:24:39 UTC Artifacts will be uploaded here
2025-02-26 14:27:32 UTC ya make is running...
🟡 2025-02-26 15:33:08 UTC Some tests failed, follow the links below. Going to retry failed tests...

Details

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 26410 | 23812 | 0 | 2 | 2458 | 138 |

2025-02-26 15:35:39 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-02-26 16:02:48 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 209 (only retried tests) | 76 | 0 | 0 | 1 | 132 |

🟢 2025-02-26 16:03:00 UTC Build successful.
🟢 2025-02-26 16:03:19 UTC ydbd size 2.1 GiB changed* by +176 Bytes, which is < 100.0 KiB vs stable-25-1: OK

| ydbd size dash | stable-25-1: 79eee97 | merge: 006f160 | diff | diff % |
|---|---|---|---|---|
| ydbd size | 2 241 513 800 Bytes | 2 241 513 976 Bytes | +176 Bytes | +0.000% |
| ydbd stripped size | 474 785 208 Bytes | 474 785 208 Bytes | 0 Bytes | 0.000% |

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@snaury snaury marked this pull request as ready for review February 27, 2025 08:00
@snaury snaury requested a review from a team as a code owner February 27, 2025 08:00
@snaury snaury merged commit 74de33d into ydb-platform:stable-25-1 Feb 27, 2025
12 checks passed
@snaury snaury deleted the bugfix-KIKIMR-18605-follower-snapshot-25-1 branch February 27, 2025 08:17