Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix truncateLsn update #101

Merged
merged 3 commits into from
Nov 9, 2021
Merged

Fix truncateLsn update #101

merged 3 commits into from
Nov 9, 2021

Conversation

petuhovskiy
Copy link
Member

There was a bug when candidateTruncateLsn was distant from truncateLsn and a lot of CPU time was spent in HandleWalKeeperResponse trying to advance truncateLsn. This PR fixes this by changing logic to update truncateLsn, using Min(walkeeper[i].feedback.flushLsn) as a candidate for new truncateLsn.

In case of recovery, minimal safekeeper's flushLsn can be greater than truncateLsn, and streaming can skip some messages, resulting in assertion error.
@petuhovskiy petuhovskiy requested a review from arssher November 9, 2021 12:35
XLogRecPtr lsn = UnknownXLogRecPtr;
for (int i = 0; i < n_walkeepers; i++)
{
if (walkeeper[i].feedback.flushLsn < lsn)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as term_history patch is not merged, we also must check here epoch like GetAcknowledgedByQuorumWALPosition does, otherwise flushLsn is essentially lie, it points to wrong history. And if you check the epoch, I believe acknowledgedLsn becomes unnecessary.

(but check msgQueueHead->ackMask == ((1 << n_walkeepers) - 1) in queue cleanup is still needed due to 0-sized messages, though it would be nice to simplify this as well)

* ack only on record boundaries.
*/
minFlushLsn = CalculateMinFlushLsn();
if (minFlushLsn > truncateLsn && minFlushLsn <= minQuorumLsn && minFlushLsn <= acknowledgedLsn)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How minFlushLsn <= minQuorumLsn can be false? I think this should be removed. I also considered CalculateMinFlushLsn returning UnknownXLogRecPtr, but obviously it can't do that.

@petuhovskiy petuhovskiy requested a review from arssher November 9, 2021 16:54
* ack only on record boundaries.
*/
minFlushLsn = CalculateMinFlushLsn();
if (minFlushLsn > truncateLsn)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't go backwards, so I think we can directly assign.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fails in test_restart_compute[True] for me, looks like minFlushLsn = 0/0 if safekeepers are empty.

@petuhovskiy petuhovskiy merged commit ecc6266 into main Nov 9, 2021
@petuhovskiy petuhovskiy deleted the fix/truncate-lsn branch November 9, 2021 20:46
ololobus pushed a commit that referenced this pull request Nov 10, 2021
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
lubennikovaav pushed a commit that referenced this pull request Feb 9, 2022
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
MMeent pushed a commit that referenced this pull request Jul 7, 2022
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
MMeent pushed a commit that referenced this pull request Aug 18, 2022
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
lubennikovaav pushed a commit that referenced this pull request Nov 21, 2022
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
MMeent pushed a commit that referenced this pull request Feb 10, 2023
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
MMeent pushed a commit that referenced this pull request Feb 10, 2023
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
MMeent pushed a commit that referenced this pull request May 11, 2023
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
tristan957 pushed a commit that referenced this pull request Aug 10, 2023
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
tristan957 pushed a commit that referenced this pull request Feb 6, 2024
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
tristan957 pushed a commit that referenced this pull request May 10, 2024
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)` with taking epochs into account.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants