Fix truncateLsn update #101
During recovery, the minimum flushLsn across safekeepers can be greater than truncateLsn, so streaming can skip some messages, which results in an assertion error.
XLogRecPtr lsn = UnknownXLogRecPtr;
for (int i = 0; i < n_walkeepers; i++)
{
    if (walkeeper[i].feedback.flushLsn < lsn)
As long as the term_history patch is not merged, we must also check the epoch here, as GetAcknowledgedByQuorumWALPosition does; otherwise flushLsn is essentially a lie, because it points to the wrong history. And if you check the epoch, I believe acknowledgedLsn becomes unnecessary.
(The check msgQueueHead->ackMask == ((1 << n_walkeepers) - 1) in queue cleanup is still needed because of 0-sized messages, though it would be nice to simplify this as well.)
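The epoch check suggested in this comment could look roughly like the sketch below. It is only an illustration: `SketchFeedback`, `SketchMinFlushLsn`, and `propEpoch` are hypothetical names, `UINT64_MAX` stands in for `UnknownXLogRecPtr`, and treating a safekeeper on a different epoch as having flushed nothing (LSN 0) is an assumption about what "check the epoch" should mean here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* 64-bit WAL position, as in PostgreSQL */

/* Hypothetical per-safekeeper feedback; only the two fields used here. */
typedef struct
{
	uint64_t	epoch;
	XLogRecPtr	flushLsn;
} SketchFeedback;

/*
 * Minimum flushLsn over all safekeepers, counting only positions reported
 * for the proposer's epoch. A safekeeper on another epoch may point into a
 * diverged history, so it is treated as having flushed nothing (assumption).
 */
static XLogRecPtr
SketchMinFlushLsn(const SketchFeedback *fb, int n, uint64_t propEpoch)
{
	XLogRecPtr	min = UINT64_MAX;	/* stand-in for UnknownXLogRecPtr */

	assert(n > 0);
	for (int i = 0; i < n; i++)
	{
		XLogRecPtr	lsn = (fb[i].epoch == propEpoch) ? fb[i].flushLsn : 0;

		if (lsn < min)
			min = lsn;
	}
	return min;
}
```

With this reading, a single safekeeper that has not yet caught up to the current epoch pins the result at 0, so truncateLsn would not advance until every safekeeper reports for the same epoch.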
 * ack only on record boundaries.
 */
minFlushLsn = CalculateMinFlushLsn();
if (minFlushLsn > truncateLsn && minFlushLsn <= minQuorumLsn && minFlushLsn <= acknowledgedLsn)
How can minFlushLsn <= minQuorumLsn be false? I think this condition should be removed. I also considered CalculateMinFlushLsn returning UnknownXLogRecPtr, but obviously it can't do that.
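To illustrate why the reviewer expects this condition to always hold, here is a small sketch. It assumes the quorum position is computed as the highest LSN flushed by at least `quorum` safekeepers (sort ascending, take element `n - quorum`); the names `SketchQuorumLsn`, `SketchMinLsn`, and `SKETCH_MAX_SAFEKEEPERS` are hypothetical and not taken from the patch. Under that assumption the minimum over the same values can never exceed the quorum position.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t XLogRecPtr;

#define SKETCH_MAX_SAFEKEEPERS 32	/* hypothetical upper bound */

/* qsort comparator for LSNs, ascending. */
static int
SketchCompareLsn(const void *a, const void *b)
{
	XLogRecPtr	x = *(const XLogRecPtr *) a;
	XLogRecPtr	y = *(const XLogRecPtr *) b;

	return (x > y) - (x < y);
}

/* Highest LSN that at least `quorum` of the n safekeepers have flushed. */
static XLogRecPtr
SketchQuorumLsn(const XLogRecPtr *flush, int n, int quorum)
{
	XLogRecPtr	sorted[SKETCH_MAX_SAFEKEEPERS];

	assert(n > 0 && n <= SKETCH_MAX_SAFEKEEPERS);
	assert(quorum > 0 && quorum <= n);
	for (int i = 0; i < n; i++)
		sorted[i] = flush[i];
	qsort(sorted, (size_t) n, sizeof(XLogRecPtr), SketchCompareLsn);
	return sorted[n - quorum];
}

/* Lowest LSN flushed by any of the n safekeepers. */
static XLogRecPtr
SketchMinLsn(const XLogRecPtr *flush, int n)
{
	XLogRecPtr	min = flush[0];

	assert(n > 0);
	for (int i = 1; i < n; i++)
		if (flush[i] < min)
			min = flush[i];
	return min;
}
```

Because `sorted[0]` is the minimum and `n - quorum >= 0`, the quorum element is never below the minimum when both are computed from the same inputs.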
 * ack only on record boundaries.
 */
minFlushLsn = CalculateMinFlushLsn();
if (minFlushLsn > truncateLsn)
It shouldn't go backwards, so I think we can directly assign.
Direct assignment fails in test_restart_compute[True] for me; it looks like minFlushLsn = 0/0 if the safekeepers are empty.
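The failure described here is the reason to keep the `minFlushLsn > truncateLsn` guard instead of assigning directly. A minimal sketch of that guard follows; `SketchAdvanceTruncateLsn` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Return the new truncateLsn: move forward to minFlushLsn only when that is
 * an actual advance. An empty safekeeper reports 0/0, which must not pull
 * truncateLsn backwards.
 */
static XLogRecPtr
SketchAdvanceTruncateLsn(XLogRecPtr truncateLsn, XLogRecPtr minFlushLsn)
{
	return (minFlushLsn > truncateLsn) ? minFlushLsn : truncateLsn;
}
```

With empty safekeepers the candidate is 0, the comparison is false, and truncateLsn keeps its previous value.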
truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)`, taking epochs into account.
There was a bug: when candidateTruncateLsn was far from truncateLsn, a lot of CPU time was spent in HandleWalKeeperResponse trying to advance truncateLsn. This PR fixes it by changing the logic that updates truncateLsn to use Min(walkeeper[i].feedback.flushLsn) as the candidate for the new truncateLsn.