We ran into an issue where promoting a replica to primary via PlannedReparentShard (PRS) succeeded even though the new primary had not actually caught up to the position of the old primary. Several thousand transactions were missing.
Reproduction Steps
This is non-trivial to reproduce: it needs a decent amount of load, or some other condition that makes the replica lag.
One way to make the replica lag is to use it to take a backup. As soon as the backup is complete, while the replica is still lagging, use PRS to promote it.
Necessary precondition: the lag must be high enough that the replica cannot catch up within the time allowed (wait-replicas-timeout). Making wait-replicas-timeout small (for example, 1 second) will probably help to reproduce the issue.
Binary Version
main for now, will check other versions and update.
The bug is present on all release branches, but not in any released version. We will be fixing this on all branches.
However, in addition to fixing how we handle the return values from each flavor, we should also add a check in PRS after WaitForPosition to make sure that the replica did in fact reach the desired position.
Operating System and Environment details
Log Fragments
Note the time difference - almost 30 seconds, which is the amount of time allowed by default.