Bug Report: upgrade to v16 from v15 blocks on semi-sync #13426
Comments
The fix for this will need to be backported to v16 and v17, and we'll need to do patch releases.
There is a 30-second gap here. What seems to have happened is that, because we enable semi-sync before transitioning the tablet to primary, the creation of the _vt schema gets blocked by semi-sync. We point replicas to the new primary only after the transition to primary, so no tablet is available to ACK the write. In the meantime, vtorc detects that the replicas are pointing to the wrong (old) primary, but cannot do anything because PRS holds the shard lock. After 30 seconds the lock times out, vtorc fixes replication, and the DDL can proceed.
Overview of the Issue
We upgraded a large database to v16 recently. During the rollout, errors were served to the app for ~30 seconds.
The root cause seems to be that the upgrade of the _vt schema during PlannedReparentShard (PRS) was blocked by semi-sync.
Reproduction Steps
Upgrade a 3+ tablet cluster with semi-sync enabled from v15 to v16.
Binary Version
Operating System and Environment details
Log Fragments