While working on buffering documentation, a simulation was run to see how buffering would behave if a PlannedReparentShard (PRS) command were to fail. In this scenario, the failed PRS command took about 40 to 50 seconds to return:
$ time vtctlclient -server localhost:15999 PlannedReparentShard -keyspace_shard=commerce/0
PlannedReparentShard Error: rpc error: code = Unknown desc = primary-elect tablet zone1-0000000101 failed to catch up with replication MySQL56/4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: rpc error: code = Unknown desc = TabletManager.WaitForPosition on zone1-0000000101 error: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186
E1221 20:04:41.042580 205304 main.go:76] remote error: rpc error: code = Unknown desc = primary-elect tablet zone1-0000000101 failed to catch up with replication MySQL56/4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: rpc error: code = Unknown desc = TabletManager.WaitForPosition on zone1-0000000101 error: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186
real 0m41.734s
user 0m0.008s
sys 0m0.015s
Buffering Timeout Problems
Looking over the buffering results in this situation, a few buffers expired because they exceeded the -buffer_window default of 20 seconds, as shown below:
NOTE: All 15 of the connections used in this scenario failed due to WindowExceeded.
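The timing mismatch behind these evictions can be sketched with simple arithmetic. This is only an illustration, not Vitess code: the function name `buffer_outcome` and the "Buffered" label are hypothetical, and the sketch assumes a request enters the buffer when the failed PRS starts and is evicted once it has waited longer than the buffer window.

```shell
# Hypothetical helper (not part of Vitess): compares how long a request may
# wait in the buffer with how long the failed PRS takes to return.
# Both arguments are whole seconds.
buffer_outcome() {
  window_s=$1     # buffer window, e.g. the -buffer_window value
  failure_s=$2    # observed duration of the failed PRS
  if [ "$failure_s" -gt "$window_s" ]; then
    echo "WindowExceeded"   # the request waits past the window and is evicted
  else
    echo "Buffered"         # the window is long enough to span the failure
  fi
}

# Values from this report: a 20 second window against the roughly
# 41.7 second failure timed above (rounded down to 41).
buffer_outcome 20 41
```

Under these assumptions, any window shorter than the time the failed PRS takes to return leads to the same eviction, which is why longer defaults are worth considering.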
Proposed Solution
To better handle the failed PRS scenario, I would like to propose changing the default buffer times to the following values:
In a follow-up buffering test, these values spanned the PRS failure and prevented errors from being returned to applications sending query requests during this period. This was tested on main, and I would propose these changes to complement the changes made in the Change default Vitess buffering implementation issue #9359.