Change the default buffering timers #9433

Open
FancyFane opened this issue Dec 22, 2021 · 0 comments
Labels
Component: Cluster management, Type: Enhancement (Logical improvement, somewhere between a bug and feature)

Comments

@FancyFane (Collaborator)

Buffering Timeout Problems

While working on buffering documentation, a simulation was run to see how buffering would behave if a PlannedReparentShard (PRS) command were to fail. During this scenario it was discovered that the failing PRS command takes about 40-50 seconds to complete:

$ time vtctlclient -server localhost:15999 PlannedReparentShard -keyspace_shard=commerce/0

PlannedReparentShard Error: rpc error: code = Unknown desc = primary-elect tablet zone1-0000000101 failed to catch up with replication MySQL56/4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: rpc error: code = Unknown desc = TabletManager.WaitForPosition on zone1-0000000101 error: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186
E1221 20:04:41.042580  205304 main.go:76] remote error: rpc error: code = Unknown desc = primary-elect tablet zone1-0000000101 failed to catch up with replication MySQL56/4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: rpc error: code = Unknown desc = TabletManager.WaitForPosition on zone1-0000000101 error: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186: timed out waiting for position 4fb7c72c-62c8-11ec-8287-8cae4cdeeda4:1-16186

real	0m41.734s
user	0m0.008s
sys	0m0.015s

Looking over the buffering results in this situation, several buffered requests expired because they exceeded the -buffer_window default of 10 seconds, as shown below:

curl -s localhost:15001/metrics | grep -v '^#' | grep buffer_requests
vtgate_buffer_requests_buffered{keyspace="commerce",shard_name="0"} 30
vtgate_buffer_requests_buffered_dry_run{keyspace="commerce",shard_name="0"} 0
vtgate_buffer_requests_drained{keyspace="commerce",shard_name="0"} 15
vtgate_buffer_requests_evicted{keyspace="commerce",reason="BufferFull",shard_name="0"} 0
vtgate_buffer_requests_evicted{keyspace="commerce",reason="ContextDone",shard_name="0"} 0
vtgate_buffer_requests_evicted{keyspace="commerce",reason="WindowExceeded",shard_name="0"} 15
vtgate_buffer_requests_skipped{keyspace="commerce",reason="BufferFull",shard_name="0"} 0
vtgate_buffer_requests_skipped{keyspace="commerce",reason="Disabled",shard_name="0"} 0
vtgate_buffer_requests_skipped{keyspace="commerce",reason="LastFailoverTooRecent",shard_name="0"} 50
vtgate_buffer_requests_skipped{keyspace="commerce",reason="LastReparentTooRecent",shard_name="0"} 0
vtgate_buffer_requests_skipped{keyspace="commerce",reason="Shutdown",shard_name="0"} 0

NOTE: All 15 of the connections utilized in this scenario failed due to WindowExceeded
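
For reference, one simple way to keep an eye on these counters while a reparent is in flight is to poll the same endpoint in a loop. This is a minimal sketch, assuming vtgate exposes /metrics on localhost:15001 as in the example above:

watch -n 5 "curl -s localhost:15001/metrics | grep -v '^#' | grep buffer_requests"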

Proposed Solution

To better handle the failed PRS scenario, I would like to propose changing the default buffering timers to the following values (an example vtgate invocation with these values follows the list):

-buffer_max_failover_duration=1m  (current default 20s)
-buffer_min_time_between_failovers=2m (current default 1m)
-buffer_window=1m (current default 10s)
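
For illustration, a vtgate started with the proposed values set explicitly might look like the following. This is a sketch only: buffering still has to be turned on with -enable_buffer, and all other topo/cell/port flags are omitted:

vtgate \
  -enable_buffer \
  -buffer_max_failover_duration=1m \
  -buffer_min_time_between_failovers=2m \
  -buffer_window=1m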

In follow-up buffering tests, it was found that these values spanned the PRS failure and prevented errors from being sent to applications issuing query requests during this period. This was tested on main, and I would propose these changes to complement the changes made in the "Change default Vitess buffering implementation" issue #9359.
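
As a rough way to reproduce the follow-up test, one can run a steady stream of queries against the commerce keyspace through vtgate while the PRS runs in another terminal, then confirm that no errors are returned and that requests_drained increases. This sketch assumes the local example cluster, with vtgate's MySQL listener on 127.0.0.1:15306 and the corder table from the examples (adjust host, port, and query for your setup):

while true; do
  # queries to the primary are buffered by vtgate during the failover window
  mysql -h 127.0.0.1 -P 15306 -D commerce -e 'select count(*) from corder' > /dev/null || echo "query failed"
  sleep 1
done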

@FancyFane self-assigned this Dec 22, 2021
FancyFane added a commit to planetscale/vitess that referenced this issue Dec 22, 2021
Signed-off-by: FancyFane <fane@planetscale.com>
vmg pushed a commit to planetscale/vitess that referenced this issue Jan 11, 2022
Signed-off-by: FancyFane <fane@planetscale.com>
@ajm188 added the Type: Enhancement (Logical improvement, somewhere between a bug and feature) and Component: Cluster management labels Jun 21, 2022