Skip to content

Commit

Permalink
Timeout Fixes and VTOrc Improvement (#11881) (#1435)
Browse files Browse the repository at this point in the history
* refactor: move tests out of newfeaturestest so that they run on upgrade-downgrade tests too

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: add failing ers test for handling multiple vttablet failures with default values of flags

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: add a new lock-timeout flag and use that instead of remote-operation-timeout

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: augment DownPrimary test to reproduce the issue of VTOrc not handling multiple failures

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove LockShardTimeout configuration from VTOrc and add parallelism to refresh of tablets

Signed-off-by: Manan Gupta <manan@planetscale.com>

* log: add more logging lines around ers in vtorc

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: get the test to work

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: fix usage of wait for replicas timeout

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: fix flags expected output

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: fix race in test now that the function is called in parallel multiple times

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: fix default of onCloseTimeout to 1 second

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: add failing unit test to refreshTabletsInKeyspaceShard

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: fix vtorc to not forget a tablet which has been deleted

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: fix backward compatibility, add tests and release notes docs

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: fix flags output

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: use disable-replication-manager instead of disable-active-reparents to allow vttablets to setup replication when restarted

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: fix flaky test by not checking for an error

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: handle the case of empty hostname in tablet initialization

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: update onclose timeout to 10 seconds

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: fix unit test

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: address review comments

Signed-off-by: Manan Gupta <manan@planetscale.com>

* docs: add comments explaining the test functions

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: add summary docs for 'lock-shard-timeout' deprecation

Signed-off-by: Manan Gupta <manan@planetscale.com>

Signed-off-by: Manan Gupta <manan@planetscale.com>

Signed-off-by: Manan Gupta <manan@planetscale.com>
  • Loading branch information
GuptaManan100 authored Dec 19, 2022
1 parent b3b034a commit aa327fe
Show file tree
Hide file tree
Showing 28 changed files with 389 additions and 222 deletions.
2 changes: 1 addition & 1 deletion go/flags/endtoend/mysqlctl.txt
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ Global flags:
--mysqlctl_client_protocol string the protocol to use to talk to the mysqlctl server (default "grpc")
--mysqlctl_mycnf_template string template file to use for generating the my.cnf file during server init
--mysqlctl_socket string socket file to use for remote mysqlctl actions (empty for local actions)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--pool_hostname_resolve_interval duration if set force an update to all hostnames and reconnect if changed, defaults to 0 (disabled)
Expand Down
2 changes: 1 addition & 1 deletion go/flags/endtoend/mysqlctld.txt
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ Usage of mysqlctld:
--mysql_socket string Path to the mysqld socket file
--mysqlctl_mycnf_template string template file to use for generating the my.cnf file during server init
--mysqlctl_socket string socket file to use for remote mysqlctl actions (empty for local actions)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--pool_hostname_resolve_interval duration if set force an update to all hostnames and reconnect if changed, defaults to 0 (disabled)
Expand Down
3 changes: 2 additions & 1 deletion go/flags/endtoend/vtbackup.txt
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ Usage of vtbackup:
--keep-alive-timeout duration Wait until timeout elapses after a successful backup before shutting down.
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
Expand Down Expand Up @@ -124,7 +125,7 @@ Usage of vtbackup:
--psdb.gcs_backup.creds_path string Credentials JSON for service account to use.
--psdb.gcs_backup.key_uri string GCP KMS Keyring URI to use.
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--restart_before_backup Perform a mysqld clean/full restart after applying binlogs, but before taking the backup. Only makes sense to work around xtrabackup bugs.
--s3_backup_aws_endpoint string endpoint of the S3 backend (region must be provided).
--s3_backup_aws_region string AWS region to use. (default "us-east-1")
Expand Down
7 changes: 4 additions & 3 deletions go/flags/endtoend/vtctld.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Usage of vtctld:
--action_timeout duration time to wait for an action before resorting to force (default 2m0s)
--action_timeout duration time to wait for an action before resorting to force (default 1m0s)
--alsologtostderr log to standard error as well as files
--azblob_backup_account_key_file string Path to a file containing the Azure Storage account key; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_KEY will be used as the key itself (NOT a file path).
--azblob_backup_account_name string Azure Storage Account name for backups; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_NAME will be used.
Expand Down Expand Up @@ -61,12 +61,13 @@ Usage of vtctld:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
--log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
--logtostderr log to standard error instead of files
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--opentsdb_uri string URI of opentsdb /api/put method
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
Expand All @@ -80,7 +81,7 @@ Usage of vtctld:
--psdb.gcs_backup.creds_path string Credentials JSON for service account to use.
--psdb.gcs_backup.key_uri string GCP KMS Keyring URI to use.
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--s3_backup_aws_endpoint string endpoint of the S3 backend (region must be provided).
--s3_backup_aws_region string AWS region to use. (default "us-east-1")
--s3_backup_aws_retries int AWS request retries. (default -1)
Expand Down
5 changes: 3 additions & 2 deletions go/flags/endtoend/vtgate.txt
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ Usage of vtgate:
--keyspaces_to_watch strings Specifies which keyspaces this vtgate should have access to while routing queries or accessing the vschema.
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--legacy_replication_lag_algorithm Use the legacy algorithm when selecting vttablets for serving. (default true)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--lock_heartbeat_time duration If there is lock function used. This will keep the lock connection active by using this heartbeat (default 5s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
Expand Down Expand Up @@ -136,7 +137,7 @@ Usage of vtgate:
--mysql_tcp_version string Select tcp, tcp4, or tcp6 to control the socket type. (default "tcp")
--no_scatter when set to true, the planner will fail instead of producing a plan that includes scatter queries
--normalize_queries Rewrite queries with bind vars. Turn this off if the app itself sends normalized queries with bind vars. (default true)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--opentsdb_uri string URI of opentsdb /api/put method
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
Expand All @@ -150,7 +151,7 @@ Usage of vtgate:
--querylog-format string format for query logs ("text" or "json") (default "text")
--querylog-row-threshold uint Number of rows a query has to return or affect before being logged; not useful for streaming queries. 0 means all queries will be logged.
--redact-debug-ui-queries redact full queries and bind variables from debug UI
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--retry-count int retry count (default 2)
--schema_change_signal Enable the schema tracker; requires queryserver-config-schema-change-signal to be enabled on the underlying vttablets for this to work (default true)
--schema_change_signal_user string User to be used to send down query to vttablet to retrieve schema changes
Expand Down
3 changes: 2 additions & 1 deletion go/flags/endtoend/vtgr.txt
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Usage of vtgr:
-h, --help display usage and exit
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
Expand All @@ -31,7 +32,7 @@ Usage of vtgr:
--pprof strings enable profiling
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--refresh_interval duration Refresh interval to load tablets. (default 10s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--scan_interval duration Scan interval to diagnose and repair. (default 3s)
--scan_repair_timeout duration Time to wait for a Diagnose and repair operation. (default 3s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
Expand Down
6 changes: 3 additions & 3 deletions go/flags/endtoend/vtorc.txt
Original file line number Diff line number Diff line change
Expand Up @@ -23,13 +23,13 @@ Usage of vtorc:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-shard-timeout duration Duration for which a shard lock is held when running a recovery (default 30s)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
--log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
--logtostderr log to standard error instead of files
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--port int port for the server
Expand All @@ -39,7 +39,7 @@ Usage of vtorc:
--reasonable-replication-lag duration Maximum replication lag on replicas which is deemed to be acceptable (default 10s)
--recovery-period-block-duration duration Duration for which a new recovery is blocked on an instance after running a recovery (default 30s)
--recovery-poll-duration duration Timer duration on which VTOrc polls its database to run a recovery (default 1s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
--shutdown_wait_time duration Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM (default 30s)
--snapshot-topology-interval duration Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiple of hours
Expand Down
5 changes: 3 additions & 2 deletions go/flags/endtoend/vttablet.txt
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,7 @@ Usage of vttablet:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--lock_tables_timeout duration How long to keep the table locked before timing out (default 1m0s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
Expand Down Expand Up @@ -191,7 +192,7 @@ Usage of vttablet:
--mysql_server_version string MySQL server version to advertise.
--mysqlctl_mycnf_template string template file to use for generating the my.cnf file during server init
--mysqlctl_socket string socket file to use for remote mysqlctl actions (empty for local actions)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--opentsdb_uri string URI of opentsdb /api/put method
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
Expand Down Expand Up @@ -246,7 +247,7 @@ Usage of vttablet:
--redact-debug-ui-queries redact full queries and bind variables from debug UI
--relay_log_max_items int Maximum number of rows for VReplication target buffering. (default 5000)
--relay_log_max_size int Maximum buffer size (in bytes) for VReplication target buffering. If single rows are larger than this, a single row is buffered at a time. (default 250000)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--replication_connect_retry duration how long to wait in between replica reconnect attempts. Only precise to the second. (default 10s)
--restore_concurrency int (init restore parameter) how many concurrent files to restore at once (default 4)
--restore_from_backup (init restore parameter) will check BackupStorage for a recent backup at startup and start there
Expand Down
Loading

0 comments on commit aa327fe

Please sign in to comment.