Timeout Fixes and VTOrc Improvement #11881
Changes from 21 commits
31ba4d4
b9417b8
def9aec
0273581
6c449c0
d288571
e1b23da
c2b6a0d
033083c
5f03e90
c2aa9fd
2097817
27bf296
e3790c9
97c7eef
e31e7c5
ac83323
6b0db02
1f1e8ac
2411cea
ea18935
740f2fa
24a4f40
d827acc
@@ -1,5 +1,5 @@
 Usage of vtctld:
-  --action_timeout duration time to wait for an action before resorting to force (default 2m0s)
+  --action_timeout duration time to wait for an action before resorting to force (default 1m0s)
Review comment:
Why is this changing? Can you also check the blame so that we can see whether it changed recently?

Reply:
This is an effect of
vitess/go/vt/wrangler/wrangler.go Line 39 in 8b8b135
vitess/go/vt/vtctld/action_repository.go Line 38 in 8b8b135
Apparently there are two action_timeouts 🤣. That action_timeout is a parameter for vtctldclient and vtctlclient. This one is taken by vtctld and applies only to
   --alsologtostderr log to standard error as well as files
   --azblob_backup_account_key_file string Path to a file containing the Azure Storage account key; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_KEY will be used as the key itself (NOT a file path).
   --azblob_backup_account_name string Azure Storage Account name for backups; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_NAME will be used.
@@ -60,21 +60,22 @@ Usage of vtctld:
   --keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
   --keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
   --lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
+  --lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
   --log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
   --log_dir string If non-empty, write log files in this directory
   --log_err_stacks log stack traces for errors
   --log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
   --logtostderr log to standard error instead of files
   --max-stack-size int configure the maximum stack size in bytes (default 67108864)
-  --onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
+  --onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
   --onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
   --opentsdb_uri string URI of opentsdb /api/put method
   --pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
   --port int port for the server
   --pprof strings enable profiling
   --proxy_tablets Setting this true will make vtctld proxy the tablet status instead of redirecting to them
   --purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
-  --remote_operation_timeout duration time to wait for a remote operation (default 30s)
+  --remote_operation_timeout duration time to wait for a remote operation (default 15s)
   --s3_backup_aws_endpoint string endpoint of the S3 backend (region must be provided).
   --s3_backup_aws_region string AWS region to use. (default "us-east-1")
   --s3_backup_aws_retries int AWS request retries. (default -1)
@@ -23,14 +23,14 @@ Usage of vtorc:
   --keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
   --keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
   --lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
-  --lock-shard-timeout duration Duration for which a shard lock is held when running a recovery (default 30s)
Review comment:
Do we need to deprecate this flag?

Reply:
I have deprecated it -
Deprecated flags don't show up in the flags output either.

Reply:
Added this to the summary too.
+  --lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
   --log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
   --log_dir string If non-empty, write log files in this directory
   --log_err_stacks log stack traces for errors
   --log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
   --logtostderr log to standard error instead of files
   --max-stack-size int configure the maximum stack size in bytes (default 67108864)
-  --onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
+  --onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
   --onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
   --pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
   --port int port for the server
@@ -40,7 +40,7 @@ Usage of vtorc:
   --reasonable-replication-lag duration Maximum replication lag on replicas which is deemed to be acceptable (default 10s)
   --recovery-period-block-duration duration Duration for which a new recovery is blocked on an instance after running a recovery (default 30s)
   --recovery-poll-duration duration Timer duration on which VTOrc polls its database to run a recovery (default 1s)
-  --remote_operation_timeout duration time to wait for a remote operation (default 30s)
+  --remote_operation_timeout duration time to wait for a remote operation (default 15s)
   --security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
   --shutdown_wait_time duration Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM (default 30s)
   --snapshot-topology-interval duration Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiple of hours
Review comment:
Setting both remote_operation_timeout and lock-timeout to 30s will expose users to the bug that is being fixed in this PR.
Instead, we should recommend that if (and only if) they get timeouts during PRS or ERS, they may need to increase remote_operation_timeout and lock-timeout at the same time, while making sure that lock-timeout is at least 15 seconds more than remote_operation_timeout.

Reply:
Actually, the defaults are 15 and 45 seconds, and we don't want lock-timeout to be at least 15 seconds more than remote_operation_timeout; we want it to be at least 2 times remote_operation_timeout plus a little more.
Well, the bug exists in the previous release, and only after upgrading and specifying both flags (or specifying neither) can users really get rid of it. So the recommended process is to specify remote_operation_timeout first, so that behaviour remains the same while upgrading. Once upgraded, they can choose whatever values they want.