domain: use with kv timeout feature schema load kv timeout #48017
@@ -184,6 +184,7 @@ func (do *Domain) loadInfoSchema(startTS uint64) (infoschema.InfoSchema, bool, i
 		loadSchemaDurationTotal.Observe(time.Since(beginTime).Seconds())
 	}()
 	snapshot := do.store.GetSnapshot(kv.NewVersion(startTS))
+	snapshot.SetOption(kv.TiKVClientReadTimeout, uint64(3000)) // 3000ms.
What is the default value if not set?
Should we set snapshot.SetOption(kv.ReplicaRead, kv.ReplicaReadMixed) as well? Otherwise it won't retry on a replica, right?
@zimulala
The default value of Get is 30s and BatchGet is 60s.
@Tema
If snapshot.SetOption(kv.ReplicaRead, kv.ReplicaReadMixed) is not set, the leader peer is tried first. If a timeout happens, the error is caught and the replica selector is transferred to TryFollower automatically; follower read is then used on the other peers.
So I think it's not necessary to involve follower read in the first try.
The default value of Get is 30s and BatchGet is 60s.
If that is true, then the retry does not go to a replica. I tested this branch without the changes in this PR by setting lease=300s, and it didn't help. If the default timeout were 30s, the lease would be able to renew by falling back to a healthy replica, which I didn't observe in the experiment. So either the default timeout is much bigger, it does not exist at all, or the retry does not go to a replica for some reason.
Thanks @cfzjywxk! I've tested this patch on our test bed and it works pretty well!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: crazycs520, Tema, you06
da06a17 into pingcap:tidb-6.5-with-kv-timeout-feature
What problem does this PR solve?
Issue Number: ref #48124
Problem Summary:
Use the KV timeout feature for schema reload.
What is changed and how it works?
When the leader peer of the meta region is slow, enabling the KV read timeout helps alleviate the problem and avoids errors like "schema lease expire". For example when injecting slowness into the meta region TiKV node,
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.