tikv: collapse duplicate resolve locks in region requests #16838
/run-all-tests
/build
store/tikv/client_collapse.go
Outdated
case tikvrpc.CmdResolveLock:
	resolveLock := req.ResolveLock()
	if len(resolveLock.Keys) > 0 {
		// can not collapse resolveLite
resolveLock?
resolve lock lite
store/tikv/client_collapse.go
Outdated
		return
	}
	canCollapse = true
	key := addr + "-" + strconv.FormatUint(resolveLock.StartVersion, 10) + "-" + strconv.FormatUint(resolveLock.CommitVersion, 10)
The key doesn't have the region id.
And can we remove the store address from the key?
The start ts is already unique, so we don't need to include the commit version in the key.
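One possible reading of the review comments above, as a hedged sketch only: key the collapse on the region id plus the start ts, and drop the store address and commit version. The name `collapseKey` and its parameters are hypothetical; the conversation does not show the key format that was finally merged.

```go
package main

import "strconv"

// collapseKey builds a hypothetical collapse key for a region-wide
// resolve-lock request. It assumes the region id and the start ts
// together identify duplicate requests, as the reviewers suggest.
func collapseKey(regionID, startVersion uint64) string {
	return strconv.FormatUint(regionID, 10) + "-" + strconv.FormatUint(startVersion, 10)
}
```

With this shape, two requests for the same region and the same start ts map to the same key, while requests for different regions never share one.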
Please address comment, rest LGTM
/run-all-tests
LGTM @coocood
store/tikv/client_collapse.go
Outdated
func (r reqCollapse) collapse(ctx context.Context, key string, sf *singleflight.Group,
	addr string, req *tikvrpc.Request, timeout time.Duration) (resp *tikvrpc.Response, err error) {
	rsC := sf.DoChan(key, func() (interface{}, error) {
		return r.Client.SendRequest(context.Background(), addr, req, ReadTimeoutMedium) // set longer timeout than outer's.
Why set longer timeout than outer's?
LGTM
/merge
/run-all-tests
cherry pick to release-4.0 in PR #16925
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
When txn2 and txn3 both meet locks left by txn1, and txn1 is a large transaction that modified many keys, both txn2 and txn3 will try to resolve the whole region (after checking the txn status), even if txn2 met the lock at k1 and txn3 met it at k2.
We can deduplicate resolve requests that try to resolve the whole region.
What is changed and how it works?
Proposal: xxx
What's Changed:
At first glance, we should do this in LockResolver, but it is hard to maintain backoff behavior at the LockResolver level:
if we collapse multiple lock requests into one request there, the working request may hit many errors and back off (e.g. regionMiss, notLeader, kvRpc...), and it is hard to feed that backoff info back to the collapsed requests that are waiting for the working request.
So this PR does it at a lower level, at TiClient, where we only need to take care of context cancellation and timeout.
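As a rough illustration of handling only cancellation and timeout at this level, the sketch below collapses concurrent calls that share a key and lets each waiter give up on its own. It uses only the standard library: `group` and `doChan` are simplified stand-ins for `singleflight.Group` and its `DoChan` method from golang.org/x/sync/singleflight, and `collapse`, `errTimeout`, and `example` are hypothetical names rather than the PR's actual code.

```go
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// result mirrors the Val/Err pair carried by singleflight.Result.
type result struct {
	val interface{}
	err error
}

// group is a minimal stand-in for singleflight.Group (an assumption made
// so that the sketch needs no external module).
type group struct {
	mu    sync.Mutex
	calls map[string][]chan result
}

// doChan runs fn once per key. Callers that arrive while fn is in flight
// do not start a new call; each receives the shared result on its own
// buffered channel.
func (g *group) doChan(key string, fn func() (interface{}, error)) <-chan result {
	ch := make(chan result, 1)
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string][]chan result)
	}
	waiters, inFlight := g.calls[key]
	g.calls[key] = append(waiters, ch)
	g.mu.Unlock()
	if inFlight {
		return ch
	}
	go func() {
		v, err := fn()
		g.mu.Lock()
		chans := g.calls[key]
		delete(g.calls, key)
		g.mu.Unlock()
		// Channels are buffered, so sends never block even if a waiter
		// has already given up.
		for _, c := range chans {
			c <- result{v, err}
		}
	}()
	return ch
}

var errTimeout = errors.New("collapsed request timed out")

// collapse waits for the shared call, but each waiter keeps its own
// timeout and context cancellation.
func collapse(ctx context.Context, g *group, key string, timeout time.Duration,
	send func() (interface{}, error)) (interface{}, error) {
	ch := g.doChan(key, send)
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case res := <-ch:
		return res.val, res.err
	case <-ctx.Done():
		return nil, ctx.Err()
	case <-timer.C:
		return nil, errTimeout
	}
}

// example issues a single collapsed call with a one-second timeout.
func example() (interface{}, error) {
	var g group
	send := func() (interface{}, error) { return "resolved", nil }
	return collapse(context.Background(), &g, "region-1-ts-100", time.Second, send)
}
```

In this sketch the shared call runs in its own goroutine, detached from any single waiter's context, so one waiter timing out or being cancelled does not abort the request the other waiters depend on.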
How it Works:
Collapse multiple resolve requests into one request using singleflight, and choose DoChan to implement request-level timeout and context cancellation.
Related changes
Check List
Tests
Side effects
Release note
collapse duplicate resolve locks in region requests