Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tikv: collapse duplicate resolve locks in region requests #16838

Merged
merged 3 commits into from
Apr 29, 2020
Merged

tikv: collapse duplicate resolve locks in region requests #16838

merged 3 commits into from
Apr 29, 2020

Conversation

lysu
Copy link
Contributor

@lysu lysu commented Apr 26, 2020

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

When txn1, txn2 meet txn1's lock, and txn1 is large txn that modify many keys

both txn1, txn2 will try to resolve(after check txn status) the whole region even if txn1 meet lock at k1 and txn2 meet lock at k2.

we can deduplicate resolve requests that try to resolve the whole region.

What is changed and how it works?

Proposal: xxx

What's Changed:

At first glance, we should do it in LockResolver, but the question is it's hard to maintain backoff behavior at LockResolver level:

if we collapse multiple lock requests into one request at LockResolver level, working request maybe meet many error and backoff(e.g. regionMiss, notLeader, kvRpc...), we are hard to let those backoff info feedback to collapsed requests that waiting for working request.

so this PR chooses to do it in lower level ---- at TiClient that this level, we only need to take care context.Cancel and timeout.

How it Works:

collapse multi resolve request into one request using singleflight.

and chose DoChan to impl request level timeout and context.Cancel

Related changes

  • Need to cherry-pick to the release branch(after test..)

Check List

Tests

  • Integration test(old test)

Side effects

  • reduce kv resolve request count

Release note

collapse duplicate resolve locks in region requests


This change is Reviewable

@lysu lysu added component/tikv needs-cherry-pick-4.0 type/enhancement The issue or PR belongs to an enhancement. labels Apr 26, 2020
@lysu lysu requested review from tiancaiamao, alexc711, cfzjywxk and coocood and removed request for alexc711 April 26, 2020 08:57
@codecov
Copy link

codecov bot commented Apr 26, 2020

Codecov Report

❗ No coverage uploaded for pull request head (dev-dedup-resolve-region@b07495d). Click here to learn what that means.
The diff coverage is n/a.

@lysu
Copy link
Contributor Author

lysu commented Apr 26, 2020

/run-all-tests

@lysu
Copy link
Contributor Author

lysu commented Apr 26, 2020

/build

case tikvrpc.CmdResolveLock:
resolveLock := req.ResolveLock()
if len(resolveLock.Keys) > 0 {
// can not collapse resolveLite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolveLock?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve lock lite

return
}
canCollapse = true
key := addr + "-" + strconv.FormatUint(resolveLock.StartVersion, 10) + "-" + strconv.FormatUint(resolveLock.CommitVersion, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key doesn't have the region id.
And can we remove the store address from the key?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start ts is already unique, we don't need to include commit version in the key.

@tiancaiamao
Copy link
Contributor

Please address comment, rest LGTM

@tiancaiamao tiancaiamao added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 27, 2020
@lysu
Copy link
Contributor Author

lysu commented Apr 28, 2020

/run-all-tests

@tiancaiamao
Copy link
Contributor

LGTM @coocood

func (r reqCollapse) collapse(ctx context.Context, key string, sf *singleflight.Group,
addr string, req *tikvrpc.Request, timeout time.Duration) (resp *tikvrpc.Response, err error) {
rsC := sf.DoChan(key, func() (interface{}, error) {
return r.Client.SendRequest(context.Background(), addr, req, ReadTimeoutMedium) // set longer timeout than outer's.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set longer timeout than outer's?

@coocood
Copy link
Member

coocood commented Apr 29, 2020

LGTM

@coocood
Copy link
Member

coocood commented Apr 29, 2020

/merge

@sre-bot sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 29, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Apr 29, 2020

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Apr 29, 2020

cherry pick to release-4.0 in PR #16925

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/tikv status/can-merge Indicates a PR has been approved by a committer. status/LGT1 Indicates that a PR has LGTM 1. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants