Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc: add resolve locks interface for tidb gc_worker #945

Merged
merged 8 commits into from
Aug 29, 2023

Conversation

3pointer
Copy link
Contributor

@3pointer 3pointer commented Aug 15, 2023

This PR is partial of pingcap/tidb#45904.

It tried to abstract a new inteface of gc lock resolver for GCWorker and log backup.

@3pointer 3pointer changed the title Public resolve locks gc: add resolve locks interface for tidb gc_worker Aug 17, 2023
@3pointer 3pointer marked this pull request as ready for review August 17, 2023 06:11
tikv/gc.go Outdated
handler := func(ctx context.Context, r kv.KeyRange) (rangetask.TaskStat, error) {
return s.resolveLocksForRange(ctx, safePoint, r.StartKey, r.EndKey)
return ResolveLocksForRange(ctx, "GC", lockResolver, safePoint, r.StartKey, r.EndKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the second parameter is named uuid, it doesn't make sense to pass "GC" to it. As there isn't a GCWorker here, consider generating one.
Actually I think we don't need to keep uuid anyway. For example, we can call it identifier, and pass it "gc-client-go-api" here, and pass "gcworker-{uuid}" from TiDB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

tikv/gc.go Outdated
}
}

func (l *BaseLockResolver) ResolveLocks(ctx context.Context, locks []*txnlock.Lock, loc *locate.KeyLocation) (*locate.KeyLocation, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The name BaseLockResolver looks confused with those stuff in lock_resovler.go. Consider distinguish them by naming this BaseGCLockResolver. (and btw what does Base mean here?)
  • The function ResolveLocks looks also confused with those methods in lock_resolver.go, which is used by transactions that meet stale locks. BatchResolveLocks, which is used for GC, has some differences with ResolveLocks. Here I suggest you to consider still using the name BatchResolveLocks, or more clearly, using somethig like ResolveLocksForGC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For TiDB GC Worker. it use BaseLockResolver to scan and resolve locks.
For TiDB Advacner(log backup). it use BaseLockResolver to scan locks and override ResolveLocks to handle locks with a small region range and check txn status one by one.(which is not BatchResolveLocks, force delete locks before safepoint)

So, BaseGCLockResolver is ok for me. and BatchResolveLocks seems not clear.

tikv/gc.go Outdated

func (l *BaseLockResolver) ScanLocks(ctx context.Context, key []byte, maxVersion uint64) ([]*txnlock.Lock, *locate.KeyLocation, error) {
// every time ScanLocks need a new backoffer
l.bo = NewGcResolveLockMaxBackoffer(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about providing the backoff from the caller?
I'm afraid it might easily cause misusing to making it a state of the lock resolver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the origin logic is ScanLocks and ResolveLocks must use backoff and must share one backoff, and gc worker test using failpoint to control the backoff.
So I think backoff is an internal logic for GCWorker and the test doesn't need it.

If we public it, then it should be caller's responsibility to call SetBackoffer and GetBackoffer when using ScanLocks and ResolveLocks for LockResolver? and caller may not know they need share the same backoff. Do you think it's still ok?

tikv/gc.go Outdated
}
// resolve locks failed since the locks are not in one region anymore, need retry.
if resolvedLocation == nil {
continue
}
if len(locks) < gcScanLockLimit {
if len(locks) < GCScanLockLimit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function is going to be used by TiDB, please also see if TiDB has its own settings of the scan limit. If so, consider making it customizable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TiDB has a mergeLockScanner which use gcScanLockLimit to scan locks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They both use the same setting 128. I think we can unified here

Signed-off-by: 3pointer <luancheng@pingcap.com>
tikv/gc.go Outdated
locks, loc, err := s.scanLocksInRegionWithStartKey(bo, key, safePoint, gcScanLockLimit)
// create new backoffer for every scan and resolve locks
bo := createBackoffFn(ctx)
locks, loc, err := resolver.ScanLocks(bo, key, maxVersion, scanLimit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TiDB has this test code:

		if w.testingKnobs.scanLocks != nil {
			locks = append(locks, w.testingKnobs.scanLocks(key, loc.Region.GetID(), tryResolveLocksTS)...)
		}

This is used to inject some simulated locks for tests. Did you notice it, and do you have any idea how to support that kind of tests after the change?
This

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,In TiDB test a new test struct will override ScanLock method to generate expected locks.

tikv/gc.go Outdated
}

func (l *BaseLockResolver) ResolveLocks(bo *Backoffer, locks []*txnlock.Lock, loc *locate.KeyLocation) (*locate.KeyLocation, error) {
return batchResolveLocksInARegion(bo, l.GetStore(), locks, loc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function batchResolveLocksInARegion only resolves locks in the single region specified by loc. If error happens, it may reload the region info, but still only handles the first region containing the key locks[0]. If locks locates in more than one region, the remaining will be ignored. This works in the GC logic (or the scanlocks-resolvelocks for range logic) because it guarantees locks passed to this function is sorted, and the returned KeyLocation indicates the next key to scan the next batch, so no lock will be missed. However if you public this function to be accessible everywhere, it's very weird that it doesn't promise to completely handle the given locks and returns a KeyLocation, and is very very very likely to cause misusing. Please consider adjust the implementation to suite the interface's semantics or any other way to avoid the problem.

Signed-off-by: 3pointer <luancheng@pingcap.com>
@3pointer
Copy link
Contributor Author

/retest

Signed-off-by: 3pointer <luancheng@pingcap.com>
Comment on lines 185 to 187
if resolvedLocation == nil {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the backoffer is created outside the loop and recreated whenever the code in loop is successfully. Now you made the backoffer recreated every time in the loop, and as a result when this continue is executed, the backoffer is not counted as expected. Please consider keeping the old logic about backoff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. good catch.

//
// regionLocation should return if resolve locks succeed. if regionLocation return nil,
// which means not all locks are resolved in someway. the caller should retry scan locks.
ResolveLocksInOneRegion(bo *Backoffer, locks []*txnlock.Lock, regionLocation *locate.KeyLocation) (*locate.KeyLocation, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please emphasize in comments that locks is assumed to be sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. PTAL

Signed-off-by: 3pointer <luancheng@pingcap.com>
Copy link
Contributor

@zyguan zyguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MyonKeminta MyonKeminta merged commit a8860a9 into tikv:master Aug 29, 2023
10 checks passed
3pointer added a commit to 3pointer/client-go that referenced this pull request Sep 11, 2023
* gc: add GCResolver inteface for resolve locks

Signed-off-by: 3pointer <luancheng@pingcap.com>

* adapt scanlimit

Signed-off-by: 3pointer <luancheng@pingcap.com>

* rename GCLockResolver to RegionLockResolver

Signed-off-by: 3pointer <luancheng@pingcap.com>

* update

Signed-off-by: 3pointer <luancheng@pingcap.com>

* address comments

Signed-off-by: 3pointer <luancheng@pingcap.com>

---------

Signed-off-by: 3pointer <luancheng@pingcap.com>
3pointer added a commit to 3pointer/client-go that referenced this pull request Sep 11, 2023
* gc: add GCResolver inteface for resolve locks

Signed-off-by: 3pointer <luancheng@pingcap.com>

* adapt scanlimit

Signed-off-by: 3pointer <luancheng@pingcap.com>

* rename GCLockResolver to RegionLockResolver

Signed-off-by: 3pointer <luancheng@pingcap.com>

* update

Signed-off-by: 3pointer <luancheng@pingcap.com>

* address comments

Signed-off-by: 3pointer <luancheng@pingcap.com>

---------

Signed-off-by: 3pointer <luancheng@pingcap.com>
MyonKeminta pushed a commit that referenced this pull request Sep 12, 2023
* gc: add GCResolver inteface for resolve locks



* adapt scanlimit



* rename GCLockResolver to RegionLockResolver



* update



* address comments



---------

Signed-off-by: 3pointer <luancheng@pingcap.com>
disksing pushed a commit that referenced this pull request Sep 18, 2023
Signed-off-by: 3pointer <luancheng@pingcap.com>
iosmanthus added a commit that referenced this pull request Dec 20, 2023
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: zzm <zhouzemin@pingcap.com>
Co-authored-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: you06 <you1474600@gmail.com>
Co-authored-by: buffer <doufuxiaowangzi@gmail.com>
Co-authored-by: 3pointer <qdlc2010@gmail.com>
Co-authored-by: buffer <1045931706@qq.com>
Co-authored-by: husharp <ihusharp@gmail.com>
Co-authored-by: crazycs520 <crazycs520@gmail.com>
Co-authored-by: Smilencer <smityz@qq.com>
Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: zyguan <zhongyangguan@gmail.com>
Co-authored-by: Jack Yu <jackysp@gmail.com>
Co-authored-by: Weizhen Wang <wangweizhen@pingcap.com>
Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
Co-authored-by: healthwaite <148101100+healthwaite@users.noreply.github.com>
Co-authored-by: xufei <xufeixw@mail.ustc.edu.cn>
Co-authored-by: JmPotato <ghzpotato@gmail.com>
Co-authored-by: ekexium <eke@fastmail.com>
Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com>
Co-authored-by: glorv <glorvs@163.com>
Co-authored-by: Yongbo Jiang <cabinfeveroier@gmail.com>
resolve locks interface for tidb gc_worker (#945)
fix some issues of replica selector (#910)  (#942)
fix some issues of replica selector (#910)
fix issue of configure kv timeout not work when disable batch client (#980)
fix batch-client wait too long and add some metrics (#973)
fix batch-client wait too long and add some metrics (#973)" (#984)
fix data race at the aggressiveLockingDirty (#913)
fix MinSafeTS might be set to MaxUint64 permanently (#994)
fix: fix invalid nil pointer when trying to record Store.SlownessStat. (#1017)
Fix batch client batchSendLoop panic (#1021)
fix request source tag unset (#1025)
Fix comment of `SuspendTime` (#1057)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants