-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gc: add resolve locks interface for tidb gc_worker #945
Conversation
ccb3015
to
abb612c
Compare
tikv/gc.go
Outdated
handler := func(ctx context.Context, r kv.KeyRange) (rangetask.TaskStat, error) { | ||
return s.resolveLocksForRange(ctx, safePoint, r.StartKey, r.EndKey) | ||
return ResolveLocksForRange(ctx, "GC", lockResolver, safePoint, r.StartKey, r.EndKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the second parameter is named uuid
, it doesn't make sense to pass "GC"
to it. As there isn't a GCWorker here, consider generating one.
Actually I think we don't need to keep uuid
anyway. For example, we can call it identifier
, and pass it "gc-client-go-api"
here, and pass "gcworker-{uuid}"
from TiDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
tikv/gc.go
Outdated
} | ||
} | ||
|
||
func (l *BaseLockResolver) ResolveLocks(ctx context.Context, locks []*txnlock.Lock, loc *locate.KeyLocation) (*locate.KeyLocation, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The name
BaseLockResolver
looks confused with those stuff inlock_resovler.go
. Consider distinguish them by naming thisBaseGCLockResolver
. (and btw what doesBase
mean here?) - The function
ResolveLocks
looks also confused with those methods inlock_resolver.go
, which is used by transactions that meet stale locks.BatchResolveLocks
, which is used for GC, has some differences withResolveLocks
. Here I suggest you to consider still using the nameBatchResolveLocks
, or more clearly, using somethig likeResolveLocksForGC
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For TiDB GC Worker. it use BaseLockResolver to scan and resolve locks.
For TiDB Advacner(log backup). it use BaseLockResolver to scan locks and override ResolveLocks
to handle locks with a small region range and check txn status one by one.(which is not BatchResolveLocks, force delete locks before safepoint)
So, BaseGCLockResolver is ok for me. and BatchResolveLocks
seems not clear.
tikv/gc.go
Outdated
|
||
func (l *BaseLockResolver) ScanLocks(ctx context.Context, key []byte, maxVersion uint64) ([]*txnlock.Lock, *locate.KeyLocation, error) { | ||
// every time ScanLocks need a new backoffer | ||
l.bo = NewGcResolveLockMaxBackoffer(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about providing the backoff from the caller?
I'm afraid it might easily cause misusing to making it a state of the lock resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the origin logic is ScanLocks
and ResolveLocks
must use backoff and must share one backoff, and gc worker test using failpoint to control the backoff.
So I think backoff is an internal logic for GCWorker and the test doesn't need it.
If we public it, then it should be caller's responsibility to call SetBackoffer
and GetBackoffer
when using ScanLocks
and ResolveLocks
for LockResolver? and caller may not know they need share the same backoff. Do you think it's still ok?
tikv/gc.go
Outdated
} | ||
// resolve locks failed since the locks are not in one region anymore, need retry. | ||
if resolvedLocation == nil { | ||
continue | ||
} | ||
if len(locks) < gcScanLockLimit { | ||
if len(locks) < GCScanLockLimit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this function is going to be used by TiDB, please also see if TiDB has its own settings of the scan limit. If so, consider making it customizable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TiDB has a mergeLockScanner
which use gcScanLockLimit
to scan locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They both use the same setting 128. I think we can unified here
Signed-off-by: 3pointer <luancheng@pingcap.com>
c6121f8
to
e8dd618
Compare
Signed-off-by: 3pointer <luancheng@pingcap.com>
…ient-go into public_resolve_locks
tikv/gc.go
Outdated
locks, loc, err := s.scanLocksInRegionWithStartKey(bo, key, safePoint, gcScanLockLimit) | ||
// create new backoffer for every scan and resolve locks | ||
bo := createBackoffFn(ctx) | ||
locks, loc, err := resolver.ScanLocks(bo, key, maxVersion, scanLimit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TiDB has this test code:
if w.testingKnobs.scanLocks != nil {
locks = append(locks, w.testingKnobs.scanLocks(key, loc.Region.GetID(), tryResolveLocksTS)...)
}
This is used to inject some simulated locks for tests. Did you notice it, and do you have any idea how to support that kind of tests after the change?
This
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,In TiDB test a new test struct will override ScanLock
method to generate expected locks.
tikv/gc.go
Outdated
} | ||
|
||
func (l *BaseLockResolver) ResolveLocks(bo *Backoffer, locks []*txnlock.Lock, loc *locate.KeyLocation) (*locate.KeyLocation, error) { | ||
return batchResolveLocksInARegion(bo, l.GetStore(), locks, loc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function batchResolveLocksInARegion
only resolves locks in the single region specified by loc
. If error happens, it may reload the region info, but still only handles the first region containing the key locks[0]
. If locks
locates in more than one region, the remaining will be ignored. This works in the GC logic (or the scanlocks-resolvelocks for range logic) because it guarantees locks
passed to this function is sorted, and the returned KeyLocation
indicates the next key to scan the next batch, so no lock will be missed. However if you public this function to be accessible everywhere, it's very weird that it doesn't promise to completely handle the given locks and returns a KeyLocation
, and is very very very likely to cause misusing. Please consider adjust the implementation to suite the interface's semantics or any other way to avoid the problem.
2fdaed9
to
77caf52
Compare
Signed-off-by: 3pointer <luancheng@pingcap.com>
77caf52
to
b2f7321
Compare
/retest |
if resolvedLocation == nil { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the backoffer is created outside the loop and recreated whenever the code in loop is successfully. Now you made the backoffer recreated every time in the loop, and as a result when this continue
is executed, the backoffer is not counted as expected. Please consider keeping the old logic about backoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. good catch.
// | ||
// regionLocation should return if resolve locks succeed. if regionLocation return nil, | ||
// which means not all locks are resolved in someway. the caller should retry scan locks. | ||
ResolveLocksInOneRegion(bo *Backoffer, locks []*txnlock.Lock, regionLocation *locate.KeyLocation) (*locate.KeyLocation, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please emphasize in comments that locks
is assumed to be sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated. PTAL
Signed-off-by: 3pointer <luancheng@pingcap.com>
091467b
to
fe522b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* gc: add GCResolver inteface for resolve locks Signed-off-by: 3pointer <luancheng@pingcap.com> * adapt scanlimit Signed-off-by: 3pointer <luancheng@pingcap.com> * rename GCLockResolver to RegionLockResolver Signed-off-by: 3pointer <luancheng@pingcap.com> * update Signed-off-by: 3pointer <luancheng@pingcap.com> * address comments Signed-off-by: 3pointer <luancheng@pingcap.com> --------- Signed-off-by: 3pointer <luancheng@pingcap.com>
* gc: add GCResolver inteface for resolve locks Signed-off-by: 3pointer <luancheng@pingcap.com> * adapt scanlimit Signed-off-by: 3pointer <luancheng@pingcap.com> * rename GCLockResolver to RegionLockResolver Signed-off-by: 3pointer <luancheng@pingcap.com> * update Signed-off-by: 3pointer <luancheng@pingcap.com> * address comments Signed-off-by: 3pointer <luancheng@pingcap.com> --------- Signed-off-by: 3pointer <luancheng@pingcap.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com> Co-authored-by: cfzjywxk <lsswxrxr@163.com> Co-authored-by: disksing <i@disksing.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: zzm <zhouzemin@pingcap.com> Co-authored-by: husharp <jinhao.hu@pingcap.com> Co-authored-by: you06 <you1474600@gmail.com> Co-authored-by: buffer <doufuxiaowangzi@gmail.com> Co-authored-by: 3pointer <qdlc2010@gmail.com> Co-authored-by: buffer <1045931706@qq.com> Co-authored-by: husharp <ihusharp@gmail.com> Co-authored-by: crazycs520 <crazycs520@gmail.com> Co-authored-by: Smilencer <smityz@qq.com> Co-authored-by: ShuNing <nolouch@gmail.com> Co-authored-by: zyguan <zhongyangguan@gmail.com> Co-authored-by: Jack Yu <jackysp@gmail.com> Co-authored-by: Weizhen Wang <wangweizhen@pingcap.com> Co-authored-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: healthwaite <148101100+healthwaite@users.noreply.github.com> Co-authored-by: xufei <xufeixw@mail.ustc.edu.cn> Co-authored-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ekexium <eke@fastmail.com> Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com> Co-authored-by: glorv <glorvs@163.com> Co-authored-by: Yongbo Jiang <cabinfeveroier@gmail.com> resolve locks interface for tidb gc_worker (#945) fix some issues of replica selector (#910) (#942) fix some issues of replica selector (#910) fix issue of configure kv timeout not work when disable batch client (#980) fix batch-client wait too long and add some metrics (#973) fix batch-client wait too long and add some metrics (#973)" (#984) fix data race at the aggressiveLockingDirty (#913) fix MinSafeTS might be set to MaxUint64 permanently (#994) fix: fix invalid nil pointer when trying to record Store.SlownessStat. (#1017) Fix batch client batchSendLoop panic (#1021) fix request source tag unset (#1025) Fix comment of `SuspendTime` (#1057)
This PR is partial of pingcap/tidb#45904.
It tried to abstract a new inteface of gc lock resolver for GCWorker and log backup.