-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store/tikv:Ignore error and do gc anyway #5797
Conversation
store/tikv/gcworker/gc_worker.go
Outdated
regions++ | ||
} else { | ||
errRegions++ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is errRegions
need ++ when err != nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
store/tikv/gcworker/gc_worker.go
Outdated
break | ||
} | ||
} | ||
log.Infof("[gc worker] %s finish gc, safePoint: %v, regions: %v, ignored regions: %v, cost time: %s", identifier, safePoint, regions, errRegions, time.Since(startTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can output this line of log at regular intervals (1min for example) during gc, so that we can know the GC progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -148,7 +148,7 @@ const ( | |||
getMaxBackoff = 20000 | |||
prewriteMaxBackoff = 20000 | |||
cleanupMaxBackoff = 20000 | |||
GcMaxBackoff = 100000 | |||
GcOneRegionMaxBackoff = 20000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think gc command can definitely returns in 20 secs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will change it for one request
store/tikv/gcworker/gc_worker.go
Outdated
return nil | ||
} | ||
|
||
func doGCForOneRegion(ctx goctx.Context, store tikv.Storage, safePoint uint64, identifier string, startKey []byte, endKey []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use loc instead of startKey and endKey. By the way, you don't need endKey indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will use loc
, for endKey
, see below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func doGCForOneRegion(ctx goctx.Context, store tikv.Storage, safePoint uint64, identifier string, region RegionVerID) (RegionError, error)
store/tikv/gcworker/gc_worker.go
Outdated
key = loc.EndKey | ||
if len(key) == 0 { | ||
if len(key) == 0 || bytes.Compare(key, endKey) >= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think L596-L599 should exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the func doGCForOneRegion
used to cleanup one region, if we remove this condition, the func is confused.
store/tikv/gcworker/gc_worker.go
Outdated
|
||
var key []byte | ||
bo := tikv.NewBackoffer(tikv.GcOneRegionMaxBackoff, goctx.Background()) | ||
key := startKey | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the loop.
store/tikv/gcworker/gc_worker.go
Outdated
return nil | ||
} | ||
|
||
func doGCForOneRegion(ctx goctx.Context, store tikv.Storage, safePoint uint64, identifier string, startKey []byte, endKey []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func doGCForOneRegion(ctx goctx.Context, store tikv.Storage, safePoint uint64, identifier string, region RegionVerID) (RegionError, error)
store/tikv/gcworker/gc_worker.go
Outdated
return errors.Trace(err) | ||
} | ||
errRegions++ | ||
} else if regionErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if regionErr != nil {
backoff
continue
}
if err != nil {
} else {
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to check err first, because if regionErr
and err
both exist, err
should have higher priority.
store/tikv/gcworker/gc_worker.go
Outdated
ticker := time.NewTicker(gcJobLogTickInterval) | ||
defer ticker.Stop() | ||
|
||
bo := tikv.NewBackoffer(tikv.GcLocalRegionMaxBackoff, goctx.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does LocalRegionMaxBackoff
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove it
store/tikv/gcworker/gc_worker.go
Outdated
gcHistogram.WithLabelValues("do_gc").Observe(time.Since(startTime).Seconds()) | ||
return nil | ||
} | ||
|
||
func doGCForOneRegion(store tikv.Storage, safePoint uint64, loc *tikv.KeyLocation) (*errorpb.Error, error) { | ||
req := &tikvrpc.Request{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems you only need RegionVerID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
store/tikv/gcworker/gc_worker.go
Outdated
if gcResp.GetError() != nil { | ||
return errors.Errorf("unexpected gc error: %s", gcResp.GetError()) | ||
} else { | ||
regions++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err != nil {
err++
}
if regionError != nil {
backoff
continue
}
if err == nil {
++}
PTAL @zhangjinpeng1987 |
store/tikv/gcworker/gc_worker.go
Outdated
@@ -515,63 +516,95 @@ func doGC(ctx goctx.Context, store tikv.Storage, safePoint uint64, identifier st | |||
// Sleep to wait for all other tidb instances update their safepoint cache. | |||
time.Sleep(gcSafePointCacheInterval) | |||
|
|||
log.Infof("[gc worker] %s start gc, safePoint: %v.", identifier, safePoint) | |||
startTime := time.Now() | |||
regions := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/regions/successRegions
s/errRegions/failedRegions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
/run-all-tests |
/run-all-tests |
Please add tests. |
store/tikv/gcworker/gc_worker.go
Outdated
|
||
bo := tikv.NewBackoffer(tikv.GcOneRegionMaxBackoff, ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize it in following for loop seems better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is hard to do because It should not be initialized when we meet region error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
store/tikv/gcworker/gc_worker.go
Outdated
gcHistogram.WithLabelValues("do_gc").Observe(time.Since(startTime).Seconds()) | ||
return nil | ||
} | ||
|
||
// these two errors should not return together, for more, see the func 'doGC' | ||
func doGCForOneRegion(bo *tikv.Backoffer, store tikv.Storage, req *tikvrpc.Request, region tikv.RegionVerID) (*errorpb.Error, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we should move the construction of req
into this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the completion, it should be move in here, but if we have 1M regions, this cost very large extra memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost is acceptable. If i pass an request not gc, the behavior is not consistent with the function name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will change it
…idb into ignore_error_and_do_gc_anyway
PTAL @zhangjinpeng1987 |
@ngaut |
Please use gofail to simulate error. |
LGTM. |
LGTM |
log.Infof("[gc worker] %s start gc, safePoint: %v.", identifier, safePoint) | ||
startTime := time.Now() | ||
regions := 0 | ||
successRegions := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to add some metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@ngaut ok, I will do it |
@@ -222,7 +224,7 @@ func (w *GCWorker) leaderTick(ctx goctx.Context) error { | |||
} | |||
|
|||
w.gcIsRunning = true | |||
log.Infof("[gc worker] %s starts GC job, safePoint: %v", w.uuid, safePoint) | |||
log.Infof("[gc worker] %s starts the whole job, safePoint: %v", w.uuid, safePoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I change the log because it is too similar to the log in the func doGC
.
PTAL @ngaut |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* ignore error when gc, and continue do gc next region
as title says