-
Notifications
You must be signed in to change notification settings - Fork 102
restore: Make all download error as retryable #298
Conversation
Codecov Report
@@ Coverage Diff @@
## master #298 +/- ##
=======================================
Coverage 71.80% 71.80%
=======================================
Files 48 48
Lines 5057 5057
=======================================
Hits 3631 3631
Misses 966 966
Partials 460 460 Continue to review full report at Codecov.
|
maybe backup also need to retry on upload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(Accidentally verified on DBaaS that this works, after receiving a DNS error.)
|
Repeated testing using the 1.4 T data set did not reproduce
and catch tikv log
It seems fix the issue tikv/tikv#7846 . thx @kennytm |
fix tikv/tikv#7846 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/merge |
/run-all-tests |
cherry pick to release-3.1 failed |
cherry pick to release-4.0 in PR #307 |
What problem does this PR solve?
Workaround tikv/tikv#7846 and the "Restore" part of tidb-challenge-program/bug-hunting-issue#72.
What is changed and how it works?
Allow ErrDownloadFailed to be retried. The transient 5xx errors from the cloud server should be retryable.
This is currently too relaxed, as truly unrecoverable errors such as file-not-found will also get retried. But the worst case is retrying this for 8 times, so this seems to be an acceptable compromise.
We could tighten it back when we have more precise error handling from TiKV.
In the future we may tighten it back on truly recoverable errors such as file-not-found. but the worst of having a false negative is just hitting the same error 8 times.
Additionally, changed
WithRetry
to return amultierr
so no errors are lost.Check List
Tests
Code changes
Side effects
Related changes
Release Note