Fail restore if warmup fails only during `check-wal-only` strategy #5621
Rest LGTM
@YuJuncen: adding LGTM is restricted to approvers and reviewers in OWNERS files.
/test-pull-e2e-kind-br
Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com>
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: BornChanger, YuJuncen
New changes are detected. LGTM label has been removed.
/cherry-pick release-1.5
@csuzhangxc: new pull request created to branch
What problem does this PR solve?
#5569 updates the volume-snapshot restore process to fail the entire restore if any warmup job fails. We use this to quickly check the viability of a restore and terminate restore processing early if corruption is detected.
#5572 updates the volume-snapshot restore process to enable recovery from corruption in a single TiKV through manual cluster operations. We use this in a full restore in case we encounter corruption during the process.
These features conflict with each other. If we want to perform a full restore and use single-TiKV recovery in the event of corruption, we cannot fail the restore during warmup; instead we need to complete the warmup stage and progress to restarting TiKVs. If we only want to check the viability of a restore, we are OK with failing the restore and not progressing to any further steps. Thus, we gate this failure behavior behind the `check-wal-only` strategy.
What is changed and how does it work?
Gate restore failure on warmup failure only for the `check-wal-only` warmup strategy.
Code changes
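As a rough illustration of the gating described in this PR, the decision can be sketched as a single predicate. This is not the actual tidb-operator code: the `WarmupStrategy` type, the `StrategyCheckWalOnly` constant, and the `shouldFailRestore` function are hypothetical names chosen for the example, and only the `check-wal-only` strategy value comes from the description above.

```go
package main

import "fmt"

// WarmupStrategy is a hypothetical type naming a warmup mode.
// Only the "check-wal-only" value is taken from the PR description.
type WarmupStrategy string

// StrategyCheckWalOnly is a hypothetical constant for the strategy
// that is used to check the viability of a restore.
const StrategyCheckWalOnly WarmupStrategy = "check-wal-only"

// shouldFailRestore reports whether a warmup failure should fail the
// entire restore. Under any other strategy the restore continues past
// warmup so that single-TiKV recovery remains possible.
func shouldFailRestore(strategy WarmupStrategy, warmupFailed bool) bool {
	return warmupFailed && strategy == StrategyCheckWalOnly
}

func main() {
	// Viability check: a warmup failure terminates the restore early.
	fmt.Println(shouldFailRestore(StrategyCheckWalOnly, true))
	// Any other strategy (placeholder name): proceed to restarting TiKVs.
	fmt.Println(shouldFailRestore("some-other-strategy", true))
	// No warmup failure: nothing to fail on.
	fmt.Println(shouldFailRestore(StrategyCheckWalOnly, false))
}
```

Keeping the check in one predicate means the full-restore path never sees the early-exit behavior unless the strategy explicitly asks for it.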
Tests
Side effects
Related changes
Release Notes
Please refer to Release Notes Language Style Guide before writing the release note.