Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: retry executing sql without tiflash after tiflash is down #22459

Merged
merged 17 commits into from
Feb 5, 2021

Conversation

xuyifangreeneyes
Copy link
Contributor

@xuyifangreeneyes xuyifangreeneyes commented Jan 20, 2021

What problem does this PR solve?

Problem Summary:
When we get a backoff error of TiFlash, the SQL will always fail while the TiKV nodes are actually alive. We hope to fallback to TiKV after TiFlash is down.

What is changed and how it works?

Proposal: https://docs.google.com/document/d/1dgq1MZQigTcZeck0iRLSCAu_ly3iGDr19Et6IHfQhCs/edit?usp=sharing (This PR implements simple solution section)

What's Changed & How it Works:
When running clientConn.handleQuery / clientConn.handleStmtExecute, if we get ErrTiFlashServerTimeout error and the execution of SQL does't have side effect (If TiDB server already sends data to client or the execution is in cursor mode, the execution has side effect), we delete TiFlash from IsolationReadEngines, retry executing the SQL and add TiFlash back to IsolationReadEngines. Besides, we use a switch tidb_enable_tiflash_fallback_tikv to control whether to retry and the switch is off by default.

Related changes

Check List

Tests

  • Unit test
  • Integration test

Release note

  • fallback to TiKV if TiFlash's unavailability causes failure of executing SQL

@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

@github-actions github-actions bot added the sig/sql-infra SIG: SQL Infra label Jan 20, 2021
@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

@xuyifangreeneyes xuyifangreeneyes marked this pull request as ready for review January 25, 2021 13:25
@xuyifangreeneyes
Copy link
Contributor Author

/cc @winoros

@qw4990 qw4990 self-requested a review January 26, 2021 08:33
@qw4990 qw4990 added sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement. and removed sig/sql-infra SIG: SQL Infra labels Jan 27, 2021
server/conn.go Outdated Show resolved Hide resolved
@github-actions github-actions bot added the sig/sql-infra SIG: SQL Infra label Jan 27, 2021
@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

server/conn.go Outdated
if err != nil {
break
if terror.ErrorEqual(err, tikv.ErrTiFlashServerTimeout) && retryable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to introduce a switch(generally it is implemented as a SQL variable) to control if we retry here.

server/conn.go Outdated Show resolved Hide resolved
@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

@xuyifangreeneyes
Copy link
Contributor Author

/run-mybatis-test

@qw4990 qw4990 self-requested a review January 28, 2021 10:44
@qw4990
Copy link
Contributor

qw4990 commented Jan 28, 2021

PTAL @winoros

@xuyifangreeneyes xuyifangreeneyes requested a review from a team as a code owner January 28, 2021 11:50
@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

@xuyifangreeneyes
Copy link
Contributor Author

@qw4990 @lysu PTAL

@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

@xuyifangreeneyes
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 5, 2021
@qw4990
Copy link
Contributor

qw4990 commented Feb 5, 2021

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 5, 2021
@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 22667
  • 22652

@ti-srebot
Copy link
Contributor

/run-all-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution sig/planner SIG: Planner sig/sql-infra SIG: SQL Infra status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants