
region_cache: filter peers on tombstone or dropped stores #24726

Merged
merged 3 commits into pingcap:master from the handle-tombstone-store branch, May 21, 2021


youjiali1995 (Contributor) commented May 18, 2021

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

What problem does this PR solve?

Issue Number: close #24648

Problem Summary:

TiDB doesn't handle tombstone or dropped stores correctly: if a region has a peer on such a store, TiDB may report an error (for example, a store-not-found error).

What is changed and how it works?

What's Changed:

  1. Add a tombstone state to the store states in the region cache, used to mark stores that are tombstones.
  2. Filter out peers on tombstone or dropped stores, and add a Backoffer parameter to Region.init().
  3. Fix data race between asyncCheckAndResolveLoop() and initResolve(). Now unresolved stores will only be resolved by initResolve().
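A minimal Go sketch of the three changes, written to illustrate this description rather than copied from the diff. Only the tombstone and unresolved states and the role of initResolve() come from the text above; the other state names (resolved, needCheck, deleted) and the helpers pdStoreMeta, pdLookup, filterPeers, and shouldAsyncCheck are hypothetical. It assumes a "dropped" store is one PD no longer returns, and it leaves out the Backoffer that the PR adds to Region.init().

```go
package main

import "fmt"

// resolveState models the lifecycle of a cached store.
// Hypothetical names except unresolved and tombstone.
type resolveState int

const (
	unresolved resolveState = iota // just created; resolved only by initResolve
	resolved                       // address is known and usable
	needCheck                      // a request failed; re-checked by the async loop
	deleted                        // replaced by a newer store struct
	tombstone                      // PD reports a tombstone, or no longer knows the store
)

// pdStoreMeta is a hypothetical stand-in for the store metadata PD returns.
type pdStoreMeta struct {
	addr        string
	isTombstone bool
}

// pdLookup is a hypothetical PD query; found == false models a dropped store.
type pdLookup func(storeID uint64) (meta *pdStoreMeta, found bool)

type Store struct {
	id    uint64
	addr  string
	state resolveState
}

type Peer struct {
	id      uint64
	storeID uint64
}

// initResolve is the only place an unresolved store gets resolved.
// Tombstone or dropped stores are marked tombstone instead of raising an error.
func (s *Store) initResolve(lookup pdLookup) {
	if s.state != unresolved {
		return
	}
	meta, found := lookup(s.id)
	if !found || meta.isTombstone {
		s.state = tombstone
		return
	}
	s.addr = meta.addr
	s.state = resolved
}

// shouldAsyncCheck models the background loop's filter: it only re-checks
// needCheck stores and never touches unresolved ones.
func shouldAsyncCheck(state resolveState) bool {
	return state == needCheck
}

// filterPeers keeps only peers whose store is not a tombstone, creating and
// resolving cache entries for stores seen for the first time.
func filterPeers(peers []Peer, stores map[uint64]*Store, lookup pdLookup) []Peer {
	kept := make([]Peer, 0, len(peers))
	for _, p := range peers {
		st, ok := stores[p.storeID]
		if !ok {
			st = &Store{id: p.storeID, state: unresolved}
			stores[p.storeID] = st
		}
		st.initResolve(lookup)
		if st.state == tombstone {
			continue // drop peers on tombstone or dropped stores
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	pd := map[uint64]*pdStoreMeta{
		1: {addr: "tikv-1:20160"},
		2: {addr: "tikv-2:20160", isTombstone: true},
		// store 3 was dropped: PD no longer returns it
	}
	lookup := func(id uint64) (*pdStoreMeta, bool) {
		m, ok := pd[id]
		return m, ok
	}
	peers := []Peer{{id: 10, storeID: 1}, {id: 11, storeID: 2}, {id: 12, storeID: 3}}
	stores := map[uint64]*Store{}
	fmt.Println(filterPeers(peers, stores, lookup)) // [{10 1}]
}
```

Because the background check only picks up needCheck stores, an unresolved store has a single writer (initResolve), which is the idea behind change 3. The sketch runs single-threaded, so it illustrates the ownership rule rather than demonstrating the race itself.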

How it Works:

Check List

Tests

  • Unit test

Side effects

Release note

  • No release note.

youjiali1995 added the type/bugfix and sig/transaction labels May 18, 2021
ti-chi-bot added the size/L label (100-499 lines changed, ignoring generated files) May 18, 2021
youjiali1995 requested review from lysu and sticnarf May 18, 2021 12:33
Two review threads on store/tikv/region_cache.go (outdated, resolved)
Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>
youjiali1995 force-pushed the handle-tombstone-store branch from a417b36 to 3cdd8ed May 19, 2021 13:04
ti-chi-bot added the size/XL label (500-999 lines changed, ignoring generated files) and removed size/L May 19, 2021
lysu (Contributor) commented May 20, 2021

Another question: what error should TiKV return when a TiKV store is marked as tombstone? (Or: when will a running region find out that its peer stores have been marked as tombstone in PD?)

youjiali1995 (Contributor Author) replied:

> Another question: what error should TiKV return when a TiKV store is marked as tombstone?

Normally, a tombstone store has no peers, so if it is still alive and receives a request, it returns RegionNotFound. Sometimes, however, peers remain on a tombstone store, for example when a store is down and the user forcibly buries it, or when the user forcibly scales in a store. In such cases the store is down and requests to it fail.
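The two cases in this reply can be modeled as a small Go function. This is an illustration of the explanation, not TiKV or TiDB code; storeCondition and outcome are hypothetical names, and the case of a live tombstone store that still has peers is not described in the reply, so the sketch leaves it unspecified.

```go
package main

import "fmt"

// storeCondition is a hypothetical summary of a store that PD marks as tombstone.
type storeCondition struct {
	alive          bool // the TiKV process is up and accepting requests
	peersRemaining int  // > 0 only after a forced bury or forced scale-in
}

// outcome returns what a request routed to such a store runs into,
// following the explanation above.
func outcome(c storeCondition) string {
	switch {
	case c.alive && c.peersRemaining == 0:
		return "RegionNotFound" // normal case: the store holds no peers
	case !c.alive:
		return "request fails: store is down" // forced bury or forced scale-in
	default:
		return "unspecified" // not covered by the explanation
	}
}

func main() {
	fmt.Println(outcome(storeCondition{alive: true}))
	fmt.Println(outcome(storeCondition{alive: false, peersRemaining: 3}))
}
```

In both described cases the request cannot be served by that store, which is consistent with filtering such peers out on the TiDB side instead of routing requests to them.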

lysu (Contributor) commented May 20, 2021

/lgtm

ti-chi-bot added the status/LGT1 label (LGTM 1) May 20, 2021
cfzjywxk requested a review from ekexium May 20, 2021 08:00
ekexium (Contributor) commented May 20, 2021

/lgtm

ti-chi-bot (Member) commented:

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • ekexium
  • lysu

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 20, 2021
store/tikv/region_cache.go Outdated Show resolved Hide resolved
Co-authored-by: Ziqian Qin <ekexium@gmail.com>
youjiali1995 (Contributor Author) commented:

/merge

ti-chi-bot (Member) commented:

This pull request has been accepted and is ready to merge.

Commit hash: 240d370

ti-chi-bot added the status/can-merge label (approved by a committer) May 21, 2021
youjiali1995 (Contributor Author) commented:

[2021-05-21T01:26:57.370Z] ----------------------------------------------------------------------
[2021-05-21T01:26:57.370Z] FAIL: prepare_test.go:995: testPrepareSerialSuite.TestPrepareCacheWithJoinTable
[2021-05-21T01:26:57.370Z] 
[2021-05-21T01:26:57.370Z] prepare_test.go:1022:
[2021-05-21T01:26:57.370Z]     ...
[2021-05-21T01:26:57.370Z] /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/util/testleak/leaktest.go:168:
[2021-05-21T01:26:57.370Z]     c.Errorf("Test %s check-count %d appears to have leaked: %v", c.TestName(), cnt, g)
[2021-05-21T01:26:57.370Z] ... Error: Test testPrepareSerialSuite.TestPrepareCacheWithJoinTable check-count 50 appears to have leaked: github.com/pingcap/tidb/ddl.(*worker).start(0xc1e6e196c0, 0xc1e68efb00)
[2021-05-21T01:26:57.370Z] 	/home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:157 +0x36e
[2021-05-21T01:26:57.371Z] created by github.com/pingcap/tidb/ddl.(*ddl).Start
[2021-05-21T01:26:57.371Z] 	/home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/ddl/ddl.go:364 +0x6bb
[2021-05-21T01:26:57.371Z] 
[2021-05-21T01:26:57.371Z] prepare_test.go:1022:
[2021-05-21T01:26:57.371Z]     ...
[2021-05-21T01:26:57.371Z] /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/util/testleak/leaktest.go:168:
[2021-05-21T01:26:57.371Z]     c.Errorf("Test %s check-count %d appears to have leaked: %v", c.TestName(), cnt, g)
[2021-05-21T01:26:57.371Z] ... Error: Test testPrepareSerialSuite.TestPrepareCacheWithJoinTable check-count 50 appears to have leaked: github.com/pingcap/tidb/ddl.(*worker).start(0xc1e6e19730, 0xc1e68efb00)
[2021-05-21T01:26:57.371Z] 	/home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:157 +0x36e
[2021-05-21T01:26:57.371Z] created by github.com/pingcap/tidb/ddl.(*ddl).Start
[2021-05-21T01:26:57.371Z] 	/home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/ddl/ddl.go:364 +0x6bb
[2021-05-21T01:26:57.371Z] 

youjiali1995 (Contributor Author) commented:

/run-unit-test

ti-chi-bot merged commit 55d26c5 into pingcap:master May 21, 2021
Howie59 pushed a commit to Howie59/tidb that referenced this pull request May 21, 2021
…ingcap#24052)

* *: fix revoke statement for CURRENT_USER() and refine error message

planner: support set tidb_allow_mpp to `2` or `ENFORCE` to enforce use mpp mode. (pingcap#24516)

store/tikv: remove use of SchemaAmender option in store/tikv (pingcap#24408)

*: the value of tikv-client.store-liveness-timeout should not less than 0 (pingcap#24244)

store/tikv: remove use of EnableAsyncCommit option in store/tikv (pingcap#24462)

txn: Add txn state's view (pingcap#22908)

planner: ignore lock for temporary table of PointGet and BatchPointGet (pingcap#24540)

store/tikv: remove use of ReplicaRead transaction option in store/tikv (pingcap#24409)

store/driver: move error to single package (pingcap#24549)

ddl: add check table compatibility for temporary table (pingcap#24501)

store/tikv: remove use of IsStatenessReadOnly option in store/tikv (pingcap#24464)

store/tikv: change backoff type for missed tiflash peer. (pingcap#24577)

store/tikv: remove use of MatchStoreLabels transaction option in store/tikv (pingcap#24465)

executor, meta: Allocate auto id for global temporary tables (pingcap#24506)

store/tikv: remove use of SampleStep option in store/tikv (pingcap#24461)

executor: add partition pruning tests for adding and dropping partition operations (pingcap#24573)

ddl: forbid partition on temporary mode before put into queue (pingcap#24565)

ddl: speedup test case TestIndexOnMultipleGeneratedColumn (pingcap#24487)

execution: Fix issue 24439 Inconsistent error with MySQL for GRANT CREATE USER ON <specific db>.* (pingcap#24485)

*: fix errcheck (pingcap#24463)

test: make TestExtractStartTs stable (pingcap#24585)

ddl: forbid recover/flashback temporary tables (pingcap#24518)

executor: fix point_get result on clustered index when new-row-format disabled but new-collation enabled (pingcap#24544)

executor: Improve the performance of appending not fixed columns (pingcap#20969)

*: typo fix (pingcap#24564)

planner/core: refresh stale regions in cache for batch cop response (pingcap#24457)

binlog: DML on temporary tables do not write binlog (pingcap#24570)

store/tikv: remove use of GuaranteeLinearizability option in store/tikv (pingcap#24605)

store/tikv: remove use of CollectRuntimeStats option in store/tikv (pingcap#24604)

store/tikv: move Backoffer into a single package (pingcap#24525)

variables: init cte max recursive deeps in a new session (pingcap#24609)

store/tikv: move transaction options out to /kv (pingcap#24619)

store/driver: move backoff driver into single package so we can use i… (pingcap#24624)

server: close the temporary session in HTTP API to avoid memory leak (pingcap#24339)

store/tikv: use latest PD TS plus one as min commit ts (pingcap#24579)

planner: fix incorrect TableDual plan built from nulleq (pingcap#24596)

ranger: fix the case which could have duplicate ranges (pingcap#24590)

executor, store: Pass the SQL digest down to pessimistic lock request (pingcap#24380)

kv: remove UnionStore interface (pingcap#24625)

*: enable gosimple linter (pingcap#24617)

txn: avoid the gc resolving pessimistic locks of ongoing transactions (pingcap#24601)

util: fix wrong enum building for index range  (pingcap#24632)

sessionctx: change innodb large prefix default (pingcap#24555)

store: fix data race about KVStore.tikvClient (pingcap#24655)

executor, privileges: Add dynamic privileges to SHOW PRIVILEGES (pingcap#24646)

ddl: refactor rule [4/6] (pingcap#24007)

cmd: ddl_test modify retryCnt from 5 to 20 (pingcap#24662)

executor: add correctness tests about direct reading with ORDER BY and LIMIT (pingcap#24455)

store/tikv: remove options from unionstore (pingcap#24629)

planner: fix wrongly check for update statement (pingcap#24614)

store/tikv: remove CompareTS (pingcap#24657)

planner, privilege: Add security enhanced mode part 4 (pingcap#24416)

executor: add some test cases about partition table dynamic-mode with split-region (pingcap#24665)

planner: fix wrong column offsets when processing dynamic pruning for IndexJoin (pingcap#24659)

*: Add security enhanced mode part 3 (pingcap#24412)

store/tikv: resolve ReplicaReadType dependencies (pingcap#24653)

executor: add test cases about partition table with `expression` (pingcap#24628)

tablecodec: fix write wrong prefix index value when collation is ascii_bin/latin1_bin (pingcap#24578)

*: compatibility with staleread (pingcap#24285)

session: test that temporary tables will also be retried (pingcap#24505)

domain, session: Add new sysvarcache to replace global values cache (pingcap#24359)

ddl, transaction: DDL on temporary tables won't affect transactions (pingcap#24534)

*: implement tidb_bounded_staleness built-in function (pingcap#24328)

executor: add correctness tests for partition table with different joins (pingcap#24673)

expression: fix the spelling of word arithmetical (pingcap#24713)

store/copr: balance region for batch cop task (pingcap#24521)

store, metrics: Add metrics for safetTS updating (pingcap#24687)

sem: add tidbredact log to restricted variables (pingcap#24701)

session: fix dml_batch_size doesn't load the global variable (pingcap#24710)

store/tikv: retry TSO RPC (pingcap#24682)

expression, planner: push cast down to control function with enum type. (pingcap#24542)

executor: add correctness tests about IndexMerge (pingcap#24674)

variable: change default for DefDMLBatchSize tidbOptInt64 call (pingcap#24697)

planner: add partitioning pruning tests for range partitioning (pingcap#24554)

*: add option for enum push down (pingcap#24685)

txn: break dependency from store/tikv to tidb/kv cause by TransactionOption (pingcap#24656)

executor: enhancement for ListInDisk(support writing after reading) (pingcap#24379)

kv: move TxnScope into kv (pingcap#24715)

execution: fix the incorrect use of cached plan for point get (pingcap#24749)

executor: add correctness tests about direct reading with indexJoin (pingcap#24497)

variable:  fix sysvar datarace with deep copy (pingcap#24732)

*: Implementing RENAME USER (pingcap#24413)

infoschema, executor: Add the deadlock table (pingcap#24524)

docs: Some proposal for renaming and configurations for Lock View (pingcap#24718)

planner: add range partition boundaries tests with BETWEEN expression (pingcap#24598)

oracle: simplify timestamp utilities (pingcap#24688)

executor: fix wrong enum key in point get (pingcap#24618)

ranger: fix incorrect enum range for xxx_ci collation (pingcap#24661)

executor: add some test cases about dynamic-mode and apply operator (pingcap#24683)

store/tikv: remove Variables.Hook (pingcap#24758)

ddl: speed up the execution time of `TestBackwardCompatibility`. (pingcap#24704)

*: prepare errors for CTE (pingcap#24763)

expression: support cast real/int as real (pingcap#24670)

executor: add table name in log (pingcap#24666)

expression: add builtin function ``json_pretty`` (pingcap#24675)

ddl: make `TestDropLastVisibleColumns` stable (pingcap#24790)

* ddl: make `TestDropLastVisibleColumns` stable

*: support AS OF TIMESTAMP read-only begin statement (pingcap#24740)

executor: Fix unstable TestTiDBLastTxnInfoCommitMode (pingcap#24779)

planner: add tests for partition range boundaries for LT/GT (pingcap#24574)

test: record random seed in TestIssue20658 (pingcap#24782)

store/tikv/retry: define Config instead of BackoffType (pingcap#24692)

config: ignore tiflash when show config (pingcap#24770)

privileges: improve dynamic privs registration and tests (pingcap#24773)

README: remove the link of TiDB Monthly Update (pingcap#24791)

region_cache: filter peers on tombstone or dropped stores (pingcap#24726)

util/stmtsummary: remove import package tikv (pingcap#24776)

ddl: grammar check for create unsupported temporary table (pingcap#24723)

*: update go.etcd.io/bbolt (pingcap#24799)

ddl: speed up the execution time of `ddl test` and `Test Chunk pingcap#7 ddl-other` (pingcap#24780)

executor: remove the unnecessary use of fmt.Sprintf (pingcap#24815)

executor: fix index join panic on prefix index on some cases (pingcap#24568)
HunDunDM (Contributor) commented:

Will it be cherry-picked to release-5.0?

youjiali1995 (Contributor Author) commented:

/run-cherry-picker

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jun 30, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
ti-srebot (Contributor) commented:

cherry pick to release-4.0 in PR #25836

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jun 30, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
ti-srebot (Contributor) commented:

cherry pick to release-5.0 in PR #25838

youjiali1995 added a commit to youjiali1995/tidb that referenced this pull request Jun 30, 2021

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>
youjiali1995 added a commit to youjiali1995/tidb that referenced this pull request Jun 30, 2021

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>
Labels
needs-cherry-pick-release-5.0, sig/transaction, size/XL, status/can-merge, status/LGT2, type/bugfix
Development

Successfully merging this pull request may close these issues.

TiDB report store-not-found error
7 participants