fix unexpected slow query during GC running after stop 1 tikv-server #899

Merged on Jul 24, 2023 (5 commits)

crazycs520 (Contributor) commented Jul 20, 2023

close #898

Why does issue #898 happen?

After one tikv-server is stopped, some region replicas are marked as stale (replica.isEpochStale is true), so accessFollower no longer chooses those replicas.

But when the TiDB GC leader starts running GC, it reloads all regions. The reload updates the epoch of every region replica, which means isEpochStale changes back to false for all of them. accessFollower may then choose a replica that lives on the down tikv-server. In that case TiDB sends the kv request to the down tikv-server, receives a context deadline exceeded error, and only then re-sends the kv request to the region leader. This timeout followed by a retry is what causes the slow queries.

How does this fix work?

In short, accessFollower needs to check the LivenessState of the replica's store when choosing the target replica.
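
The difference can be illustrated with a minimal sketch. This is not the client-go implementation: the types `store`, `replica`, `livenessState` and the functions `pickFollower`, `reloadRegion`, and `newRegion` are hypothetical names made up for illustration, and picking the first eligible follower (instead of whatever strategy the real selector uses) is an assumption made to keep the example deterministic. Only isEpochStale, accessFollower, and LivenessState come from the description above.

```go
package main

import "fmt"

// livenessState is a hypothetical stand-in for the store liveness state
// mentioned in the PR description.
type livenessState int

const (
	reachable livenessState = iota
	unreachable
)

// store and replica are simplified, hypothetical models of a TiKV store
// and of a region replica as seen by the client.
type store struct {
	addr     string
	liveness livenessState
}

type replica struct {
	store        *store
	isLeader     bool
	isEpochStale bool
}

// pickFollower returns the index of the first follower that qualifies as
// a target, or -1 if none does. With checkLiveness=false only
// isEpochStale is consulted; with checkLiveness=true replicas whose
// store is not reachable are skipped as well.
func pickFollower(replicas []replica, checkLiveness bool) int {
	for i, r := range replicas {
		if r.isLeader || r.isEpochStale {
			continue
		}
		if checkLiveness && r.store.liveness != reachable {
			continue
		}
		return i
	}
	return -1
}

// reloadRegion models a region reload such as the one GC triggers: the
// stale marks are cleared, but the store liveness is left untouched.
func reloadRegion(replicas []replica) {
	for i := range replicas {
		replicas[i].isEpochStale = false
	}
}

// newRegion builds a three-replica region whose first follower lives on
// a stopped store and has already been marked stale.
func newRegion() []replica {
	return []replica{
		{store: &store{addr: "tikv-0", liveness: reachable}, isLeader: true},
		{store: &store{addr: "tikv-1", liveness: unreachable}, isEpochStale: true},
		{store: &store{addr: "tikv-2", liveness: reachable}},
	}
}

func main() {
	region := newRegion()
	// Before the reload the stale mark keeps the down store out.
	fmt.Println(region[pickFollower(region, false)].store.addr) // tikv-2

	reloadRegion(region)
	// After the reload, the epoch check alone lets the down store back in.
	fmt.Println(region[pickFollower(region, false)].store.addr) // tikv-1
	// Checking store liveness as well avoids it again.
	fmt.Println(region[pickFollower(region, true)].store.addr) // tikv-2
}
```

In this model the reload alone is enough to make the down store eligible again, which matches the slow-query scenario; adding the liveness check keeps it excluded regardless of the epoch state.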

Before this PR: [image]

With this PR: [image]

Signed-off-by: crazycs520 <crazycs520@gmail.com>
crazycs520 marked this pull request as ready for review July 20, 2023 11:13
Signed-off-by: crazycs520 <crazycs520@gmail.com>
you06 (Contributor) commented Jul 21, 2023

This problem reminds me of the issue from the TiDB forum: https://ask.pingcap.com/t/every-10-minutes-my-in-flight-stale-reads-fail/518

Signed-off-by: crazycs520 <crazycs520@gmail.com>
cfzjywxk requested review from zyguan and you06 July 24, 2023 03:22
cfzjywxk (Contributor) commented

@you06 @zyguan
PTAL

Signed-off-by: crazycs520 <crazycs520@gmail.com>
crazycs520 (Contributor, Author) commented

/hold since the test failed.

Signed-off-by: crazycs520 <crazycs520@gmail.com>
MyonKeminta merged commit 59adec2 into tikv:tidb-6.5 Jul 24, 2023
cfzjywxk pushed a commit that referenced this pull request Jul 26, 2023
…899) (#909)

* fix unexpected slow query during GC running after stop 1 tikv-server

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix test

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>
crazycs520 added a commit to crazycs520/client-go that referenced this pull request Aug 7, 2023
iosmanthus added a commit that referenced this pull request Aug 11, 2023
* client-go: add some key range info to error when PD returned no region (#862)

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* *: refine non-global stale-read request retry logic (#863)

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* Fix the issue that primary pessimistic lock may be left not cleared after GC (#866)

* Fix the issue that primary pessimistic lock may be left not cleared after GC

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* Fix mysteriously shown up thing that makes compilation failed

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* Fix test effectiveness (forgot to set txn2 to pessimistic txn); add more strict checks

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* Address comments

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

---------

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* add explicit request source type to label the external request like lightning/br (#868)

Signed-off-by: nolouch <nolouch@gmail.com>

* use '%d' instead of '%q' for some int values in error message (#875)

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* format key in error message in method `scanRegions` (#876)

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* make cop request timeout a config paramter (#865)

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

---------

Signed-off-by: Spade A <u6748471@anu.edu.au>

* region_cache: support check pending tiflash peer (#821)

Signed-off-by: guo-shaoge <shaoge1994@163.com>
Co-authored-by: disksing <i@disksing.com>

* *: add `SnapshotIterReverse` and make `iterReverse` supports `lowerBound` (#883)

Signed-off-by: Jason Mo <mohangjie1995@gmail.com>

* *: fix stale read ops metric (#878) (#889)

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* add gc options (#828)

Signed-off-by: weedge <weege007@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* reload region cache when store is resolved from invalid status (#843)

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* ci: update setup-go action (#904)

Signed-off-by: disksing <i@disksing.com>

* fix unexpected slow query during GC running after stop 1 tikv-server (#899) (#909)

* fix unexpected slow query during GC running after stop 1 tikv-server

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix test

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* resource_manager: ignore ru metrics for background request (#872)

Signed-off-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: disksing <i@disksing.com>

* add more log for diagnose (#915)

* add more log for diagnose

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* add more log for diagnose

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* add more log

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* address comment

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* use context logger as much as possible (#908)

* use context logger as much as possible

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* refine

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* Resume max retry time check for stale read retry with leader option(#903) (#911)

* Resume max retry time check for stale read retry with leader option

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* add cancel

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

---------

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* request_source: remove default label (#890)

* request_source: remove default label

Signed-off-by: nolouch <nolouch@gmail.com>

* add a function to set request source task type (#925)

* add a function to set request source task type

Signed-off-by: glorv <glorvs@163.com>

* ci: update go version (#936)

* ci: update go version

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix test

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* use tidb_kv_read_timeout as first kv request timeout (#919)

* support tidb_kv_read_timeout as first round kv request timeout

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* update comment

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* refine test

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* [pick] resource_control: bypass some internal urgent request (#938)

* resource_control: bypass some internal urgent request (#884)

Signed-off-by: nolouch <nolouch@gmail.com>

* resourcecontrol: fix nil pointer (#900)

Signed-off-by: nolouch <nolouch@gmail.com>

---------

Signed-off-by: nolouch <nolouch@gmail.com>

---------

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: nolouch <nolouch@gmail.com>
Signed-off-by: Spade A <u6748471@anu.edu.au>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: Jason Mo <mohangjie1995@gmail.com>
Signed-off-by: weedge <weege007@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: disksing <i@disksing.com>
Signed-off-by: husharp <jinhao.hu@pingcap.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Signed-off-by: glorv <glorvs@163.com>
Signed-off-by: iosmanthus <myosmanthustree@gmail.com>
Co-authored-by: 王超 <cclcwangchao@hotmail.com>
Co-authored-by: crazycs <crazycs520@gmail.com>
Co-authored-by: MyonKeminta <9948422+MyonKeminta@users.noreply.github.com>
Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: Spade  A <71589810+SpadeA-Tang@users.noreply.github.com>
Co-authored-by: guo-shaoge <shaoge1994@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: Hangjie Mo <mohangjie1995@gmail.com>
Co-authored-by: weedge <weege007@gmail.com>
Co-authored-by: you06 <you1474600@gmail.com>
Co-authored-by: Hu# <jinhao.hu@pingcap.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: glorv <glorvs@163.com>
cfzjywxk pushed a commit that referenced this pull request Aug 15, 2023