TiFlash EpochNotMatch makes TiFlash the region leader forever #17930

Closed
lysu opened this issue Jun 10, 2020 · 0 comments · Fixed by #18040
Labels
severity/critical type/bug The issue is confirmed as a bug.

lysu commented Jun 10, 2020

Bug Report

1. Minimal reproduce step (Required)

Access TiFlash and trigger an EpochNotMatch error on it.

Use a leader read (a point get) to access the same region.

2. What did you expect to see? (Required)

The point get succeeds.

3. What did you see instead (Required)

The query fails with a timeout.

func (s *testRegionCacheSuite) TestRegionEpochOnTiFlash(c *C) {
	// add store3 as tiflash
	store3 := s.cluster.AllocID()
	peer3 := s.cluster.AllocID()
	s.cluster.AddStore(store3, s.storeAddr(store3))
	s.cluster.UpdateStoreAddr(store3, s.storeAddr(store3), &metapb.StoreLabel{Key: "engine", Value: "tiflash"})
	s.cluster.AddPeer(s.region1, store3, peer3)
	s.cluster.ChangeLeader(s.region1, s.peer1)

	// pre-load region cache
	loc1, err := s.cache.LocateKey(s.bo, []byte("a"))
	c.Assert(err, IsNil)
	c.Assert(loc1.Region.id, Equals, s.region1)
	lctx, err := s.cache.GetTiKVRPCContext(s.bo, loc1.Region, kv.ReplicaReadLeader, 0)
	c.Assert(err, IsNil)
	c.Assert(lctx.Peer.Id, Not(Equals), peer3)

	// epoch-not-match on tiflash
	ctxTiFlash, err := s.cache.GetTiFlashRPCContext(s.bo, loc1.Region)
	c.Assert(err, IsNil)
	r := ctxTiFlash.Meta
	reqSend := NewRegionRequestSender(s.cache, nil)
	regionErr := &errorpb.Error{EpochNotMatch: &errorpb.EpochNotMatch{CurrentRegions: []*metapb.Region{r}}}
	reqSend.onRegionError(s.bo, ctxTiFlash, nil, regionErr)

	// check leader read should not go to tiflash
	lctx, err = s.cache.GetTiKVRPCContext(s.bo, loc1.Region, kv.ReplicaReadLeader, 0)
	c.Assert(err, IsNil)
	c.Assert(lctx.Peer.Id, Not(Equals), peer3)
}

This test case always fails against the current TiDB code.

4. Affected version (Required)

4.0

5. Root Cause Analysis

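When TiDB meets an EpochNotMatch error, it refills the region cache from the regions carried in the error.

However, it treats the store that returned the error as the region leader. That assumption holds for TiKV, but when the error comes from TiFlash, TiFlash is cached as the leader and the region gets stuck: leader reads such as point gets keep being routed to TiFlash and eventually time out.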
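The root cause can be illustrated with a small, self-contained Go model. Everything in it (the `Store`, `Region` and `StoreType` types and both `leaderAfterEpochNotMatch...` functions) is hypothetical and is not TiDB's actual region-cache API. It only contrasts the behaviour described above (treat the store that returned EpochNotMatch as the leader) with one possible guard that never picks a TiFlash store, since TiFlash replicas are Raft learners and cannot become leader.

```go
package main

import "fmt"

// StoreType distinguishes TiKV from TiFlash stores (hypothetical, for illustration only).
type StoreType int

const (
	TiKV StoreType = iota
	TiFlash
)

// Store is a simplified, hypothetical view of a cached store.
type Store struct {
	ID   uint64
	Type StoreType
}

// Region is a simplified, hypothetical cached region: the stores holding its
// peers and the store currently believed to hold the leader.
type Region struct {
	ID          uint64
	Stores      []Store
	LeaderStore uint64
}

// leaderAfterEpochNotMatchBuggy models the reported behaviour: the store that
// returned EpochNotMatch is assumed to be the leader, even if it is TiFlash.
func leaderAfterEpochNotMatchBuggy(r Region, from Store) uint64 {
	return from.ID
}

// leaderAfterEpochNotMatchFixed keeps that assumption for TiKV, but when the
// error came from TiFlash it falls back to the first TiKV store of the region,
// because a TiFlash learner can never be elected leader.
func leaderAfterEpochNotMatchFixed(r Region, from Store) uint64 {
	if from.Type != TiFlash {
		return from.ID
	}
	for _, s := range r.Stores {
		if s.Type != TiFlash {
			return s.ID
		}
	}
	return 0 // no electable store known; the caller would need to reload the region
}

func main() {
	region := Region{
		ID:          2,
		Stores:      []Store{{ID: 1, Type: TiKV}, {ID: 2, Type: TiKV}, {ID: 3, Type: TiFlash}},
		LeaderStore: 1,
	}
	tiflash := Store{ID: 3, Type: TiFlash}
	fmt.Println("buggy leader:", leaderAfterEpochNotMatchBuggy(region, tiflash))
	fmt.Println("fixed leader:", leaderAfterEpochNotMatchFixed(region, tiflash))
}
```

Running it prints `buggy leader: 3` and `fixed leader: 1`. Falling back to the first TiKV peer is only a guess at the real leader; the sketch assumes the usual not-leader handling would correct a wrong TiKV guess, whereas the report indicates that once TiFlash is cached as the leader, leader reads never recover and time out.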
