TiDB wrongly uses PD client's GetStore which may result in dead loops #23676

Closed
youjiali1995 opened this issue Mar 30, 2021 · 5 comments · Fixed by #23695
Labels
severity/minor sig/transaction SIG:Transaction type/bug The issue is confirmed as a bug.

Comments

@youjiali1995
Contributor

Bug Report

When TiDB encounters a StoreNotMatch error, it re-resolves the store through the PD client's GetStore():

  • If the store has been deleted or is a tombstone, all regions on this store are invalidated.

However, TiDB handles the result of the PD client's GetStore() incorrectly:

	store, err := c.pdClient.GetStore(context.Background(), s.storeID)
	if err != nil {
		metrics.RegionCacheCounterWithGetStoreError.Inc()
	} else {
		metrics.RegionCacheCounterWithGetStoreOK.Inc()
	}
	if err != nil {
		logutil.BgLogger().Error("loadStore from PD failed", zap.Uint64("id", s.storeID), zap.Error(err))
		// we cannot do backoff in reResolve loop but try check other store and wait tick.
		return false, err
	}
	if store == nil || store.State == metapb.StoreState_Tombstone {
		// store has be removed in PD, we should invalidate all regions using those store.
		logutil.BgLogger().Info("invalidate regions in removed store",
			zap.Uint64("store", s.storeID), zap.String("add", s.addr))
		atomic.AddUint32(&s.epoch, 1)
		metrics.RegionCacheCounterWithInvalidateStoreRegionsOK.Inc()
		return false, nil
	}

TiDB assumes GetStore() returns a nil store with a nil error when the store is not found. In fact, it returns (nil, nil) only when the store is a tombstone; when the store is not found, it returns an error, so TiDB takes the error branch above and never invalidates the regions. https://github.com/tikv/pd/blob/bc63de897afa69cfcc5828f45431fe3999fed4b3/client/client.go#L1159-L1168

func handleStoreResponse(resp *pdpb.GetStoreResponse) (*metapb.Store, error) {
	store := resp.GetStore()
	if store == nil {
		return nil, errors.New("[pd] store field in rpc response not set")
	}
	if store.GetState() == metapb.StoreState_Tombstone {
		return nil, nil
	}
	return store, nil
}

So #22907 is not resolved.

@youjiali1995 youjiali1995 added type/bug The issue is confirmed as a bug. sig/transaction SIG:Transaction labels Mar 30, 2021
@youjiali1995
Contributor Author

/assign @longfangsong

@ti-chi-bot
Member

@youjiali1995: GitHub didn't allow me to assign the following users: longfangsong.

Note that only pingcap members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @longfangsong

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@longfangsong
Contributor

/assign

@youjiali1995
Contributor Author

BTW, the store.State isn't changed if it becomes a tombstone:

	if store == nil || store.State == metapb.StoreState_Tombstone {
		// store has be removed in PD, we should invalidate all regions using those store.
		logutil.BgLogger().Info("invalidate regions in removed store",
			zap.Uint64("store", s.storeID), zap.String("add", s.addr))
		atomic.AddUint32(&s.epoch, 1)
		metrics.RegionCacheCounterWithInvalidateStoreRegionsOK.Inc()
		return false, nil
	}

As a result, the store will now be re-resolved forever...

@ti-srebot
Contributor

ti-srebot commented Mar 31, 2021

Please edit this comment or add a new comment to complete the following information

Bug

Note: Make Sure that 'component', and 'severity' labels are added
Example for how to fill out the template: #20100

1. Root Cause Analysis (RCA) (optional)

TiDB uses the PD client's GetStore() incorrectly.

2. Symptom (optional)

None. TiDB runs normally.

3. All Trigger Conditions (optional)

None

4. Workaround (optional)

None

5. Affected versions

master

6. Fixed versions

master

4 participants