Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync store may block the process of leader exit #6918

Closed
2 of 3 tasks
rleungx opened this issue Aug 9, 2023 · 0 comments · Fixed by #6919
Closed
2 of 3 tasks

Sync store may block the process of leader exit #6918

rleungx opened this issue Aug 9, 2023 · 0 comments · Fixed by #6919
Assignees

Comments

@rleungx
Copy link
Member

rleungx commented Aug 9, 2023

Bug Report

What did you do?

Transfer PD leader

What did you expect to see?

It should be finished in a short time.

What did you see instead?

Cost about 3 min to finish

img_v2_e647dfae-9587-4d8b-b046-1d1093b3f2eg

It will try to connect to the tombstone store and doesn't have any timeout mechanism. And the store list is only updated when fail to request all stores. So If we offline a store, even if we do remove-tombstone, it could still be on the list.

// runSyncConfig runs the job to sync tikv config.
func (c *RaftCluster) runSyncConfig() {
	defer logutil.LogPanic()
	defer c.wg.Done()

	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	stores := c.GetStores()

	syncConfig(c.storeConfigManager, stores)
	for {
		select {
		case <-c.ctx.Done():
			log.Info("sync store config job is stopped")
			return
		case <-ticker.C:
			if !syncConfig(c.storeConfigManager, stores) {
				stores = c.GetStores() // update until all stores fail 
			}
		}
	}
}
func syncConfig(manager *config.StoreConfigManager, stores []*core.StoreInfo) bool {
	for index := 0; index < len(stores); index++ {
		// filter out the stores that are tiflash
		store := stores[index]
		if core.IsStoreContainLabel(store.GetMeta(), core.EngineKey, core.EngineTiFlash) {
			continue
		}

		// filter out the stores that are not up.
		if !(store.IsPreparing() || store.IsServing()) {
			continue
		}
		// it will try next store if the current store is failed.
		address := netutil.ResolveLoopBackAddr(stores[index].GetStatusAddress(), stores[index].GetAddress())
		if err := manager.ObserveConfig(address); err != nil {
			storeSyncConfigEvent.WithLabelValues(address, "fail").Inc()
			log.Debug("sync store config failed, it will try next store", zap.Error(err))
			continue
		}
		storeSyncConfigEvent.WithLabelValues(address, "succ").Inc()
		// it will only try one store.
		return true
	}
	return false
}
// GetConfig returns the store config from TiKV.
func (s TiKVConfigSource) GetConfig(statusAddress string) (*StoreConfig, error) {
	url := fmt.Sprintf("%s://%s/config", s.schema, statusAddress)
	resp, err := s.client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	var cfg StoreConfig
	if err := json.Unmarshal(body, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

What version of PD are you using (pd-server -V)?

v6.1.0

summary:

fixed in :

@rleungx rleungx added the type/bug The issue is confirmed as a bug. label Aug 9, 2023
ti-chi-bot bot added a commit that referenced this issue Aug 10, 2023
close #6918

add timeout context for observer tikv config to avoid wait too long

Signed-off-by: bufferflies <1045931706@qq.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Aug 10, 2023
close tikv#6918

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Aug 10, 2023
close tikv#6918

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Aug 10, 2023
close tikv#6918

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this issue Aug 11, 2023
close #6918

add timeout context for observer tikv config to avoid wait too long

Signed-off-by: bufferflies <1045931706@qq.com>
ti-chi-bot bot pushed a commit that referenced this issue Aug 14, 2023
close #6918

add timeout context for observer tikv config to avoid wait too long

Signed-off-by: bufferflies <1045931706@qq.com>

Co-authored-by: bufferflies <1045931706@qq.com>
Co-authored-by: buffer <1045931706@qq.com>
ti-chi-bot bot pushed a commit that referenced this issue Aug 29, 2023
close #6918

add timeout context for observer tikv config to avoid wait too long

Signed-off-by: bufferflies <1045931706@qq.com>

Co-authored-by: buffer <1045931706@qq.com>
Co-authored-by: bufferflies <1045931706@qq.com>
rleungx pushed a commit to rleungx/pd that referenced this issue Dec 1, 2023
close tikv#6918

add timeout context for observer tikv config to avoid wait too long

Signed-off-by: bufferflies <1045931706@qq.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot added a commit that referenced this issue Sep 14, 2024
close #6918

add timeout context for observer tikv config to avoid wait too long

Co-authored-by: Shirly <AndreMouche@126.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants