Is generation in clusterNodes.GC() too old? #3284

DingYuan0118 opened this issue Feb 19, 2025 · 1 comment
DingYuan0118 commented Feb 19, 2025

Discussed in #3283

Originally posted by DingYuan0118 February 19, 2025

version: github.com/go-redis/redis/v8 v8.11.5

The Redis cluster client reloads its state every 10 seconds (assuming traffic always exists), like below:

func (c *clusterStateHolder) Get(ctx context.Context) (*clusterState, error) {
	v := c.state.Load()
	if v == nil {
		return c.Reload(ctx)
	}

	state := v.(*clusterState)
	if time.Since(state.createdAt) > 10*time.Second {
		c.LazyReload()
	}
	return state, nil
}

A GC is then scheduled one minute after the Reload(), like:

func newClusterState(
	nodes *clusterNodes, slots []ClusterSlot, origin string,
) (*clusterState, error) {
	c := clusterState{
		nodes: nodes,

		slots: make([]*clusterSlot, 0, len(slots)),

		generation: nodes.NextGeneration(),
		createdAt:  time.Now(),
	}

	originHost, _, _ := net.SplitHostPort(origin)
	isLoopbackOrigin := isLoopback(originHost)

	for _, slot := range slots {
		// ......
	}
	sort.Sort(clusterSlotSlice(c.slots))

	time.AfterFunc(time.Minute, func() {
		nodes.GC(c.generation)
	})

	return &c, nil
}
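
To make the timing concrete, below is a minimal standalone simulation (my own sketch, not go-redis code, with compressed timescales: 100ms stands in for the ~10-second reload interval and 600ms for the one-minute GC delay). It only illustrates how the value captured in the AfterFunc closure falls behind the counter that the periodic reloads keep advancing:

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

func main() {
	var generation uint32

	// "newClusterState": take the next generation and schedule a GC that
	// captures this value, just like time.AfterFunc(time.Minute, ...) above.
	captured := atomic.AddUint32(&generation, 1)
	gcDone := make(chan struct{})
	time.AfterFunc(600*time.Millisecond, func() {
		fmt.Printf("GC uses generation %d, but nodes are already at %d\n",
			captured, atomic.LoadUint32(&generation))
		close(gcDone)
	})

	// "LazyReload" keeps creating newer states and bumping the generation
	// while the GC timer is still pending.
	for i := 0; i < 6; i++ {
		time.Sleep(100 * time.Millisecond)
		atomic.AddUint32(&generation, 1)
	}

	<-gcDone
	// Prints roughly: GC uses generation 1, but nodes are already at 7
}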

When the GC runs, the generation it uses was captured one minute earlier. This creates a gap between clusterNode.generation and the GC generation, and causes activeAddrs not to be updated in time. I added some logging to verify this:

// GC removes unused nodes.
func (c *clusterNodes) GC(generation uint32) {
	//nolint:prealloc
	var collected []*clusterNode

	c.mu.Lock()
	fmt.Printf("time %s: GC, generation: %d\n", time.Now().Format("2006-01-02 15:04:05"), generation)  // logs
	fmt.Printf("time %s: GC, current nodes: %d\n", time.Now().Format("2006-01-02 15:04:05"), len(c.nodes)) // logs

	c.activeAddrs = c.activeAddrs[:0]
	for addr, node := range c.nodes {
		fmt.Printf("time %s: GC, node addr: %s, generation: %d\n", time.Now().Format("2006-01-02 15:04:05"), addr, node.Generation())
		if node.Generation() >= generation {
			c.activeAddrs = append(c.activeAddrs, addr)
			if c.opt.RouteByLatency {
				go node.updateLatency()
			}
			continue
		}

		delete(c.nodes, addr)
		collected = append(collected, node)
	}
	fmt.Printf("time %s: GC, current activeAddrs: %v\n", time.Now().Format("2006-01-02 15:04:05"), c.activeAddrs) // logs
	c.mu.Unlock()

	for _, node := range collected {
		_ = node.Client.Close()
	}
}

The output looks like this:

time 2025-02-19 09:01:44: GC, generation: 1
time 2025-02-19 09:01:44: GC, current nodes: 7
time 2025-02-19 09:01:44: GC, node addr: 172.19.0.12:6379, generation: 6
time 2025-02-19 09:01:44: GC, node addr: 172.19.0.16:6379, generation: 6
time 2025-02-19 09:01:44: GC, node addr: 172.19.0.11:6379, generation: 6
time 2025-02-19 09:01:44: GC, node addr: 172.19.0.15:6379, generation: 6
time 2025-02-19 09:01:44: GC, node addr: 172.19.0.13:6379, generation: 6
time 2025-02-19 09:01:44: GC, node addr: 172.19.0.14:6379, generation: 6
time 2025-02-19 09:01:44: GC, node addr: redis.example.com:6379, generation: 0
time 2025-02-19 09:01:44: GC, current activeAddrs: [172.19.0.12:6379 172.19.0.16:6379 172.19.0.11:6379 172.19.0.15:6379 172.19.0.13:6379 172.19.0.14:6379]

We can see that the node generation (which is 6) is greater than the GC generation (which is 1).
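This gap matches the timing above: with a reload roughly every 10 seconds and the GC deferred by one minute, about 60 s / 10 s ≈ 6 newer generations are created before the timer fires, so live nodes are always several generations ahead of the value the GC was scheduled with.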

If I want to switch Redis clusters by updating a DNS record and shutting down the old cluster, it takes one minute for activeAddrs to become the new cluster's IPs. If the old cluster reboots within that minute, the actual connections can drift between the two clusters.

I want to check: is this the expected behavior, or is it intended to prevent some other problem?

ndyakov (Collaborator) commented Feb 27, 2025

Hello @DingYuan0118, thank you for reaching out. We are officially supporting v9+ at the moment. If I have the bandwidth I will review this issue in the following weeks.
