Issue tracker is used for reporting bugs and discussing new features. Please use stackoverflow for supporting issues.
In a 3-node cluster (3 Sentinels + 3 redis-server instances) with the nodes named A, B, and C, take node C's network interface offline, e.g. `ifconfig eth0 down`. The client then reconnects to Redis Sentinel to look up the master address with `NewFailoverClient`.
Expected Behavior
After the redis-server failover, the client successfully connects to the new master.
Current Behavior
Intermittently, the lookup fails with `context deadline exceeded`. The error is returned at https://github.com/redis/go-redis/blob/master/sentinel.go#L559 while the client is trying to reach the Sentinel on node C. A and B are working normally, but by then the context has already expired. The sentinel addresses are shuffled randomly. When the faulty node C ends up in the first position, the attempt to reach C uses up the entire context timeout. That leaves no time for A and B, so the lookup fails with a context timeout even though healthy sentinels are available.
Possible Solution
In the function that obtains the master address, the sentinel addresses are currently queried one after another. Instead, consider querying them concurrently in goroutines, or using a separate context for each query.
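Alternatively, make the context of each iteration independent, e.g. derive a per-iteration context from the original context's deadline, instead of sharing one context across all iterations. For reference, this is the sequential loop in question: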
```go
for i, sentinelAddr := range c.sentinelAddrs {
	sentinel := NewSentinelClient(c.opt.sentinelOptions(sentinelAddr))

	// Every iteration shares the same ctx, so a sentinel that hangs
	// uses up the time budget of the sentinels after it.
	masterAddr, err := sentinel.GetMasterAddrByName(ctx, c.opt.MasterName).Result()
	if err != nil {
		_ = sentinel.Close()
		// A context error aborts the whole loop: the remaining
		// sentinels are never queried.
		if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
			return "", err
		}
		internal.Logger.Printf(ctx, "sentinel: GetMasterAddrByName master=%q failed: %s",
			c.opt.MasterName, err)
		continue
	}

	// Push working sentinel to the top.
	c.sentinelAddrs[0], c.sentinelAddrs[i] = c.sentinelAddrs[i], c.sentinelAddrs[0]
	c.setSentinel(ctx, sentinel)

	addr := net.JoinHostPort(masterAddr[0], masterAddr[1])
	return addr, nil
}
```
Steps to Reproduce
1. Deploy a cluster of 3 Sentinels + 3 redis-server instances.
2. Take the network interface of one node offline so that it is unreachable, e.g. `ifconfig eth0 down`.
3. Have the client connect to the cluster repeatedly with `NewFailoverClient`.
4. Check whether the master address can be obtained.

Result: intermittently, the lookup fails with `context deadline exceeded`.
Context (Environment)
- OS: CentOS 8, kernel 4.18
- go-redis: v9.6.0
- Context timeout: 3s
- DialTimeout: 5s (default)
Detailed Description
I think there are two key points:

1. Why are the sentinels queried sequentially when obtaining the master address? With sequential queries, a failed node near the front of the list can affect the healthy nodes behind it.
2. On repeated initialization, the shuffle is pseudo-random with a seed of 1, so multiple rounds of initialization produce the same sequence of orders. The failure is therefore reproducible: it recurs whenever the faulty node is shuffled into the first position.
kwenzh changed the title from "Sentinel cluster set 1 node network iface down, unable to elect a master, context deadline exceeded" to "Sentinel cluster settings 1 node network iface down, Probability unable to query the master node, MasterAddr error: context deadline exceeded" on Oct 28, 2024.
Simulating the random shuffle of the sentinel nodes several times shows that node C is placed in the first position in the second simulation. Moreover, the results are the same on every run, because the generator is pseudo-random with a seed of 1.
>>>>>>>> 2 1
>>>>>>>> 1 1
>>>>>>>> [A C B]
>>>>>>>> 2 1
>>>>>>>> 1 0
>>>>>>>> [C A B]
>>>>>>>> 2 1
>>>>>>>> 1 1
>>>>>>>> [A C B]
>>>>>>>> 2 0
>>>>>>>> 1 0
>>>>>>>> [B C A]
>>>>>>>> 2 0
>>>>>>>> 1 0
>>>>>>>> [B C A]
>>>>>>>> 2 1
>>>>>>>> 1 1
>>>>>>>> [A C B]
>>>>>>>> 2 0
>>>>>>>> 1 0
>>>>>>>> [B C A]
>>>>>>>> 2 0
>>>>>>>> 1 0
>>>>>>>> [B C A]
>>>>>>>> 2 0
>>>>>>>> 1 0
>>>>>>>> [B C A]
>>>>>>>> 2 2
>>>>>>>> 1 0
>>>>>>>> [B A C]
When simulating multiple sentinel initializations with node C down, the error occurs in the second round of the loop, where C is shuffled into the first position, and the lookup exits with a context timeout.