Can I reconnect to the node a command was sent to when I catch a `RedisCommandTimeoutException` for that command in a Redis Cluster?

We are having a problem where the old master does not respond for 10 seconds after a `FAILOVER` is issued to its replica. TCP packets carrying new requests still get ACKed during these 10 seconds. Because the connection is clearly not dead, Lettuce keeps sending new commands to the old master. Eventually it receives all the `MOVED` responses at once, but by then it is too late for us.

For our specific problem, it would be better if Lettuce reconnected to the node on command timeout, because the bug only seems to affect a single TCP socket. A command on a new socket gets an immediate `MOVED` response, allowing Lettuce to continue on the new master. I guess this could be tricky to get right: all the requests in flight will time out at different times, and we probably do not want to reconnect once per timeout.

Of course, we are trying to get the underlying problem in Redis resolved too (see #2572), but a workaround like this would still be useful until that gets fixed.

I have checked the wiki, GitHub issues and GitHub Discussions and found #2082, which is similar, but in that case the TCP packets do not get ACKed, which led to a different solution. I also tried setting an absurdly low periodic topology refresh of a few hundred milliseconds, but that does not seem to help. That might be a bug, but I have not looked into it yet.
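To illustrate the "do not reconnect once per timeout" concern, here is a minimal sketch of a debouncer that collapses a burst of timeouts into at most one reconnect per cooldown window. `ReconnectDebouncer` is a hypothetical helper, not part of Lettuce; the `reconnect` action (for example, closing the cluster connection and connecting again) is an assumption left to the application, and the clock is injectable so the logic can be tested without sleeping.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not a Lettuce API): turns a burst of command timeouts
 * into at most one reconnect per cooldown window.
 */
final class ReconnectDebouncer {
    private final long cooldownNanos;
    private final LongSupplier clock;   // e.g. System::nanoTime in production
    private final Runnable reconnect;   // assumption: application-supplied reconnect action
    // Timestamp of the last triggered reconnect; Long.MIN_VALUE means "never".
    private final AtomicLong lastReconnect = new AtomicLong(Long.MIN_VALUE);

    ReconnectDebouncer(long cooldownNanos, LongSupplier clock, Runnable reconnect) {
        this.cooldownNanos = cooldownNanos;
        this.clock = clock;
        this.reconnect = reconnect;
    }

    /** Call on every command timeout; returns true only if this call triggered a reconnect. */
    boolean onTimeout() {
        long now = clock.getAsLong();
        long last = lastReconnect.get();
        if (last != Long.MIN_VALUE && now - last < cooldownNanos) {
            return false; // a reconnect was already triggered within the cooldown window
        }
        if (!lastReconnect.compareAndSet(last, now)) {
            return false; // a concurrent timeout won the race and is reconnecting
        }
        reconnect.run();
        return true;
    }
}
```

In application code, the idea would be to call `onTimeout()` from wherever a `RedisCommandTimeoutException` is caught. All the in-flight commands that time out shortly after the first one then hit the cooldown instead of each forcing a new socket.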
Hey @e-ts, this is a tricky question. In your scenario you know that commands time out because the Redis instance delays its responses during failover, but that is a very specific failover scenario. In practice a command can time out for many different reasons (network delay, server load, etc.), and in many of those cases the correct approach is to retry the command against the same instance rather than reconnecting. Reconnecting is slow, and if we reconnected on every timeout we could drastically degrade the driver's performance in all of those cases.

You mentioned #2082; did you get a chance to try the option to set a custom `TCP_USER_TIMEOUT` from #2499?

```java
import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.SocketOptions;
import io.lettuce.core.api.sync.RedisCommands;
import io.lettuce.core.codec.StringCodec;
import java.time.Duration;

RedisClient redisClient = RedisClient.create(RedisURI.Builder
        .redis("redis.io", 12000)
        .build());

SocketOptions socketOptions = SocketOptions.builder()
        // Close the connection if sent data stays unacknowledged for more than 3 seconds.
        .tcpUserTimeout(SocketOptions.TcpUserTimeoutOptions.builder()
                .enable(true)
                .tcpUserTimeout(Duration.ofSeconds(3))
                .build())
        // Probe idle connections: first probe after 5 s of idle, then every 5 s, give up after 3.
        .keepAlive(SocketOptions.KeepAliveOptions.builder()
                .interval(Duration.ofSeconds(5))
                .idle(Duration.ofSeconds(5))
                .count(3)
                .enable()
                .build())
        .build();

redisClient.setOptions(ClientOptions.builder().socketOptions(socketOptions).build());
RedisCommands<String, String> redis = redisClient.connect(new StringCodec()).sync();
```

If you do go that way, keep in mind that, as with my previous note, this timeout can be triggered by other (valid) scenarios that do not require a reconnect, so be mindful of the value you set. From your description, though, I think it should resolve your issue if your deployment is such that the server always responds within a couple of seconds. It is also generally a good idea to configure KEEPALIVE as well. All of these values depend heavily on your application and deployment.
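As a rough sanity check on the KEEPALIVE values in the `SocketOptions` example, the time to declare a silent peer dead can be estimated as idle + count × interval, here 5 + 3 × 5 = 20 seconds. The sketch below assumes Linux keepalive semantics (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`); `KeepAliveMath` is a hypothetical helper. Treat the result as an estimate only: probes are sent only while the connection is idle, and on Linux an enabled `TCP_USER_TIMEOUT` overrides the probe count when deciding to close the connection.

```java
import java.time.Duration;

/** Hypothetical helper: rough keepalive dead-peer detection time under Linux semantics. */
final class KeepAliveMath {
    // Assumption: after `idle` with no traffic, one probe is sent every `interval`,
    // and the connection is dropped after `count` unanswered probes.
    static Duration detectionTime(Duration idle, Duration interval, int count) {
        return idle.plus(interval.multipliedBy(count));
    }

    public static void main(String[] args) {
        // Values from the SocketOptions example: idle 5 s, interval 5 s, count 3.
        System.out.println(detectionTime(Duration.ofSeconds(5), Duration.ofSeconds(5), 3)); // prints PT20S
    }
}
```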
My bad, I missed the fact that the packets are being acknowledged. You are right, `TCP_USER_TIMEOUT` is useless in this case.
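Since the old master keeps ACKing packets, TCP-level options cannot see the problem, so any detection has to happen at the application level. One way to reconcile this with the concern that isolated timeouts should not trigger a reconnect is to react only to a streak of consecutive timeouts with no successful reply in between. The sketch below is a hypothetical helper (not a Lettuce API), and the threshold is an assumption to be tuned per deployment; what to do when it fires (for example, recreating the connection) is left to the caller.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: flags a connection as stuck only after `threshold`
 * consecutive command timeouts with no successful reply in between.
 */
final class ConsecutiveTimeoutDetector {
    private final int threshold;
    private final AtomicInteger consecutive = new AtomicInteger();

    ConsecutiveTimeoutDetector(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /** A successful reply shows the socket is still being served, so the streak restarts. */
    void onSuccess() {
        consecutive.set(0);
    }

    /** Returns true exactly once per streak, when the streak reaches the threshold. */
    boolean onTimeout() {
        return consecutive.incrementAndGet() == threshold;
    }
}
```

After acting on a `true` result, calling `onSuccess()` (or creating a new detector) resets the streak so the next stuck socket can be detected again.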