cluster slots not rediscovered on scale-down cluster #2806
Comments
Adding to this a scenario I've found: Logging all masters and replicas every second or so (while trying to access a certain key) with:
Started my program while 7101 was down - it did not appear on the list (I then restarted 7101 but the list was not updated) - seems like this is caused by the issue above
At some point I shut down 7105 which caused an error event and the reconnect strategy to trigger
After reconnecting 7105, now 7101 suddenly appears (it became a master following 7105 shutting down) but 7105 does NOT appear as a replica. In this scenario, a MOVED error would NOT occur and the list of replicas would therefore not be updated.
Then, I followed the rest of the guide and added 7106 and 7107 to the cluster and while running
I then restarted the application but the list is still as above with the duplication (probably because 7106 and 7107 were not listed on the list of
In the event of having just replicas and not masters added or removed (scale up OR down), no MOVED error would probably occur, leading to the issue above. I think a periodic interval refresh/rediscover as suggested above is a good idea.
IMO a better solution would be to run "rediscover" proactively when a connection dies rather than having a "slots refresh interval" (which will solve the problem eventually, but will still have some "downtime").
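To make the "rediscover when a connection dies" idea concrete, here is a minimal sketch. The `NodeConnection` interface, `rediscoverOnClose`, and `FakeConnection` are hypothetical names invented for illustration; they are not part of the node-redis API, and the real client would need to hook into its own socket events.

```typescript
// Hypothetical minimal view of a per-node connection; not the node-redis API.
interface NodeConnection {
  readonly address: string;
  onClose(listener: () => void): void;
}

// Calls `rediscover` whenever any known node connection closes, so a stale
// topology can be replaced without waiting for a MOVED reply or a timer.
function rediscoverOnClose(
  nodes: NodeConnection[],
  rediscover: (closedAddress: string) => void,
): void {
  for (const node of nodes) {
    node.onClose(() => rediscover(node.address));
  }
}

// In-memory stand-in used only to exercise the wiring above.
class FakeConnection implements NodeConnection {
  readonly address: string;
  private listeners: Array<() => void> = [];

  constructor(address: string) {
    this.address = address;
  }

  onClose(listener: () => void): void {
    this.listeners.push(listener);
  }

  // Simulates the node going away by notifying every registered listener.
  close(): void {
    for (const listener of this.listeners) listener();
  }
}
```

With this wiring, shutting down 7105 in the scenario above would trigger a topology refresh immediately, which is the case a MOVED-only trigger misses.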
Description
#slots.rediscover() is correctly called when a MOVED or ASK reply is received:
node-redis/packages/client/lib/cluster/index.ts Line 271 in 6f79b49
node-redis/packages/client/lib/cluster/index.ts Line 256 in 6f79b49
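For readers unfamiliar with the trigger, the following sketch shows the general shape of reacting to a redirection reply. It assumes the Redis Cluster error format `MOVED <slot> <host>:<port>` (and the same for `ASK`); `parseRedirect` and `handleReplyError` are illustrative names, not the functions at the lines referenced above.

```typescript
// Shape of a parsed redirection; illustrative, not a node-redis type.
interface Redirect {
  kind: 'MOVED' | 'ASK';
  slot: number;
  host: string;
  port: number;
}

// Parses error messages of the form "MOVED <slot> <host>:<port>" or
// "ASK <slot> <host>:<port>". Returns undefined for any other message.
function parseRedirect(message: string): Redirect | undefined {
  const match = /^(MOVED|ASK) (\d+) (.*):(\d+)$/.exec(message);
  if (!match) return undefined;
  return {
    kind: match[1] as 'MOVED' | 'ASK',
    slot: Number(match[2]),
    host: String(match[3]),
    port: Number(match[4]),
  };
}

// Hypothetical hook: refresh the topology only when a redirection was seen.
// Returns true if `rediscover` was called.
function handleReplyError(message: string, rediscover: () => void): boolean {
  if (parseRedirect(message) === undefined) return false;
  rediscover();
  return true;
}
```

Note that a plain connection error takes the `false` branch here, so nothing refreshes the topology in that case.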
There is a problem / race condition when scaling down a cluster (rebalance hash slots away, forget node, shutdown): #slots.rediscover() might never be called if the leaving node has not been queried (after its slots migrated away) before shutdown, so it never gets a chance to reply with MOVED and trigger a rediscover.
This results in connection closed errors, because the node-redis cluster.#slots is keeping a client active for an outdated cluster topology.
Steps to reproduce:
connection closed error
An option to fix this would be to provide an option for a slot refresh interval (ioredis does this too). I will send a pull request.
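As a rough illustration of the proposed slot refresh interval, the sketch below wraps a caller-supplied `rediscover` function in a timer. `SlotsRefresher` and its parameters are hypothetical names chosen for this example; the actual option name and implementation in the pull request may differ.

```typescript
// Sketch of a periodic topology refresh. `rediscover` is supplied by the
// caller (hypothetical here); this is not the node-redis implementation.
class SlotsRefresher {
  private timer: ReturnType<typeof setInterval> | undefined;
  private inFlight = false;
  private readonly rediscover: () => Promise<void>;
  private readonly intervalMs: number;

  constructor(rediscover: () => Promise<void>, intervalMs: number) {
    this.rediscover = rediscover;
    this.intervalMs = intervalMs;
  }

  // Starts the timer; calling start() twice does not create a second timer.
  start(): void {
    if (this.timer !== undefined) return;
    this.timer = setInterval(() => { void this.refreshOnce(); }, this.intervalMs);
  }

  // Stops the timer so the process can exit cleanly.
  stop(): void {
    if (this.timer === undefined) return;
    clearInterval(this.timer);
    this.timer = undefined;
  }

  // Runs one refresh unless a previous one is still in flight. Errors are
  // swallowed so that a failed refresh does not stop later attempts.
  async refreshOnce(): Promise<boolean> {
    if (this.inFlight) return false;
    this.inFlight = true;
    try {
      await this.rediscover();
      return true;
    } catch {
      return false;
    } finally {
      this.inFlight = false;
    }
  }
}
```

The in-flight guard keeps a slow refresh from overlapping with the next tick. The interval bounds how long a stale topology can persist, but it does not remove the window entirely.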
Node.js Version
No response
Redis Server Version
No response
Node Redis Version
No response
Platform
No response
Logs
No response