Tile38 does not publish its DB stats to Redis Sentinel, which causes Sentinel to pick the wrong master #750
Comments
Digging into this a bit more, I would suggest adding the following fields to the
The big one is the last one. See this code block for why.
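The code block referenced above is not reproduced here. As far as we understand Redis's compareSlavesForPromotion in sentinel.c, candidate replicas are ordered by slave_priority (lower first), then slave_repl_offset (higher first), then run_id (lexicographically smaller first, compared case-insensitively, with a missing run_id sorting last). The rough Go sketch below imitates that ordering. The replica type, lessForPromotion, the addresses, and the shortened run IDs are all hypothetical. It shows why a missing offset matters: every replica ties at 0, so the choice falls to run_id alone.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// replica holds the INFO-derived fields Sentinel compares when promoting.
// This type is hypothetical; it is not taken from Redis or Tile38.
type replica struct {
	addr     string
	priority int    // slave_priority (lower is preferred)
	offset   int64  // slave_repl_offset (higher is preferred)
	runID    string // run_id ("" stands in for a missing run_id)
}

// lessForPromotion reports whether a should be preferred over b:
// lower priority first, then larger replication offset, then the
// case-insensitively smaller run_id (a missing run_id sorts last).
func lessForPromotion(a, b replica) bool {
	if a.priority != b.priority {
		return a.priority < b.priority
	}
	if a.offset != b.offset {
		return a.offset > b.offset
	}
	if a.runID == "" || b.runID == "" {
		return b.runID == "" && a.runID != ""
	}
	return strings.ToLower(a.runID) < strings.ToLower(b.runID)
}

func main() {
	// Without slave_repl_offset every replica looks like offset 0, so the
	// tie is broken by run_id, which says nothing about who has the data.
	replicas := []replica{
		{addr: "10.0.0.2:9851", priority: 100, offset: 0, runID: "f3a1"}, // has data
		{addr: "10.0.0.3:9851", priority: 100, offset: 0, runID: "0b7c"}, // freshly started, empty
	}
	sort.SliceStable(replicas, func(i, j int) bool {
		return lessForPromotion(replicas[i], replicas[j])
	})
	fmt.Println("promoted:", replicas[0].addr)
}
```

Here the empty node wins only because "0b7c" sorts before "f3a1". Once replicas report a real offset, the one that has applied more data wins the tie instead.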
I'm tempted to say this should just be the AOF offset, but I'm wondering whether there is a better stat that can be used.
The AOF offset is probably the stat to use.
@Kilowhisky Which versions of Redis Sentinel and Tile38 are you using?
Describe the bug
Tile38 supports nearly all the commands needed to operate as a Redis node in a Redis Sentinel-based system. One thing it appears to be missing is reporting its DB status to Sentinel, so that Sentinel can pick the most up-to-date server as master when a failover occurs.
We've been dealing with an issue where:
Anyway, back to the issue.
Redis Sentinel uses the following commands to interact with client instances.
In particular, it uses the INFO command to determine the current offset status of the server. The parser is here: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2490
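For anyone following along, here is a rough Go sketch of reading an INFO reply into key/value pairs, the way a client such as Sentinel consumes it. It is not Sentinel's parser (sentinel.c checks field prefixes line by line in C). parseInfo and replicaOffset are hypothetical helpers, and the sample reply is made up to mimic a replica that omits slave_repl_offset.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseInfo splits a raw INFO reply into key/value pairs.
// bufio.ScanLines already strips the trailing "\r\n" from each line.
// Section headers ("# Replication") and blank lines are skipped, and
// each remaining line is split on its first ':'.
func parseInfo(raw string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		fields[key] = value
	}
	return fields
}

// replicaOffset returns slave_repl_offset, or (0, false) when the field
// is absent or not a valid integer.
func replicaOffset(fields map[string]string) (int64, bool) {
	v, ok := fields["slave_repl_offset"]
	if !ok {
		return 0, false
	}
	n, err := strconv.ParseInt(v, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	// Made-up reply from a replica that does not report slave_repl_offset.
	raw := "# Replication\r\nrole:slave\r\nmaster_link_status:up\r\nslave_priority:100\r\n"
	fields := parseInfo(raw)
	off, ok := replicaOffset(fields)
	fmt.Println(fields["role"], off, ok) // slave 0 false
}
```

A missing field simply leaves the caller with a zero offset, which is exactly the situation described in this issue.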
It is looking for the following fields to be present:
run_id: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2510
slave(N) (ip, port): https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2530
master_link_down_since_seconds
role: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2576
master_link_status: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2602
slave_priority: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2610
slave_repl_offset: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2614
replica_announced: https://github.com/redis/redis/blob/3fcddfb61f903d7112da186cba8b1c93a99dc87f/src/sentinel.c#L2618
In particular, without slave_repl_offset, Sentinel has no idea what state the connected nodes are in. So when a new node of equal priority joins and a failover happens, Sentinel can accidentally pick an empty node.
Expected behavior
The fix is to add slave_repl_offset to the INFO replication output so that Sentinel knows which nodes have data and which do not.
There also appear to be quite a few other fields here that might be useful but are not sent. I'm still digging into what each field is used for and whether we need to surface it.
Logs
This is an example of what Tile38 outputs for INFO replication
This is what a regular Redis server outputs: https://stackoverflow.com/questions/40726175/what-do-master-and-slave-offsets-mean-in-redis
Operating System (please complete the following information):
Additional context
We are running Tile38 inside EKS using the Bitnami-Redis Helm chart. This chart uses Sentinel to maintain the health of the cluster and to fail over properly when nodes are killed.