check_if_node_is_quorum_critical with a noproc follower #10423
gomoripeti asked this question in Other
-
the Ra process should delete its record from the |
-
Given a 3-node cluster with a 3-member quorum queue, suppose one of the followers is dead (e.g. it crashed and reached the max restart intensity). In this case `check_if_node_is_quorum_critical` does not report any queue with minimum quorum for any of the 3 nodes. However, shutting down the node hosting the leader (or running `delete_member` for the leader node) leaves the queue with 1 running follower and 1 dead follower. The queue then has no quorum and is not operational (`cluster_change_not_permitted`).
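The quorum arithmetic behind this scenario can be sketched as follows. This is a simplified model, not RabbitMQ's implementation: the function names and the node names `n1`/`n2`/`n3` are hypothetical, and it assumes the leader runs on `n1`, the working follower on `n2` and the dead (noproc) follower on `n3`.

```python
from typing import Dict


def quorum_size(member_count: int) -> int:
    """Raft majority: strictly more than half of the members."""
    return member_count // 2 + 1


def has_quorum_without(members: Dict[str, bool], stopped_node: str) -> bool:
    """True if a majority of members is still running after `stopped_node` stops.

    `members` maps a (hypothetical) node name to whether the queue's member
    process on that node is running. Stopping a node does not change the
    membership, so the majority is still computed over all members.
    """
    running = sum(
        1 for node, alive in members.items() if alive and node != stopped_node
    )
    return running >= quorum_size(len(members))


# Hypothetical layout: leader on n1, running follower on n2, dead follower on n3.
qq = {"n1": True, "n2": True, "n3": False}

print(quorum_size(len(qq)))          # 2
print(has_quorum_without(qq, "n1"))  # False: only n2 is left running
print(has_quorum_without(qq, "n2"))  # False: only n1 is left running
print(has_quorum_without(qq, "n3"))  # True: n1 and n2 still form a majority
```

With a dead follower, both nodes hosting running members are effectively quorum critical, even though every node is up.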
If the whole node 01 is unavailable, the queue qq1 is reported.
It would be helpful if the command also reported the queue when only the follower process is dead.
I wonder whether this is intentional. Or is it a necessary optimisation, since checking every queue member process on every node would be very expensive with a lot of queues? (I understand the reason is that the check only looks at the `ra_state` table and does not contact any process to fetch the Raft state.) Would #9518 or #10394 improve this situation? My question is similar to #9518 but the opposite: there the "wrong" member should not be reported as critical, while in my case the "working" member should be.
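To make the difference concrete, here is a toy model of the two behaviours contrasted above: a check that only considers whether member nodes are up, versus one that also treats a dead member process on a running node as unavailable. It is an illustrative sketch under those assumptions, not how `check_if_node_is_quorum_critical` is implemented; `critical_queues`, `check_processes` and the node names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical model: queue name -> {node name: is the member process running?}
Queues = Dict[str, Dict[str, bool]]


def critical_queues(
    queues: Queues, up_nodes: Set[str], node: str, check_processes: bool
) -> List[str]:
    """Names of queues that would lose their majority if `node` were stopped.

    With check_processes=False only node availability is considered (the
    behaviour described in the question); with True, a dead member process on
    a running node is also treated as unavailable.
    """
    critical = []
    for name, members in queues.items():
        if node not in members:
            continue  # stopping this node does not affect the queue
        majority = len(members) // 2 + 1
        available = 0
        for member_node, process_alive in members.items():
            if member_node == node or member_node not in up_nodes:
                continue
            if check_processes and not process_alive:
                continue
            available += 1
        if available < majority:
            critical.append(name)
    return sorted(critical)


queues = {"qq1": {"n1": True, "n2": True, "n3": False}}  # n3 member is noproc
all_up = {"n1", "n2", "n3"}

for n in sorted(all_up):
    print(n, critical_queues(queues, all_up, n, False),
          critical_queues(queues, all_up, n, True))
# n1 [] ['qq1']
# n2 [] ['qq1']
# n3 [] []

# With node n3 completely down, even the node-level check reports qq1.
print(critical_queues(queues, {"n1", "n2"}, "n1", False))  # ['qq1']
```

The node-level variant reproduces the observation in the question (nothing reported while all nodes are up), while the process-level variant flags the nodes hosting the working members, at the cost of inspecting each member process.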
My use case is a rolling restart: before each node shutdown, check whether it is OK to continue without losing availability of any queue. (Another use case is a rolling node replacement, which is roughly: shrink QQs, delete the RabbitMQ cluster node, add a new cluster node, grow QQs.)
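The rolling-restart gate described above could be structured roughly like this. It is a sketch with hypothetical names (`rolling_restart`, `is_safe_to_stop`, `restart`); in a real deployment `is_safe_to_stop` might wrap `rabbitmq-queues check_if_node_is_quorum_critical` against the target node (which exits non-zero when the node is critical), possibly combined with an extra process-level check, given the gap discussed in this question.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    is_safe_to_stop: Callable[[str], bool],
    restart: Callable[[str], None],
) -> List[str]:
    """Restart nodes one at a time, gating each restart on a safety check.

    Stops at the first node that is not safe to stop and returns the nodes
    that were restarted, so the operator can investigate before continuing.
    """
    restarted: List[str] = []
    for node in nodes:
        # Re-check immediately before each shutdown: an earlier restart (or a
        # member that crashed meanwhile) can change which queues are critical.
        if not is_safe_to_stop(node):
            break
        restart(node)
        restarted.append(node)
    return restarted


# Usage with fakes: pretend stopping n2 would leave some queue without quorum.
log: List[str] = []
done = rolling_restart(["n1", "n2", "n3"], lambda n: n != "n2", log.append)
print(done)  # ['n1']
```

Re-running the check right before each shutdown, rather than once up front, matters because the set of running members changes as nodes go down and come back.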