This repository has been archived by the owner on Sep 25, 2020. It is now read-only.
The problem is that Ringpop puts an upper bound on both the number of attempts and the duration of the join process. By default, Ringpop will make 50 join attempts (with a 100ms delay between attempts) or attempt to join for 120s, before giving up and never trying again. Usually the max join attempts are exceeded first, since the delays between 50 attempts add up to only about 5s. This is detrimental for a couple of reasons:
Sometimes there is an intentional and significant disruption to the ring, e.g. when performing a major version upgrade. We want Ringpop to try for much longer than it does.
When Ringpop is in the "I give up" state, others can contact it and it will process a join request, but it will never start the gossip protocol itself.
I think a better way to handle this is a join process that is initially aggressive but backs off, with increasing delays between attempts. It could even treat receiving a join request from another node as a trigger to perform a join itself.
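To make the idea concrete, here is a minimal sketch in Python of a join loop that starts at the current 100ms delay, doubles it after each failed attempt up to a cap, and retries early when another node sends a join request. It is not Ringpop's API: `backoff_delays`, `join_with_backoff`, `try_join`, the `wake` event, and the factor, cap, and jitter values are all hypothetical names and numbers chosen for illustration.

```python
import random
import threading


def backoff_delays(initial=0.1, factor=2.0, max_delay=60.0):
    """Yield an endless sequence of delays in seconds.

    Starts at `initial` (100ms here, matching the current default delay),
    multiplies by `factor` after each step, and never exceeds `max_delay`.
    The factor and cap are assumptions, not Ringpop settings.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, max_delay)


def join_with_backoff(try_join, wake, initial=0.1, factor=2.0,
                      max_delay=60.0, jitter=0.1):
    """Call `try_join()` until it returns True; return the attempt count.

    `try_join` is a hypothetical callable that performs one join attempt.
    `wake` is a threading.Event that other code sets when a join request
    arrives from another node, which cuts the current wait short.
    There is no upper bound on attempts or total duration.
    """
    attempts = 0
    for delay in backoff_delays(initial, factor, max_delay):
        attempts += 1
        if try_join():
            return attempts
        # Add a little random jitter so nodes do not retry in lockstep.
        timeout = delay * (1.0 + jitter * random.random())
        # Sleep for the backoff delay, or less if `wake` is set meanwhile.
        wake.wait(timeout=timeout)
        wake.clear()
```

The handler for an incoming join request would call `wake.set()`, so a node that has backed off to long delays rejoins promptly once the rest of the ring is reachable again.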
It's worth noting that both max join duration and max join attempts are configurable.
This came out of a testing/design discussion, and is a placeholder for @jwolski to fill in the details.