This repository has been archived by the owner on Sep 25, 2020. It is now read-only.

join retry strategy is incomplete/broken - will simply quit at a certain point #136

Open
benfleis opened this issue Aug 6, 2015 · 3 comments

Comments

@benfleis
Contributor

benfleis commented Aug 6, 2015

This came out of a testing/design discussion, and is a placeholder for @jwolski to fill in the details.

@jwolski
Contributor

jwolski commented Aug 6, 2015

Hi,

The problem is that Ringpop puts an upper bound on both the number of attempts and the duration of the join process. By default, Ringpop will make 50 join attempts (with a 100ms delay between attempts) or attempt to join for 120s, whichever limit is reached first, before giving up and never trying again. Usually the max join attempts are exceeded first. This is detrimental for a couple of reasons:

  • Sometimes there is an intentional and significant disruption to the ring, e.g. when performing a major version upgrade. We want Ringpop to try for much longer than it does.
  • When Ringpop is in the "I give up" state, others can contact it and it will process a join request, but it will never start the gossip protocol itself.
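
To make the "max join attempts are usually exceeded first" point concrete, here is a rough back-of-the-envelope sketch. It is not Ringpop code: the function name and parameters are hypothetical, and it assumes every attempt takes the same amount of time and that the 120s budget covers both the attempts and the delays between them.

```python
def first_limit_hit(attempt_cost_s, max_attempts=50, delay_s=0.1, max_duration_s=120.0):
    """Return which limit ends the join process first under the stated assumptions.

    attempt_cost_s is a hypothetical, constant time spent on each failed attempt.
    The defaults mirror the numbers quoted above (50 attempts, 100ms, 120s).
    """
    # N attempts are separated by N - 1 delays.
    total_s = max_attempts * attempt_cost_s + (max_attempts - 1) * delay_s
    return "attempts" if total_s <= max_duration_s else "duration"


print(first_limit_hit(0.5))  # attempts
print(first_limit_hit(3.0))  # duration
```

Under these assumptions the 50 attempts add only 4.9s of delay, so the duration limit matters only when a single failed attempt takes longer than roughly 2.3s. Either way the node stops trying after about two minutes at most.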

I think a better way to handle this is a join process that is initially aggressive, but backs off with increasing delays between attempts. It can even use receiving a join request from another node as a trigger to perform a join itself.
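
A minimal sketch of that idea, to show the shape of the loop rather than a proposed implementation. Everything here is hypothetical: `try_join` stands in for one join attempt, `join_request_seen` is an event that the request handler would set when another node sends a join request, and the delay values are placeholders. It uses exponential backoff with a cap, which is only one possible choice of "increasing delays".

```python
import threading


def join_with_backoff(try_join, join_request_seen, initial_s=0.1, factor=2.0, max_delay_s=30.0):
    """Retry try_join() until it succeeds; return the number of attempts made.

    try_join: hypothetical callable, returns True once the join succeeded.
    join_request_seen: threading.Event set elsewhere when a peer contacts us.
    """
    delay = initial_s
    attempts = 0
    while True:
        attempts += 1
        if try_join():
            return attempts
        # Wait out the current delay, but wake early if a peer sent a join request.
        if join_request_seen.wait(timeout=delay):
            join_request_seen.clear()
            delay = initial_s  # a peer is reachable, so become aggressive again
        else:
            delay = min(delay * factor, max_delay_s)  # back off, up to the cap


# Usage: three failed attempts, then success on the fourth.
outcomes = iter([False, False, False, True])
print(join_with_backoff(lambda: next(outcomes), threading.Event(), initial_s=0.001))  # 4
```

The loop has no attempt or duration limit, so the node never reaches the "I give up" state. The cap on the delay bounds how long a node can sit idle after the ring comes back.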

It's worth noting that both max join duration and max join attempts are configurable.

@mtfranchetto

Any update on this? Is the project not maintained anymore?

@Raynos
Contributor

Raynos commented Dec 20, 2017

Ringpop is in maintenance mode. We are still running business-critical services on it.

No active development is ongoing on Ringpop.
