join retry strategy is incomplete/broken - will simply quit at a certain point #136

benfleis · 2015-08-06T13:23:10Z

This came out of a testing/design discussion, and is a placeholder for @jwolski to fill in the details.

jwolski · 2015-08-06T13:38:59Z

Hi,

The problem is that Ringpop has an upper bound on the number of attempts / duration of the join process. By default, Ringpop will make 50 join attempts (with a 100ms delay in between each attempt) or attempt to join for 120s, before giving up and never trying again. Usually the max join attempts are exceeded first. This is detrimental for a couple of reasons:

- Sometimes there is an intentional and significant disruption to the ring, e.g. when performing a major version upgrade. We want Ringpop to try for much longer than it does.
- When Ringpop is in the "I give up" state, others can contact it and it will process a join request, but it will never start the gossip protocol itself.

I think a better way to handle this is a join process that is initially aggressive, but backs off with increasing delays in between each attempt. It could even treat receiving a join request from another node as a trigger to perform a join itself.
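To make the proposal concrete, here is a minimal sketch of a join loop that starts aggressive and backs off, written in Python purely for illustration. It is not Ringpop's code: `backoff_delays`, `join_with_backoff`, `try_join`, and every parameter name are hypothetical, and the 100ms starting delay is just the default interval mentioned above.

```python
import random
from typing import Callable, Iterator, Optional


def backoff_delays(initial: float = 0.1,
                   factor: float = 2.0,
                   max_delay: float = 60.0,
                   jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield an endless series of delays (seconds) that grow by `factor`
    and are capped at `max_delay`. All names and defaults are assumptions."""
    rng = rng or random.Random()
    delay = initial
    while True:
        # Optional jitter stretches each delay by up to `jitter` (a fraction)
        # so nodes that restart together do not retry in lockstep.
        yield delay * (1.0 + jitter * rng.random())
        delay = min(delay * factor, max_delay)


def join_with_backoff(try_join: Callable[[], bool],
                      sleep: Callable[[float], None],
                      delays: Iterator[float]) -> int:
    """Call `try_join` until it succeeds, sleeping between attempts.

    There is deliberately no max-attempts or max-duration cutoff: the node
    keeps trying, just less often. Returns the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_join():
            return attempts
        sleep(next(delays))
```

With these assumed defaults, the first attempts are 0.1s, 0.2s, 0.4s apart and so on, settling at one attempt per minute. One way to get the "join request as a trigger" behaviour would be to pass the `wait` method of a `threading.Event` as `sleep` and have the join-request handler set that event, which cuts the current wait short (the event would need clearing afterwards); that part is left out of the sketch.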

It's worth noting that both max join duration and max join attempts are configurable.

mtfranchetto · 2017-12-05T16:49:20Z

Any update on this? Is the project not maintained anymore?

Raynos · 2017-12-20T00:38:33Z

Ringpop is in maintenance mode. We are still running business-critical services on it.

No active development is ongoing on Ringpop.
