Allow comma separated list of values in the alpha's --zero option #4949

Closed
sleto-it opened this issue Mar 17, 2020 · 1 comment · Fixed by #5116
Labels

  • area/operations: Related to operational aspects of the DB, including signals, flags, env vars, etc.
  • kind/enhancement: Something could be better.
  • status/accepted: We accept to investigate/work on it.

Comments

@sleto-it
Contributor

Experience Report

Note: Feature requests are judged based on user experience and modeled on Go Experience Reports. These reports should focus on the problems: they should not focus on and need not propose solutions.

What you wanted to do

When starting a cluster, the zero group has to be started first.
Let's suppose we have 3 zeros (zero1, zero2 and zero3) and that they are now up and running

We now begin to start the alphas, let's suppose we have 3 alphas, alpha1, alpha2 and alpha3

When starting an alpha, we use the --zero option to pass the IP:port of a healthy zero instance, so that this alpha can communicate with the zero group we started previously and join the cluster

Let's suppose we pass zero1 to alpha1, alpha2 and alpha3

Now the cluster is up and running

Let's suppose we are using systemd

Now let's suppose zero1 goes down, and it stays down for some time
Let's suppose we try to restart an alpha: it will not start, because zero1 is down

What you actually did

To have this alpha join the cluster again, we have to:

  • either start zero1 (but for some reason we cannot do that right now)
  • or edit the systemd configuration to change --zero from zero1 to zero2 or zero3

Why that wasn't great, with examples

We would like to avoid having to change the systemd configuration to handle this situation. An elegant solution here is to allow --zero to accept a comma separated list of zero addresses (IP:port). This way, when the alpha starts, it tries to connect to the first zero; if for any reason that one is down, it tries the second, and so on. No changes to the systemd files would be needed in such a case
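
To make the requested behavior concrete, here is a minimal sketch of it in Python. It is not Dgraph code: the function names (`parse_zero_addrs`, `tcp_probe`, `first_reachable`) are hypothetical, the hostnames and port 5080 are example values, and a plain TCP connect stands in for whatever health check an alpha would really perform.

```python
import socket
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]


def parse_zero_addrs(value: str) -> List[Address]:
    """Split 'host1:port1,host2:port2' into (host, port) pairs, keeping order."""
    addrs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas and whitespace
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        addrs.append((host, int(port)))
    if not addrs:
        raise ValueError("at least one zero address is required")
    return addrs


def tcp_probe(addr: Address, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to addr can be opened (assumed health check)."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    addrs: List[Address],
    probe: Callable[[Address], bool] = tcp_probe,
) -> Optional[Address]:
    """Try each address in the order given and return the first that answers."""
    for addr in addrs:
        if probe(addr):
            return addr
    return None
```

With something like this, a value such as `zero1:5080,zero2:5080,zero3:5080` could stay in the systemd unit unchanged: if zero1 is down, the alpha would simply move on to zero2. The `probe` parameter is injectable only so the fallback order can be exercised without real network access.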

Any external references to support your case

This is in line with what other HA systems allow/provide

Thanks,

@MichelDiz
Contributor

It would be better for Dgraph to inform the Alphas about the presence of the extra Zero instances, including those that join later and those that have been removed, by improving Dgraph's heartbeat protocol in this regard.

In fact, Dgraph already distributes this information within the cluster. If you have 3 Zeros and, for example, the Leader dies, all Alphas will communicate with the ones that are still up. However, if the Alphas die, they "forget" that there are other Zero instances present in the cluster (that's the issue here).

So this should be an improvement to Dgraph's heartbeat protocol. Instead of asking the user to pass that information, we can make the cluster "self-aware" (as, in part, it already is).
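
One way to picture the "self-aware" idea is an Alpha that writes down the Zero membership it learns while running, so that a restart does not make it "forget". The sketch below is purely illustrative and assumes a lot: the `KnownZeros` class, the JSON file, and the address strings are hypothetical, and nothing here reflects how Dgraph's heartbeat protocol actually stores or exchanges membership.

```python
import json
import os
from typing import Iterable, List


class KnownZeros:
    """Hypothetical on-disk record of the Zero addresses an Alpha has seen."""

    def __init__(self, path: str, seed: Iterable[str]):
        self.path = path
        self.seed = list(seed)  # address(es) passed on the command line

    def load(self) -> List[str]:
        """Addresses to try at startup: remembered members first, then the seed."""
        remembered: List[str] = []
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                remembered = json.load(f)
        # dict.fromkeys keeps first-seen order and drops duplicates.
        return list(dict.fromkeys(remembered + self.seed))

    def update(self, members: Iterable[str]) -> None:
        """Store the latest membership, replacing whatever was stored before."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(sorted(set(members)), f)
        # Swap the file in one step so a crash cannot leave a half-written record.
        os.replace(tmp, self.path)
```

In this sketch, `update` would be called whenever the Alpha receives a membership change, and `load` at startup. Because removed Zeros are dropped from the stored list but the command-line seed is still tried last, an Alpha restarted while zero1 is down would reach zero2 or zero3 first without any configuration change.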
