
Allow re-join after leave #110

Closed · armon opened this issue May 2, 2014 · 18 comments
Labels
type/enhancement Proposed improvement or new feature

Comments

@armon (Member) commented May 2, 2014

We should have a flag to allow a node to re-join even though it left.

@bscott commented May 2, 2014

+1

blalor added a commit to blalor/docker-centos-repobuilder that referenced this issue May 3, 2014
Preserve the cluster server members on stop and use them to join on start. Work-around for hashicorp/consul#110
@XavM commented May 3, 2014

+1

Maybe better: allow a new configuration key that specifies an existing DNS server (and an optional DNS port, if you want to use Consul as the DNS), fetch the SRV records corresponding to the consul service from that DNS, and join that cluster.

This key could be set to an array of ["primary_dns_ip:port", "secondary_dns_ip:port"].
If not set, the consul agent would fall back to the DNS specified in resolv.conf (or the equivalent on Windows, etc.).

By default an agent won't join any nodes when it starts up.
Specifying "-join" or "start_join" with NO address would force an agent to join on startup using the consul SRV records from DNS.
Specifying "-join" or "start_join" WITH an address would force an agent to join on startup using that particular address (or set of addresses), bypassing the DNS fetch.

This would be backward compatible with Consul v0.2.0 and earlier, and would eliminate the need to maintain a hard-coded list of addresses to join on startup.

Credits: confd's "-srv-domain" -> https://github.com/kelseyhightower/confd/blob/master/docs/dns-srv-records.md
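
For illustration only, here is a rough sketch of the lookup such a key would drive; the zone name, host names and IPs are hypothetical placeholders, and 8301 is just Consul's default Serf LAN (join) port:

# Hypothetical sketch: ask an existing DNS server for the SRV record listing the consul servers to join
dig +noall +answer _consul._tcp.example.com SRV
# _consul._tcp.example.com. 300 IN SRV 1 1 8301 consul-1.example.com.
# _consul._tcp.example.com. 300 IN SRV 1 1 8301 consul-2.example.com.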

@blalor (Contributor) commented May 4, 2014

I like that very much. But I don't see how a default SRV domain would work for everyone, unless you expect that search is set in resolv.conf (or equivalent).

@armon (Member, Author) commented May 4, 2014

I don't think we will be relying on DNS for this. The initial join will still be required, but we can re-join the cluster on further reboots using the old cached member information. If you want DNS for bootstrapping, that can be composed more elegantly by just using the consul join command with a DNS host name.

@XavM commented May 4, 2014

@blalor: Not sure I get your question right; what I meant:

Upon startup, the consul agent queries the DNS server (the one specified with the new suggested configuration key, or the one set in resolv.conf) for the SRV record "consul.service.consul"; the DNS server then answers with the list of available servers (including IPs and associated ports) through which you can join an existing consul cluster.

The DNS you query could be an existing consul cluster, or your main DNS server (bind, dnsmasq, etc.) that you have previously set up to forward queries to Consul as appropriate (when the zone is "consul").

My point was: Consul is designed for service discovery, so why not use Consul to discover existing and healthy consul cluster members?

PS: of course, the SRV lookup should use the "domain" configuration key when specified, and only fall back to the default "consul" domain when it is missing.
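
As a quick sketch of the lookup being described here, assuming a running agent that serves DNS on Consul's default port 8600 and the default "consul" domain:

# Query a local agent's DNS interface for the SRV record of the consul service
dig +noall +answer @127.0.0.1 -p 8600 consul.service.consul SRV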

@armon (Member, Author) commented May 4, 2014

@XavM I agree, but I think there is no special integration required. If the node is using Consul for DNS already, then "consul join consul.service.consul" should just work!

@XavM commented May 4, 2014

@armon: Thank you for your answer.

I think I must have missed your point (sorry about that).

Regarding the consul join command with a DNS host name: you have to know which host is up, is running consul (and on which port), and is a member of the desired cluster, meaning you still have to maintain a hard-coded list of hosts:ports to join on startup.

A rejoin using old cached member information could fail due to a stale cache (depending on how long the node has been down, the topology could have changed).

Anyway, thank you for this promising solution; I am sure you will make the best choice.


Edit:
@armon: After seeing your previous response, I tried to join using "consul join consul.service.consul" and it works brilliantly! (The docs don't mention it.)

@blalor (Contributor) commented May 4, 2014

[Trying to bring threads in two mediums together]

On May 4, 2014, at 4:57 PM, XavM notifications@github.com wrote:

Regarding the consul join command with a DNS host name: you have to know which host is up, is running consul (and on which port), and is a member of the desired cluster, meaning you still have to maintain a hard-coded list of hosts:ports to join on startup.

This is it exactly. It’s a chicken/egg problem. I think a -srv-domain option would be ideal. “-srv-domain” with no domain could default to the host’s DNS domain name, or one could be provided. The actual record that would be queried would be _consul._tcp.$DOMAIN. Then consul would just join using all of the entries in that record. Consul itself can’t be queried because it hasn’t joined the cluster, yet! I’ve got a work-around for not having -srv-domain; I’m using Chef to run the following:

if ! curl -f -s 'http://localhost:8500/v1/status/leader' &> /dev/null ; then
    dig +search +noall +answer _consul._tcp SRV | awk '{print $NF}' | xargs --no-run-if-empty consul join
fi

/v1/status/leader returns 500 if there is no leader (and therefore the agent’s not part of the cluster). If that query fails, then dig is used to search for the SRV record, and all entries in that record are passed to "consul join".

That just leaves the problem of how to update that record in DNS, which is part of the discussion of agent bootstrapping. If you’ve got a cluster with 3 servers, you don’t want all three updating Route53 (for example) when a server is added or removed; that should only happen from the leader. So the remaining issue is that I need a clean way to identify if a particular node is the leader, and ideally there would be a blocking query that would return the current set of servers.

A rejoin using old cached member information could fail due to a stale cache (depending on how long the node has been down, the topology could have changed).

With a definitive way to determine the servers for a datacenter (_consul._tcp.$DOMAIN SRV record), re-joining just becomes a performance optimization. I think the real solution is having a well-defined way to join a cluster on initial boot.

@armon (Member, Author) commented May 4, 2014

@blalor This actually does not solve the chicken-and-egg issue. This solution assumes that the DNS servers exist at a well-known address to begin with (otherwise the SRV lookup would fail for lack of a DNS server to query). At some point there must be a well-known address; that may be Consul or it may be DNS.

My suggestion, because this is inescapable, is to simply run Consul on the DNS servers. You need a well-known address for at least one of them anyway, so this kills two birds with one stone. Doing this lets you use DNS to join the cluster without making any changes to Consul.

If you have 3 well-known DNS addresses, then the DNS lookup for "consul.service.consul" will work unless all the DNS nodes are down, which is unlikely. Hope that helps.
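
As a sketch of that setup (the addresses are placeholders), every node's resolver would point at the well-known hosts that run both a Consul server and a DNS forwarder configured to hand the "consul" zone to Consul, and the initial join can then use the service name instead of a hard-coded server IP:

# /etc/resolv.conf on every node (placeholder addresses of the well-known DNS/Consul hosts)
nameserver 10.0.0.10
nameserver 10.0.0.11
nameserver 10.0.0.12

# initial join by service name, resolved through those DNS servers
consul join consul.service.consul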

@blalor (Contributor) commented May 4, 2014

But I'm not running my own DNS server. I'm using Route53.

@armon (Member, Author) commented May 4, 2014

@blalor Can you have a cron job on the consul servers that writes their IP to Route53 every few minutes? This way you can just join a well known address that is relatively up to date. It is only the initial join that is an issue, since moving forward we will be adding the re-join support.

@blalor (Contributor) commented May 4, 2014

If I just run a cron job on each server to ensure that its own IP is in the SRV record, I'll have to manually update the SRV record when servers are decommissioned. I'll also have to ensure that server-a doesn't overwrite the changes just made by server-b, since there's no atomic add/update operation in the Route53 API for a single record.

Consul already knows which servers are in the cluster, but if I write a script to use the output of consul members -role=consul -status=alive, I'll want to do it only on the leader so that I'm only updating the record once. The problem with this is that there's no way to query Consul to determine if a given node is the leader. I can use /v1/status/leader, but that returns IP:port; I'd have to either match that IP against all addresses bound to all interfaces on the host, or use the Consul config file to determine the IPs that the process is binding to.
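
A rough, unofficial sketch of that comparison, assuming a Linux host where hostname -I lists the bound addresses and the agent's HTTP API is on the default port 8500:

# Treat this node as leader only if the IP from /v1/status/leader (returned as "ip:port") matches a local address
leader_ip=$(curl -f -s http://localhost:8500/v1/status/leader | tr -d '"' | cut -d: -f1)
if [ -n "$leader_ip" ] && hostname -I | tr ' ' '\n' | grep -qx "$leader_ip"; then
    echo "this node is the leader; safe to update the Route53 record once"
fi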

@armon (Member, Author) commented May 4, 2014

@blalor Why do you need the leader node specifically? You only need to do a join with any node, so it doesn't need to be the leader. It is totally fine if server-a overwrites the record of server-b, since a join to any of them will succeed.

Trying to ensure only a single write from the cluster leader seems like an optimization that isn't necessary, given the nature of the gossip and the join. If you just have a cron on the servers, allowing an override, then even when you decommission one, a remaining live node will update the record by overwriting it.

@blalor (Contributor) commented May 5, 2014

I understand it isn't strictly necessary, but it feels sloppy to make the same API call with the same data from 3-5 different hosts at once.

Having a list of all servers in the SRV record increases the chance of a successful join in the event of a network partition or temporary unavailability of one server.

@armon closed this as completed in de30905 on May 21, 2014
@dennybaa

Hi,

It seems like rejoin after leave doesn't work. Namely, if I interrupt consul with INT it leaves the cluster. It also wipes out peers.json, leaving a null value. Even if -rejoin is used, consul can't get back online since there's no info about peers...
However, the docs say:
-rejoin When provided Consul will ignore a previous leave and attempt to rejoin the cluster when starting.

So is consul supposed to rejoin after a leave, or am I misunderstanding something?
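
For reference, the invocation being tested looks roughly like this (the data directory is a placeholder):

# After a graceful leave (e.g. SIGINT), start the agent with -rejoin so it ignores the previous leave and retries the saved peers
consul agent -server -data-dir /var/lib/consul -rejoin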

@armon (Member, Author) commented Jun 18, 2014

@dennybaa That is embarrassing! Fixed in a05e1ae.

@dennybaa

Awesome. Cheers!

@blalor (Contributor) commented Dec 16, 2014

Wow, I was just thinking about this problem.

Brian, could this same technique be used as part of the startup of an individual consul server?

I am currently using Hashicorp's consul-join.conf upstart script found here:
https://github.com/hashicorp/consul/blob/master/terraform/aws/scripts/upstart-join.conf

I was thinking of replacing the hardcoded host IP used to join a cluster with a query for the SRV record. Do you see any immediate problems with this?

I am using Ansible to start up the cluster. As part of the startup, I will check for the existence of the SRV record. Should it not exist, I will write the value of the first provisioned consul server.

