consul tip cannot startup if start_join nodes don't resolve in DNS (regression) #2140

Closed
rboyer opened this issue Jun 22, 2016 · 3 comments

rboyer commented Jun 22, 2016

consul version for both Client and Server

Custom vanilla build from development tip 09cfda47ed103910a8e1af76fa378a7e6acd5310

consul info for both Client and Server

N/A (the agent crashes on startup).

Operating system and Environment details

Description of the Issue (and unexpected/desired result)

On consul-0.6.4 I can bring up a 3-node cluster using start_join with hostnames in the array, even if those hostnames don't resolve yet in DNS (a side effect of docker-compose launch ordering).

On consul-tip (09cfda4) the same configuration crashes on startup, with logs indicating that a preflight DNS resolution step failed, rather than resolving lazily and retrying.

I've included the first few seconds of logs below for both situations.

Reproduction steps

Log Fragments or Link to gist

### consul 0.6.4 (GOOD) ###

inf1_1        | ==> WARNING: Expect Mode enabled, expecting 3 servers
inf1_1        | ==> Starting Consul agent...
inf1_1        | ==> Starting Consul agent RPC...
inf1_1        | ==> Joining cluster...
inf1_1        |     Join completed. Synced with 1 initial agents
inf1_1        | ==> Consul agent running!
inf1_1        |          Node name: 'inf1'
inf1_1        |         Datacenter: 'inf'
inf1_1        |             Server: true (bootstrap: false)
inf1_1        |        Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
inf1_1        |       Cluster Addr: 10.10.2.11 (LAN: 8301, WAN: 8302)
inf1_1        |     Gossip encrypt: true, RPC-TLS: false, TLS-Incoming: false
inf1_1        |              Atlas: <disabled>
inf1_1        | 
inf1_1        | ==> Log data will now stream in as it occurs:
inf1_1        | 
inf1_1        |     2016/06/22 13:13:23 [INFO] raft: Node at 10.10.2.11:8300 [Follower] entering Follower state
inf1_1        |     2016/06/22 13:13:23 [INFO] serf: EventMemberJoin: inf1 10.10.2.11
inf1_1        |     2016/06/22 13:13:23 [INFO] serf: EventMemberJoin: inf1.inf 10.10.2.11
inf1_1        |     2016/06/22 13:13:23 [INFO] consul: adding LAN server inf1 (Addr: 10.10.2.11:8300) (DC: inf)

### consul 09cfda47ed103910a8e1af76fa378a7e6acd5310 (BAD) ###

inf1_1        | ==> WARNING: Expect Mode enabled, expecting 3 servers
inf1_1        | ==> Starting Consul agent...
inf1_1        | ==> Starting Consul agent RPC...
inf1_1        | ==> Joining cluster...
inf1_1        | ==> 3 error(s) occurred:
inf1_1        | 
inf1_1        | * Failed to join 10.10.2.11: EOF
inf1_1        | * Failed to resolve inf2: lookup inf2 on 127.0.0.11:53: no such host
inf1_1        | * Failed to resolve inf3: lookup inf3 on 127.0.0.11:53: no such host

### config used

{
    "bootstrap_expect":3,
    "disable_remote_exec":true,
    "advertise_addr":"10.10.2.11",
    "bind_addr":"10.10.2.11",
    "client_addr":"0.0.0.0",
    "datacenter":"inf",
    "data_dir":"/consul/data",
    "log_level":"TRACE",
    "node_name":"inf1",
    "rejoin_after_leave":true,
    "skip_leave_on_interrupt":true,
    "server":true,
    "ui":true,
    "encrypt":"SECRET_KEY",
    "start_join":[ "inf1","inf2","inf3" ]
}
rboyer commented Jun 22, 2016

If I switch start_join to be a list of IP addresses, it hangs and then dies anyway:

inf1_1        | ==> WARNING: Expect Mode enabled, expecting 3 servers
inf1_1        | ==> Starting Consul agent...
inf1_1        | ==> Starting Consul agent RPC...
inf1_1        | ==> Joining cluster...
inf1_1        | ==> 3 error(s) occurred:
inf1_1        | 
inf1_1        | * Failed to join 10.10.2.11: EOF
inf1_1        | * Failed to join 10.10.2.12: dial tcp 10.10.2.12:8301: getsockopt: no route to host
inf1_1        | * Failed to join 10.10.2.13: dial tcp 10.10.2.13:8301: getsockopt: no route to host

This is also bad, because it assumes that all of the servers are already up and reachable at startup.
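
For illustration, the lazy behaviour asked for in this report (tolerate start_join names that don't resolve yet and retry, rather than failing on the first lookup) might look roughly like the Python sketch below. This is not Consul's code: `lazy_join`, `resolve`, and the `attempts`/`interval` parameters are hypothetical names chosen for the example, and returning as soon as any one host resolves is an assumption modelled on the 0.6.4 log, where the join completed with 1 initial agent.

```python
import socket
import time


def resolve(host):
    """Return one IPv4 address for host, or None if it does not resolve yet."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def lazy_join(hosts, resolver=resolve, attempts=5, interval=1.0, sleep=time.sleep):
    """Retry resolution of the join list until at least one host resolves.

    Hypothetical helper, not Consul's implementation: it only illustrates
    tolerating names that are not in DNS yet instead of failing up front.
    """
    for attempt in range(1, attempts + 1):
        resolved = {}
        for host in hosts:
            addr = resolver(host)
            if addr is not None:
                resolved[host] = addr
        if resolved:
            # Assumption: one reachable peer is enough to proceed.
            return resolved
        if attempt < attempts:
            sleep(interval)
    raise RuntimeError("no start_join host resolved after %d attempts" % attempts)
```

With the config above, `lazy_join(["inf1", "inf2", "inf3"])` would poll up to `attempts` times and return once at least one of the names appears in DNS.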

rboyer commented Jun 22, 2016

After chasing this a bit further, I found that the branch wasn't entirely vanilla: it had some bad local modifications to memberlist. This is likely not a real issue.

rboyer closed this as completed Jun 22, 2016

slackpad commented

Appreciate the update. Please let me know if you see anything strange, as the pre-flight DNS resolution is a new feature in master.
