==> Starting Consul agent... ==> Error starting agent: agent: timeout starting DNS servers #3802

Closed
eddamalique01 opened this issue Jan 15, 2018 · 15 comments
Labels: theme/operator-usability, type/bug

Comments

@eddamalique01

root@edwin-Vostro-3900:~# ifconfig
docker0 Link encap:Ethernet HWaddr 02:42:54:54:02:8f
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

docker_gwbridge Link encap:Ethernet HWaddr 02:42:00:8e:48:66
inet addr:172.18.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:ff:fe8e:4866/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:1758 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:323758 (323.7 KB)

enp2s0 Link encap:Ethernet HWaddr f4:8e:38:8a:67:cd
inet addr:172.16.15.167 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::ffa0:b32c:f0ea:ba7e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1053655 errors:0 dropped:0 overruns:0 frame:0
TX packets:332278 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:674991919 (674.9 MB) TX bytes:35364400 (35.3 MB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:1681392 errors:0 dropped:0 overruns:0 frame:0
TX packets:1681392 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:993924398 (993.9 MB) TX bytes:993924398 (993.9 MB)

vboxnet0 Link encap:Ethernet HWaddr 0a:00:27:00:00:00
inet addr:192.168.99.1 Bcast:192.168.99.255 Mask:255.255.255.0
inet6 addr: fe80::800:27ff:fe00:0/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:208 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:36317 (36.3 KB)

vethf6711eb Link encap:Ethernet HWaddr de:77:ec:6a:b9:6f
inet6 addr: fe80::dc77:ecff:fe6a:b96f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:1889 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:338981 (338.9 KB)

virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
root@edwin-Vostro-3900:~# consul members
Node Address Status Type Build Protocol DC Segment
edwin-Vostro-3900 127.0.0.1:8301 alive server 1.0.2 2 dc1

root@edwin-Vostro-3900:~# consul info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = b55059f
version = 1.0.2
consul:
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = 127.0.0.1:8300
server = true
raft:
applied_index = 1166
commit_index = 1166
fsm_pending = 0
last_contact = 0
last_log_index = 1166
last_log_term = 2
last_snapshot_index = 0
last_snapshot_term = 0
latest_configuration = [{Suffrage:Voter ID:39f69dcb-ca84-2dc8-b5fb-0070005db287 Address:127.0.0.1:8300}]
latest_configuration_index = 1
num_peers = 0
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 2
runtime:
arch = amd64
cpu_count = 4
goroutines = 70
max_procs = 4
os = linux
version = go1.9.2
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 1
event_time = 2
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
root@edwin-Vostro-3900:~# consul agent -server -bootstrap-expect=1 -data-dir=/tmp/consul -node=server-1 -bind=127.0.1.1 -enable-script-checks=true -config-dir=/etc/consul.d
BootstrapExpect is set to 1; this is the same as Bootstrap mode.
bootstrap = true: do not enable unless necessary
==> Starting Consul agent...
==> Error starting agent: agent: timeout starting DNS servers
Kindly help me to solve this error.


@slackpad
Contributor

Hi @eddamalique01, there's a fixed 1-second timeout there that we could increase. Do you see this every time, or is it sporadic?

@slackpad slackpad added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 25, 2018
@slackpad slackpad added this to the Next milestone Jan 25, 2018
@mikeumus

mikeumus commented Feb 2, 2018

Make sure all the required ports are open too:

  • Server RPC (Default 8300)
  • Serf LAN (Default 8301)
  • Serf WAN (Default 8302)
  • HTTP API (Default 8500)
  • DNS Interface (Default 8600)

From Consul docs:
https://www.consul.io/docs/agent/options.html#ports-used
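Before starting an agent it can help to confirm these ports are actually free on the machine. In the original report, consul members and consul info already answer from a running agent, so ports such as 8600 may well be held by that agent (an inference from the output above, not something confirmed in the thread). A minimal Python sketch, assuming a Linux host and the default ports listed above; probe_bind and check_consul_ports are hypothetical helpers, not part of Consul:

```python
import errno
import socket

# Hypothetical pre-flight check, not part of Consul.
# Default Consul ports from the list above. Serf LAN/WAN and DNS use both
# TCP and UDP; server RPC and the HTTP API are TCP only.
DEFAULT_PORTS = {
    "server_rpc": (8300, ("tcp",)),
    "serf_lan": (8301, ("tcp", "udp")),
    "serf_wan": (8302, ("tcp", "udp")),
    "http": (8500, ("tcp",)),
    "dns": (8600, ("tcp", "udp")),
}


def probe_bind(host, port, proto):
    """Try to bind host:port/proto once and say why it failed, if it did."""
    kind = socket.SOCK_STREAM if proto == "tcp" else socket.SOCK_DGRAM
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    # No SO_REUSEADDR: a TCP port with sockets in TIME_WAIT may show as
    # "in use" even though a listener that sets SO_REUSEADDR could bind it.
    with socket.socket(family, kind) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            reasons = {
                errno.EADDRINUSE: "in use",
                errno.EACCES: "permission denied",
                errno.EADDRNOTAVAIL: "address not available",
            }
            return reasons.get(exc.errno, f"error: {exc}")
    return "free"


def check_consul_ports(host="127.0.0.1", ports=DEFAULT_PORTS):
    """Map (name, proto) -> bind status for every listed port."""
    return {
        (name, proto): probe_bind(host, port, proto)
        for name, (port, protos) in ports.items()
        for proto in protos
    }


if __name__ == "__main__":
    for (name, proto), status in check_consul_ports().items():
        print(f"{name:<10} {proto}: {status}")
```

Run it as the user Consul will run as. If another agent is already bound to the same address, the affected rows should report "in use".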

@balupton

balupton commented Feb 2, 2018

Running into this on Scaleway. The Scaleway security group is configured correctly.

Seeing it every time.

[root@par1-cluster-origin-0 ~]# /usr/local/bin/consul agent -config-dir=/etc/systemd/system/consul.d
==> Starting Consul agent...
==> Error starting agent: agent: timeout starting DNS servers

[root@par1-cluster-origin-0 ~]# cat /etc/systemd/system/consul.d/consul.json 
{
	"bootstrap_expect": 0,
	"server": true,

	"ui": true,
	"data_dir": "/opt/consul/data",
	"bind_addr": "redcated",
	"client_addr": "redacted",
	"retry_join": [""]
}

[root@par1-cluster-origin-0 ~]# consul info
Error querying agent: Get http://127.0.0.1:8500/v1/agent/self: dial tcp 127.0.0.1:8500: getsockopt: connection refused

bind_addr is the private IP and client_addr is the public IP. This Consul server is meant to be the only server for now.
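One thing worth ruling out with this setup: Consul binds its client interfaces, including the HTTP and DNS servers, to client_addr, so that address has to be assigned to an interface on the machine. On some cloud providers the public IP is handled by NAT and never appears on the instance itself; whether that applies to this Scaleway host is an assumption, not something confirmed in the thread. A small Python sketch that checks whether an address can be bound locally; can_listen_on is a hypothetical helper, not part of Consul:

```python
import errno
import ipaddress
import socket
import sys


def can_listen_on(ip):
    """Hypothetical helper: True if a socket can be bound to `ip` here.

    Uses port 0 so the kernel picks any free port; only the address is tested.
    """
    family = socket.AF_INET6 if ipaddress.ip_address(ip).version == 6 else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # not assigned to any local interface
            raise
    return True


if __name__ == "__main__":
    # Usage: python can_listen.py <client_addr> <bind_addr>
    for ip in sys.argv[1:]:
        print(ip, "bindable" if can_listen_on(ip) else "NOT assigned locally")
```

If the public IP reports as not assigned locally, listening on 0.0.0.0 (or on the private address) and relying on the provider's NAT is the usual alternative.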

@balupton

balupton commented Feb 2, 2018

OK, changing the configuration seems to have fixed that error, but it introduced a new one:

[root@par1-cluster-origin-0 ~]# cat /etc/systemd/system/consul.d/consul.json 
{
	"server": true,
	"ui": true,
	"data_dir": "/opt/consul/data"
}

[root@par1-cluster-origin-0 ~]# /usr/local/bin/consul agent -config-dir=/etc/systemd/system/consul.d
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.0.3'
           Node ID: 'redacted'
         Node name: 'par1-cluster-origin-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: redacted (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/02/02 05:43:11 [INFO] raft: Initial configuration (index=0): []
    2018/02/02 05:43:11 [INFO] raft: Node at 10.10.43.65:8300 [Follower] entering Follower state (Leader: "")
    2018/02/02 05:43:11 [INFO] serf: EventMemberJoin: par1-cluster-origin-0.dc1 redacted
    2018/02/02 05:43:11 [WARN] serf: Failed to re-join any previously known node
    2018/02/02 05:43:11 [INFO] serf: EventMemberJoin: par1-cluster-origin-0 redacted
    2018/02/02 05:43:11 [WARN] serf: Failed to re-join any previously known node
    2018/02/02 05:43:11 [INFO] consul: Adding LAN server par1-cluster-origin-0 (Addr: tcp/10.10.43.65:8300) (DC: dc1)
    2018/02/02 05:43:11 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2018/02/02 05:43:11 [INFO] consul: Handled member-join event for server "par1-cluster-origin-0.dc1" in area "wan"
    2018/02/02 05:43:11 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2018/02/02 05:43:11 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
    2018/02/02 05:43:11 [INFO] agent: started state syncer
    2018/02/02 05:43:18 [ERR] agent: failed to sync remote state: No cluster leader
    2018/02/02 05:43:20 [WARN] raft: no known peers, aborting election
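The last two log lines (No cluster leader, then no known peers, aborting election) are what a lone server does when it has not been told it may elect itself: the first config set bootstrap_expect to 0, and this one sets neither bootstrap nor bootstrap_expect. The later config in this thread with "bootstrap": true does win its election. A rough Python sketch of a check for these single-server mistakes; lint_single_server is hypothetical, and its rules are drawn from this thread rather than from Consul's own validation:

```python
import json
import sys


def lint_single_server(config):
    """Hypothetical helper: warnings for a config meant to be the only server.

    The checks mirror problems seen in this thread and are far from a
    complete validation of Consul's configuration.
    """
    warnings = []
    if not config.get("server", False):
        warnings.append("server is not true, so this agent can never lead")
    bootstrap = config.get("bootstrap", False)
    expect = config.get("bootstrap_expect", 0)
    if not bootstrap and expect < 1:
        warnings.append(
            "neither bootstrap nor bootstrap_expect >= 1 is set, "
            "so a lone server will not elect itself leader"
        )
    elif expect > 1:
        warnings.append(f"bootstrap_expect={expect} waits for {expect} servers")
    if any(not str(addr).strip() for addr in config.get("retry_join", [])):
        warnings.append("retry_join contains an empty address")
    return warnings


if __name__ == "__main__":
    # Usage: python lint_consul.py /etc/systemd/system/consul.d/consul.json
    for path in sys.argv[1:]:
        with open(path) as f:
            for warning in lint_single_server(json.load(f)):
                print(f"{path}: {warning}")
```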

@balupton

balupton commented Feb 2, 2018

With this configuration it works, but Consul is still not accessible via the public IP:

{
	"server": true,
	"bootstrap": true,	
	"client_addr": "0.0.0.0",
	"bind_addr": "0.0.0.0",
	"ui": true,
	"data_dir": "/opt/consul/data"
}

[root@par1-cluster-origin-0 ~]# /usr/local/bin/consul agent -config-dir=/etc/systemd/system/consul.d
bootstrap = true: do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.0.3'
           Node ID: 'redacted'
         Node name: 'par1-cluster-origin-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.10.43.65 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/02/02 05:55:42 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID: redacted Address: redacted:8300}]
    2018/02/02 05:55:42 [INFO] raft: Node at redacted:8300 [Follower] entering Follower state (Leader: "")
    2018/02/02 05:55:42 [INFO] serf: EventMemberJoin: par1-cluster-origin-0.dc1 redacted
    2018/02/02 05:55:42 [WARN] serf: Failed to re-join any previously known node
    2018/02/02 05:55:42 [INFO] serf: EventMemberJoin: par1-cluster-origin-0 redacted
    2018/02/02 05:55:42 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/02/02 05:55:42 [INFO] consul: Handled member-join event for server "par1-cluster-origin-0.dc1" in area "wan"
    2018/02/02 05:55:42 [WARN] serf: Failed to re-join any previously known node
    2018/02/02 05:55:42 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/02/02 05:55:42 [INFO] consul: Adding LAN server par1-cluster-origin-0 (Addr: tcp/10.10.43.65:8300) (DC: dc1)
    2018/02/02 05:55:42 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
    2018/02/02 05:55:42 [INFO] agent: started state syncer
    2018/02/02 05:55:49 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2018/02/02 05:55:49 [INFO] raft: Node at 10.10.43.65:8300 [Candidate] entering Candidate state in term 2
    2018/02/02 05:55:49 [INFO] raft: Election won. Tally: 1
    2018/02/02 05:55:49 [INFO] raft: Node at redacted:8300 [Leader] entering Leader state
    2018/02/02 05:55:49 [INFO] consul: cluster leadership acquired
    2018/02/02 05:55:49 [INFO] consul: New leader elected: par1-cluster-origin-0
    2018/02/02 05:55:49 [INFO] consul: member 'par1-cluster-origin-0' joined, marking health alive
    2018/02/02 05:55:49 [INFO] agent: Synced node info

@balupton

balupton commented Feb 2, 2018

Figured it out. CentOS runs a firewall:

https://www.digitalocean.com/community/tutorials/how-to-configure-the-linux-firewall-for-docker-swarm-on-centos-7
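Once Consul listens on 0.0.0.0, a host firewall can still drop connections from outside; on CentOS 7 that is typically firewalld, which the linked tutorial configures. A minimal Python sketch to run from another machine to see which Consul TCP ports answer. tcp_port_open and report are hypothetical helpers, the host is whatever address you are testing, UDP (Serf gossip, DNS) cannot be checked this way, and a failed connection only says that something is in the way, not what:

```python
import socket
import sys

# TCP side of the default Consul ports; UDP cannot be probed with connect().
CONSUL_TCP_PORTS = {
    "server_rpc": 8300,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "http": 8500,
    "dns": 8600,
}


def tcp_port_open(host, port, timeout=2.0):
    """Hypothetical helper: True if a TCP connect to host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


def report(host, ports=CONSUL_TCP_PORTS, timeout=2.0):
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    # Usage: python check_remote.py <public-ip>   (run from another machine)
    for host in sys.argv[1:]:
        for name, ok in report(host).items():
            print(f"{host} {name}: {'open' if ok else 'blocked or closed'}")
```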

@slackpad
Contributor

slackpad commented Feb 5, 2018

There are two things I think we can do to clean up the UX here:

  1. When agent.Start() fails, it logs the last error to the console, but doesn't print any of the logging that occurred before. It's likely that an error was generated here with more context, but it wasn't shown - https://github.com/hashicorp/consul/blob/v1.0.3/agent/agent.go#L387.

  2. We should figure out why there's an accept exception to the error message - https://github.com/hashicorp/consul/blob/v1.0.3/agent/agent.go#L386. Seems like we should drop that exception or at least document why it's there.
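Point 1 describes a general pattern that a small example makes concrete. In the sketch below (Python, not Consul's Go code; start_listeners is a hypothetical helper), every listener reports either success or its own error, and the caller collects all of them before deciding what to print, so a bind failure is not hidden behind a bare timeout message. The 1.0 second default mirrors the 1-second timeout mentioned earlier in this thread.

```python
import queue
import socket
import threading
import time


def start_listeners(addrs, timeout=1.0):
    """Hypothetical helper: bind every (host, port, proto) concurrently.

    Returns (sockets, errors). Anything still pending when `timeout` expires
    is reported as a timeout alongside the real errors, not instead of them.
    Late successes after the deadline are not cleaned up in this sketch.
    """
    results = queue.Queue()

    def bind_one(addr):
        host, port, proto = addr
        kind = socket.SOCK_STREAM if proto == "tcp" else socket.SOCK_DGRAM
        s = socket.socket(socket.AF_INET, kind)
        try:
            s.bind((host, port))
            if proto == "tcp":
                s.listen()
        except OSError as exc:
            s.close()
            results.put((addr, None, exc))  # surface the error, never drop it
        else:
            results.put((addr, s, None))

    for addr in addrs:
        threading.Thread(target=bind_one, args=(addr,), daemon=True).start()

    sockets, errors = [], []
    pending = set(addrs)
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            addr, s, exc = results.get(timeout=remaining)
        except queue.Empty:
            break
        pending.discard(addr)
        if exc is None:
            sockets.append(s)
        else:
            errors.append(f"listen {addr[2]} {addr[0]}:{addr[1]}: {exc}")
    errors.extend(f"listen {p} {h}:{n}: timeout" for h, n, p in pending)
    return sockets, errors
```

With TCP and UDP listeners on a port that is already taken, this returns two errors, similar in shape to the "2 errors occurred" output that appears later in this thread.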

@slackpad slackpad added type/bug Feature does not function as expected theme/operator-usability Replaces UX. Anything related to making things easier for the practitioner and removed waiting-reply Waiting on response from Original Poster or another individual in the thread labels Feb 5, 2018
@jaybe78

jaybe78 commented May 9, 2019

Hi guys, has any of you found a way to fix this issue?
I'm getting the same issue with the latest Consul image: Failed to setup node ID: open /var/lib/consul/node-id: permission denied
I run Consul on a Kubernetes cluster (I followed this tutorial: https://github.com/Shinzu/kubernetes-consul-vault).

Please help

@benzionyunger

I have the same issue and am also looking for help.

@mkeeler
Member

mkeeler commented Aug 15, 2019

@jaybe78 Your agent is failing to start because it doesn't have permission to write /var/lib/consul/node-id. Presumably your data directory is /var/lib/consul, in which case the user you are running Consul as needs write permission on that directory.

@benzionyunger Is yours failing with the DNS error or the permission denied error?

@benzionyunger

@mkeeler It was failing with permission denied, but I managed to resolve it: the problem stemmed from a volume mount to the host that was not writable by default.
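Both permission-denied reports come down to the data directory (/var/lib/consul here, or a host volume mounted there) not being writable by the user Consul runs as. A small Python pre-flight sketch to run as that same user, or inside the same container, before starting the agent; data_dir_writable is a hypothetical helper, and it only checks what the current process can do:

```python
import os
import sys
import tempfile


def data_dir_writable(path):
    """Hypothetical helper: (ok, reason) for whether this process can write in `path`.

    Creating a real file is a more direct test than os.access(), which checks
    the real rather than the effective user ID.
    """
    if not os.path.isdir(path):
        return False, f"{path} does not exist or is not a directory"
    try:
        fd, probe = tempfile.mkstemp(dir=path, prefix=".write-test-")
    except OSError as exc:
        return False, f"cannot create files in {path}: {exc.strerror}"
    os.close(fd)
    os.remove(probe)  # leave the data directory as we found it
    return True, "ok"


if __name__ == "__main__":
    for path in sys.argv[1:] or ["/var/lib/consul"]:
        print(path, *data_dir_writable(path))
```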

@automaticgiant

automaticgiant commented Sep 24, 2019

Regarding @slackpad's first point (when agent.Start() fails, only the last error is printed to the console; see https://github.com/hashicorp/consul/blob/v1.0.3/agent/agent.go#L387):

Check whether this is already fixed by #4598. A v1.6.1 dev agent configured to use port 53, but without the privilege to bind it, exits as follows:

==> Error starting agent: 2 errors occurred:
        * listen udp 192.168.8.197:53: bind: permission denied
        * listen tcp 192.168.8.197:53: bind: permission denied


    2019/09/24 14:09:44 [INFO] agent: Exit code: 1
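The bind: permission denied lines are standard Linux behaviour for ports below 1024 (53 here) when the process is neither root nor holding CAP_NET_BIND_SERVICE. Since kernel 4.11 the cut-off is exposed as the net.ipv4.ip_unprivileged_port_start sysctl, which defaults to 1024. A small Python sketch; unprivileged_port_start and needs_privilege are hypothetical helpers:

```python
import sys
from typing import Optional

SYSCTL = "/proc/sys/net/ipv4/ip_unprivileged_port_start"


def unprivileged_port_start(path: str = SYSCTL) -> int:
    """Hypothetical helper: lowest port an unprivileged process may bind.

    Falls back to the traditional 1024 when the sysctl is missing, e.g. on
    kernels older than 4.11 or on non-Linux systems.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return 1024


def needs_privilege(port: int, start: Optional[int] = None) -> bool:
    """True if binding `port` needs root or CAP_NET_BIND_SERVICE."""
    if start is None:
        start = unprivileged_port_start()
    return port < start


if __name__ == "__main__":
    for port in (53, 8600):
        print(port, "needs privilege" if needs_privilege(port) else "ok unprivileged")
```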
Regarding the second point (the accept exception at https://github.com/hashicorp/consul/blob/v1.0.3/agent/agent.go#L386, which should either be dropped or documented):

@slackpad, why is it there? Do you remember? Maybe that's a better question for @magiconair.
It was introduced in 3e39f04 as part of #3037.

We can probably close this once the exception question is answered/documented.

@jsosulska jsosulska self-assigned this Aug 18, 2020
@dnephin
Contributor

dnephin commented Aug 18, 2020

I think #8234 may improve the UX here (if there is still a problem in recent versions). Any failure to start one of the long-running goroutines should cause the agent to exit with the relevant error.

@mikemorris
Copy link
Contributor

This sounds like it's mostly been resolved, but please reopen if this is still an issue on current versions of Consul.

@PharaohRiot

I have a fix.

If you installed Consul via Chocolatey, run choco uninstall consul as an administrator.

Then download consul.exe from the official website and put it into C:\ProgramData.

That fixed it for me.
