
Getting "No path to datacenter" for a request to a valid, known datacenter #1471

Closed
wwalker opened this issue Dec 7, 2015 · 13 comments


wwalker commented Dec 7, 2015

Cross-datacenter catalog lookups are occasionally failing. I've looked in the server logs of the stage datacenter's servers and there are no interesting entries around the times these failures occur.

/var/log/consul.1: 2015/11/23 10:56:28 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/11/26 19:00:20 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/01 13:57:19 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/01 13:58:58 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/02 12:55:26 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/02 17:51:45 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/03 00:24:10 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/03 00:28:10 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/03 01:15:35 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul.1: 2015/12/03 03:48:32 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul: 2015/12/07 02:07:49 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul: 2015/12/07 02:22:52 [ERR] http: Request /v1/catalog/service/marathon?dc=stage, error: rpc error: No path to datacenter
/var/log/consul: 2015/12/07 09:25:09 [ERR] http: Request /v1/kv/?dc=stage&recurse, error: rpc error: No path to datacenter

slackpad (Contributor) commented Jan 9, 2016

Hi @wwalker, do you see the full set of servers joined up when you run `consul members -wan`? It looks like all of your servers are getting pulled out of the Serf members list, which is very odd (unless you are having intermittent connectivity issues).
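One quick way to spot this kind of problem is to check whether every datacenter still has alive servers in the WAN pool. A minimal sketch: parse `consul members -wan` output and count alive servers per DC (the node names, IPs, and DC names below are made up; on a live cluster you would pipe the real command output in instead of the canned sample):

```shell
# Hypothetical `consul members -wan` output; a DC that has no alive servers
# here (or is missing entirely) is exactly the case that produces
# "No path to datacenter" for cross-DC requests.
sample='Node          Address          Status  Type    Build  Protocol  DC
dc1-srv1.dc1  10.0.1.1:8302    alive   server  0.7.5  2         dc1
dc1-srv2.dc1  10.0.1.2:8302    alive   server  0.7.5  2         dc1
dc2-srv1.dc2  10.0.2.1:8302    failed  server  0.7.5  2         dc2'

# Skip the header row, count rows whose Status column is "alive", keyed by DC.
result=$(printf '%s\n' "$sample" |
  awk 'NR > 1 && $3 == "alive" { alive[$7]++ }
       END { for (dc in alive) printf "%s: %d alive server(s)\n", dc, alive[dc] }')
printf '%s\n' "$result"
```

In this sample, dc2 drops out of the summary entirely because its only server is in the failed state, which is the signature to look for.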

slackpad (Contributor):

Closing this one assuming that you figured it out. Please re-open if that's not the case.


michaelgaida commented Oct 6, 2017

I am facing the same problem.
It works for a while and then I get:

http://hm-consul-9004:8500/v1/catalog/services?dc=as                                                                                    Fri Oct  6 13:50:39 2017

No path to datacenter

After a while it starts working again without any change.

michaelgaida:

Oct  6 14:37:44 hm-consul-9004 consul: 2017/10/06 14:37:44 [WARN] consul.rpc: RPC request for DC "as", no path found
Oct  6 14:37:44 hm-consul-9004 consul: 2017/10/06 14:37:44 [ERR] http: Request GET /v1/coordinate/nodes?dc=as&token=<hidden>, error: No path to datacenter from=10.120.9.128:64094
Oct  6 14:37:44 hm-consul-9004 consul: 2017/10/06 14:37:44 [DEBUG] http: Request GET /v1/coordinate/nodes?dc=as&token=<hidden> (52.759µs) from=102.130.10.128:64094
Oct  6 14:37:44 hm-consul-9004 consul[10024]: http: Request GET /v1/internal/ui/nodes?dc=as&token=<hidden>, error: No path to datacenter from=102.130.10:64092
Oct  6 14:37:44 hm-consul-9004 consul[10024]: http: Request GET /v1/internal/ui/nodes?dc=as&token=<hidden> (373.646µs) from=10.120.9.128:64092
Oct  6 14:37:44 hm-consul-9004 consul[10024]: consul.rpc: RPC request for DC "as", no path found
Oct  6 14:37:44 hm-consul-9004 consul[10024]: http: Request GET /v1/coordinate/nodes?dc=as&token=<hidden>, error: No path to datacenter from=102.130.10:64094
Oct  6 14:37:44 hm-consul-9004 consul[10024]: http: Request GET /v1/coordinate/nodes?dc=as&token=<hidden> (52.759µs) from=102.130.10:64094
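To correlate these intermittent windows with WAN membership flaps, it can help to pull just the timestamps of the "no path found" warnings out of the syslog-format lines. A small sketch, run here against a canned sample mirroring the excerpt above (on a live host you would feed in the real log, e.g. `grep 'no path found' /var/log/syslog`; that path is an assumption about where syslog lands):

```shell
# Canned sample in syslog format; IPs/tokens trimmed for brevity.
logs='Oct  6 14:37:44 hm-consul-9004 consul: 2017/10/06 14:37:44 [WARN] consul.rpc: RPC request for DC "as", no path found
Oct  6 14:37:44 hm-consul-9004 consul: 2017/10/06 14:37:44 [ERR] http: Request GET /v1/coordinate/nodes?dc=as, error: No path to datacenter
Oct  6 14:39:02 hm-consul-9004 consul[10024]: consul.rpc: RPC request for DC "as", no path found'

# Keep only the "no path found" warnings and print their syslog timestamps
# (month, day, time) so the failure windows are easy to line up with
# `consul members -wan` status changes at the same moments.
windows=$(printf '%s\n' "$logs" |
  grep 'no path found' |
  awk '{ print $1, $2, $3 }')
printf '%s\n' "$windows"
```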


michaelgaida commented Oct 6, 2017

consul members -wan
Node               Address     Status  Type    Build  Protocol  DC
as-consul-8001.as  [IP]:8302   alive   server  0.7.5  2         as
as-consul-8002.as  [IP]:8302   alive   server  0.7.5  2         as
as-consul-8003.as  [IP]:8302   alive   server  0.7.5  2         as
hm-consul-9004.hm  [IP]:8302   alive   server  0.9.2  2         hm
hm-consul-9005.hm  [IP]:8302   alive   server  0.9.2  2         hm
hm-consul-9006.hm  [IP]:8302   alive   server  0.9.2  2         hm

michaelgaida:

Could this be related to the different versions?

michaelgaida:

rpc error: failed to get conn: rpc error: lead thread didn't get connection
No path to datacenter

michaelgaida:

@slackpad, could you please reopen the ticket?

slackpad (Contributor):

Hi @michaelgaida, those versions should be able to talk to each other. Are there any RPC-related errors in the server logs, or on the agent you are making the requests from?

slackpad reopened this Oct 18, 2017
slackpad (Contributor) commented Jan 5, 2018

Closing since we never heard back.

slackpad closed this as completed Jan 5, 2018
lucaswxp:

I have a similar problem.

> consul members -wan:
Node                       Address             Status  Type    Build  Protocol  DC           Segment
10.151.36.231.ibm-default  xxx.57.186.38:8302  alive   server  1.2.3  2         ibm-default  <all>
172.31.10.146.aws-default  xxx.228.15.47:8302   alive   server  1.2.3  2         aws-default  <all>

If I try to query

> curl "http://localhost:8500/v1/catalog/services?dc=ibm-default"
No path to datacenter

The consul logs:

    2018/09/28 18:38:58 [WARN] consul.rpc: RPC request for DC "ibm-default", no path found
    2018/09/28 18:38:58 [ERR] http: Request GET /v1/catalog/nodes?dc=ibm-default&stale=&wait=60000ms, error: No path to datacenter from=127.0.0.1:42410

Any ideas?

lucaswxp:

Any ideas on how to debug this further?


mozai commented Nov 4, 2019

I've got many servers in a Serf WAN mesh, each with its own Serf LAN mesh of agents. Queries over the WAN happen often, a dozen requests every minute (mostly from Prometheus `consul_sd_config` service discovery, plus a few cronjobs). For one Serf WAN member I keep getting 500 responses (rpc error making call: No path to datacenter). It's not constant, but it does happen for 1-2% of all requests, which causes a lot of noise in error logs.

Stepping over to the consul server that is supposed to receive these requests, this is what I see in the logs around the same time the WAN requests fail:

2019/11/04 13:39:29 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:29 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.202.xxx.xxx:57313
2019/11/04 13:39:30 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:30 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.160.xxx.xxx:42487
2019/11/04 13:39:30 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:30 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.147.xxx.xxx:43531
2019/11/04 13:39:31 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:31 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.201.xxx.xxx:35739
2019/11/04 13:39:32 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:32 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.108.xxx.xxx:44261
2019/11/04 13:39:34 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:34 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:35 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:35 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.108.xxx.xxx:41499
2019/11/04 13:39:39 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:39 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.145.xxx.xxx:49319
2019/11/04 13:39:39 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:39 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.144.xxx.xxx:43379
2019/11/04 13:39:41 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:41 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.201.xxx.xxx:54069
2019/11/04 13:39:42 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:42 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.160.xxx.xxx:11407
2019/11/04 13:39:45 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:45 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.145.xxx.xxx:22777
2019/11/04 13:39:47 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:47 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.170.xxx.xxx:37169
2019/11/04 13:39:48 [WARN] yamux: failed to send ping reply: session shutdown 
2019/11/04 13:39:48 [ERR] yamux: keepalive failed: i/o deadline reached 
2019/11/04 13:39:48 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.145.xxx.xxx:43391
2019/11/04 13:39:50 [ERR] yamux: keepalive failed: i/o deadline reached 

Looks like the Serf WAN sessions (TCP?) are getting dropped. I'm using the same config on the other machines (enforced by Puppet), so I'm positive it's not a typo. It could be a flimsy network connection between this server and the other Serf WAN members, so I'm not opening a new ticket, just leaving this here for the next person to find.
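When sifting through a burst of keepalive failures like the one above, a per-peer tally makes it obvious whether one flaky link or every WAN peer is dropping sessions. A minimal sketch against a canned sample (the IPs below are invented stand-ins for the masked addresses in the excerpt; on a server you would feed the real log in instead):

```shell
# Sample lines in the same shape as the consul.rpc keepalive-timeout errors.
logs='2019/11/04 13:39:29 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.202.0.1:57313
2019/11/04 13:39:30 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.160.0.2:42487
2019/11/04 13:39:41 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.202.0.1:54069'

# Extract the peer IP from each "keepalive timeout from=IP:port" line,
# then count occurrences per IP, most-affected peer first.
tally=$(printf '%s\n' "$logs" |
  sed -n 's/.*keepalive timeout from=\([0-9.]*\):.*/\1/p' |
  sort | uniq -c | sort -rn)
printf '%s\n' "$tally"
```

If one address dominates the tally, that points at a single bad link rather than a server-wide problem.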
