Agent can't connect to server when server listens on non-default port #764

Closed
am813nt opened this issue Mar 6, 2015 · 12 comments

Comments

am813nt commented Mar 6, 2015

Consul 0.4.1.

In agent config:
"start_join": [ "10.8.7.6:9300" ]

In agent's log:
==> Reading remote state failed: read tcp 10.8.7.6:9300: connection reset by peer

In server's log:
[ERR] consul.rpc: unrecognized RPC byte: 9

mthenw commented Mar 6, 2015

I have a similar problem when connecting consul-template.
consul (v0.5.0) logs:
[ERR] consul.rpc: unrecognized RPC byte: 71

consul-template (v0.7.0) logs:
[ERR] (runner) watcher reported error: Get http://node1.node.dc1.consul.:8300/v1/health/service/app?passing=1&wait=60000ms: read tcp 172.17.0.6:8300: connection reset by peer

armon (Member) commented Mar 6, 2015

Make sure that start_join is talking to the Serf LAN port, not the RPC port. That is likely why you are having issues. Default port is 8301.
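
For example (a minimal sketch, assuming the server keeps the default ports), the agent's start_join would point at the Serf LAN port instead of the RPC port:

    "start_join": [ "10.8.7.6:8301" ]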

am813nt (Author) commented Mar 6, 2015

OK, thanks! I thought the agent had to connect to the server's RPC port, 8300. Changing it to 8301 fixed the problem. I think non-default port configuration is worth mentioning in the documentation: firewalled/port-forwarding deployments, where the "datacenter" is logical rather than physical, are common now.
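
For reference, a sketch of the non-default case (9301 here is a hypothetical port, not taken from this issue): the server remaps its Serf LAN port via the ports block in its own config, and every agent's start_join has to use that same port.

Server config:

    "ports": { "serf_lan": 9301 }

Agent config:

    "start_join": [ "10.8.7.6:9301" ]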

@am813nt am813nt closed this as completed Mar 6, 2015
arj22 (Contributor) commented Jun 15, 2017

@mthenw I get the same error as you:
consul.rpc: unrecognized RPC byte: 71
Were you able to resolve it?

@alkalinecoffee

I was getting a ton of unrecognized RPC byte: 71 messages from Prometheus' consul service discovery. Disabling it stopped these messages for me. I'll have to go through my Prometheus config and see if there's anything I can change on my end, otherwise it should be brought up with the Prometheus folks.

ygdkn commented Aug 30, 2017

@alkalinecoffee Did you come up with any solution?

justinzyw commented Jul 9, 2018

Got exactly the same problem.

  • prometheus: v2.3.1
  • consul: 1.2.0

I traced the logs on both sides:

Consul Log:


==> Found address '11.7.112.163' for interface 'eth0', setting bind option...,
==> Found address '11.7.112.163' for interface 'eth0', setting client option...,
bootstrap = true: do not enable unless necessary,
==> Starting Consul agent...,
==> Consul agent running!,
Version: 'v1.2.0',
Node ID: '54846c2b-d5b8-7406-c675-b20c22c0dada',
Node name: '6e37514f66a2',
Datacenter: 'dc1' (Segment: ''),
Server: true (Bootstrap: true),
Client Addr: [11.7.112.163] (HTTP: 8500, HTTPS: -1, DNS: 8600),
Cluster Addr: 11.7.112.163 (LAN: 8301, WAN: 8302),
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false,
,
==> Log data will now stream in as it occurs:,
,
2018/07/09 10:47:04 [DEBUG] agent: Using random ID "54846c2b-d5b8-7406-c675-b20c22c0dada" as node ID,
2018/07/09 10:47:04 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:54846c2b-d5b8-7406-c675-b20c22c0dada Address:11.7.112.163:8300}],
2018/07/09 10:47:04 [INFO] raft: Node at 11.7.112.163:8300 [Follower] entering Follower state (Leader: ""),
2018/07/09 10:47:04 [WARN] memberlist: Binding to public address without encryption!,
2018/07/09 10:47:04 [INFO] serf: EventMemberJoin: 6e37514f66a2.dc1 11.7.112.163,
2018/07/09 10:47:04 [WARN] memberlist: Binding to public address without encryption!,
2018/07/09 10:47:04 [INFO] serf: EventMemberJoin: 6e37514f66a2 11.7.112.163,
2018/07/09 10:47:04 [INFO] consul: Adding LAN server 6e37514f66a2 (Addr: tcp/11.7.112.163:8300) (DC: dc1),
2018/07/09 10:47:04 [INFO] consul: Handled member-join event for server "6e37514f66a2.dc1" in area "wan",
2018/07/09 10:47:04 [INFO] agent: Started DNS server 11.7.112.163:8600 (udp),
2018/07/09 10:47:04 [DEBUG] agent/proxy: managed Connect proxy manager started,
2018/07/09 10:47:04 [INFO] agent: Started DNS server 11.7.112.163:8600 (tcp),
2018/07/09 10:47:04 [INFO] agent: Started HTTP server on 11.7.112.163:8500 (tcp),
2018/07/09 10:47:04 [INFO] agent: started state syncer,
2018/07/09 10:47:10 [WARN] raft: Heartbeat timeout from "" reached, starting election,
2018/07/09 10:47:10 [INFO] raft: Node at 11.7.112.163:8300 [Candidate] entering Candidate state in term 2,
2018/07/09 10:47:10 [DEBUG] raft: Votes needed: 1,
2018/07/09 10:47:10 [DEBUG] raft: Vote granted from 54846c2b-d5b8-7406-c675-b20c22c0dada in term 2. Tally: 1,
2018/07/09 10:47:10 [INFO] raft: Election won. Tally: 1,
2018/07/09 10:47:10 [INFO] raft: Node at 11.7.112.163:8300 [Leader] entering Leader state,
2018/07/09 10:47:10 [INFO] consul: cluster leadership acquired,
2018/07/09 10:47:10 [INFO] consul: New leader elected: 6e37514f66a2,
2018/07/09 10:47:11 [DEBUG] consul: Skipping self join check for "6e37514f66a2" since the cluster is too small,
2018/07/09 10:47:11 [INFO] consul: member '6e37514f66a2' joined, marking health alive,
2018/07/09 10:47:11 [INFO] agent: Synced node info,
2018/07/09 10:47:11 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically,
2018/07/09 10:47:11 [DEBUG] agent: Node info in sync,
2018/07/09 10:47:11 [DEBUG] agent: Node info in sync,
2018/07/09 10:47:45 [DEBUG] http: Request GET /v1/catalog/service/consul?index=3765&stale=&wait=30000ms (30.454325605s) from=11.7.111.205:40110,
2018/07/09 10:47:51 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:57854,
2018/07/09 10:48:02 [DEBUG] http: Request GET /v1/catalog/services?index=3765&stale=&wait=30000ms (31.469989204s) from=11.7.111.205:40198,
2018/07/09 10:48:06 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:57940,
2018/07/09 10:48:10 [DEBUG] consul: Skipping self join check for "6e37514f66a2" since the cluster is too small,
2018/07/09 10:48:17 [DEBUG] http: Request GET /v1/catalog/service/consul?index=5&stale=&wait=30000ms (31.500462633s) from=11.7.111.205:40110,
2018/07/09 10:48:21 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:58050,
2018/07/09 10:48:33 [DEBUG] http: Request GET /v1/catalog/services?index=5&stale=&wait=30000ms (30.729536069s) from=11.7.111.205:40198,
2018/07/09 10:48:35 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically,
2018/07/09 10:48:35 [DEBUG] agent: Node info in sync,
2018/07/09 10:48:36 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:58110,
2018/07/09 10:48:47 [DEBUG] http: Request GET /v1/catalog/service/consul?index=5&stale=&wait=30000ms (30.138009057s) from=11.7.111.205:40110,
2018/07/09 10:48:51 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:58164,
2018/07/09 10:49:04 [DEBUG] http: Request GET /v1/catalog/services?index=5&stale=&wait=30000ms (31.027457147s) from=11.7.111.205:40198,
2018/07/09 10:49:04 [DEBUG] manager: Rebalanced 1 servers, next active server is 6e37514f66a2.dc1 (Addr: tcp/11.7.112.163:8300) (DC: dc1),
2018/07/09 10:49:06 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:58262,
2018/07/09 10:49:11 [DEBUG] consul: Skipping self join check for "6e37514f66a2" since the cluster is too small,
2018/07/09 10:49:18 [DEBUG] http: Request GET /v1/catalog/service/consul?index=5&stale=&wait=30000ms (31.294581641s) from=11.7.111.205:40110,
2018/07/09 10:49:21 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:58376,
2018/07/09 10:49:35 [DEBUG] http: Request GET /v1/catalog/services?index=5&stale=&wait=30000ms (31.455037285s) from=11.7.111.205:40198,
2018/07/09 10:49:36 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:58426,
2018/07/09 10:49:48 [DEBUG] http: Request GET /v1/catalog/service/consul?index=5&stale=&wait=30000ms (30.389081016s) from=11.7.111.205:40110,
2018/07/09 10:49:51 [ERR] consul.rpc: unrecognized RPC byte: 71 from=11.7.111.205:58496,


Prometheus Log:


level=info ts=2018-07-09T10:02:31.045734474Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.1, branch=HEAD, revision=188ca45bd85ce843071e768d855722a9d9dabe03)",
level=info ts=2018-07-09T10:02:31.04584507Z caller=main.go:223 build_context="(go=go1.10.3, user=root@82ef94f1b8f7, date=20180619-15:56:22)",
level=info ts=2018-07-09T10:02:31.045891378Z caller=main.go:224 host_details="(Linux 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 a14d7c2eecda (none))",
level=info ts=2018-07-09T10:02:31.045932406Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)",
level=info ts=2018-07-09T10:02:31.052226782Z caller=main.go:514 msg="Starting TSDB ...",
level=info ts=2018-07-09T10:02:31.052515682Z caller=web.go:415 component=web msg="Start listening for connections" address=0.0.0.0:9090,
level=info ts=2018-07-09T10:02:31.089566445Z caller=main.go:524 msg="TSDB started",
level=info ts=2018-07-09T10:02:31.089779544Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml,
level=info ts=2018-07-09T10:02:31.139157273Z caller=main.go:500 msg="Server is ready to receive web requests.",
level=debug ts=2018-07-09T10:02:31.161557207Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:02:31.164360145Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:02:45.390380622Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:48778->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:03:00.390315273Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:48856->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:03:01.161828035Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:03:01.164571297Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:03:15.390044364Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:48908->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:03:30.390162558Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:48952->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:03:32.289811311Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:03:32.394923306Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:03:45.390613373Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49056->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:04:00.390334137Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49162->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:04:03.523601804Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:04:03.633008058Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:04:15.390045306Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49208->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:04:30.399936542Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49254->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:04:33.813517331Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:04:35.42521125Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:04:45.390152854Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49330->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:05:00.390222369Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49436->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:05:04.749118169Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:05:06.455311158Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:05:15.39004853Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49514->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:05:30.390099161Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49562->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:05:36.557193841Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:05:37.043140637Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:05:45.398137003Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49650->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:06:00.391082054Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49754->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:06:07.518434141Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:06:08.371608234Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:06:15.389947695Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49820->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:06:30.390053842Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49866->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:06:38.505216492Z caller=consul.go:439 component="discovery manager scrape" discovery=consul msg="Watching service" service=consul tag=,
level=debug ts=2018-07-09T10:06:38.727409244Z caller=consul.go:334 component="discovery manager scrape" discovery=consul msg="Watching services" tag=,
level=debug ts=2018-07-09T10:06:45.390280325Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:49914->11.7.110.29:8300: read: connection reset by peer",
level=debug ts=2018-07-09T10:07:00.390659498Z caller=scrape.go:703 component="scrape manager" scrape_pool=node-exporter target=http://11.7.110.29:8300/metrics msg="Scrape failed" err="Get http://11.7.110.29:8300/metrics: read tcp 11.7.111.205:50002->11.7.110.29:8300: read: connection reset by peer",


@Jeskz0rd

I am facing the same issue; apparently, "SERVICE_IGNORE" is not working properly.
I also could not fix it :/

CoderFei commented Aug 17, 2019

@justinzyw
I got the same problem as you.
My case: Prometheus scrapes targets discovered through Consul services once every minute.

prometheus.yml, consul_sd_configs section:

    consul_sd_configs:
    - {datacenter: dc1, server: '127.0.0.1:8500'}

Consul logs

Aug 17 14:30:37 xxx consul: 2019/08/17 14:30:37 [ERR] consul.rpc: unrecognized RPC byte: 71 from=127.0.0.1:48342
Aug 17 14:31:37 xxx consul: 2019/08/17 14:31:37 [ERR] consul.rpc: unrecognized RPC byte: 71 from=127.0.0.1:48348

After some research, I resolved this issue.
Here is the root cause:

1. Check the Consul services in the Consul web UI (port 8500). The Services tab includes a special service named consul whose address is http://127.0.0.1:8300 (this can also be confirmed from the command line, as shown after this list).

2. Check the Prometheus targets in the Prometheus web UI (port 9090). Prometheus has registered the consul service (http://127.0.0.1:8300/metrics) as a scrape target, and that target is always down, because http://127.0.0.1:8300/metrics is not a valid Prometheus exporter endpoint.
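
For instance, the built-in consul service registration can be confirmed from the command line via the catalog API (a sketch, assuming the Consul HTTP API is reachable on 127.0.0.1:8500):

    curl http://127.0.0.1:8500/v1/catalog/service/consul
    # the response lists "ServiceName": "consul" with "ServicePort": 8300,
    # i.e. the server RPC port, which is not a Prometheus metrics endpoint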

Here is my solution:

# Drop the consul target by modifying the Prometheus configuration
vi prometheus.yml
... ...
    consul_sd_configs:
    - {datacenter: dc1, server: '127.0.0.1:8500'}
    relabel_configs:
    - source_labels: [__meta_consul_service]
      # drop consul self service
      regex:         '^consul$'
      action: drop
... ...
The target http://127.0.0.1:8300/metrics will disappear from the Prometheus targets list after restarting the Prometheus service.
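
To double-check after the restart, the Prometheus targets API can be queried (assuming Prometheus listens on localhost:9090); the 8300 target should no longer show up:

    curl -s http://localhost:9090/api/v1/targets | grep 8300
    # no output expected once the consul self-service has been dropped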

mikador commented Jan 28, 2020

That solution worked for me. Logs are clean now. Thx!

bvikhe commented Sep 5, 2023

Make sure that start_join is talking to the Serf LAN port, not the RPC port. That is likely why you are having issues. Default port is 8301.

Is there a way to specify the server RPC address in the Consul agent config?

In my case we are using consul-k8s and don't want to expose a hostPort.

The Consul agent is trying to connect to the pod's IP on port 8300, which is not possible, so I wanted to use a custom <host>:<port>.

wildme commented Jun 26, 2024

In order to fix [ERROR] agent.server.rpc: unrecognized RPC byte: byte=71, I had to explicitly define the services to scrape in the Prometheus config file:

- job_name: "consul"
  consul_sd_configs:
    - server: '<HOST>:8500'
      services: ["node_exporter"]
