
Consul 0.8.x - logging and yamux keepalive errors #3040

Closed
mnuic opened this issue May 13, 2017 · 11 comments · Fixed by #4193
Labels
theme/operator-usability: Replaces UX. Anything related to making things easier for the practitioner
type/enhancement: Proposed improvement or new feature
Milestone
Unplanned

Comments

@mnuic

mnuic commented May 13, 2017

consul version for both Client and Server

Client: Consul v0.8.3
Server: Consul v0.8.3
Tried with all v0.8.x versions; the behavior is the same.

consul info for both Client and Server

Client:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 3
	services = 10
build:
	prerelease =
	revision = ea2a82b
	version = 0.8.3
consul:
	bootstrap = true
	known_datacenters = 9
	leader = true
	leader_addr = SERVER_IP:8300
	server = true
raft:
	applied_index = 970663
	commit_index = 970663
	fsm_pending = 0
	last_contact = 0
	last_log_index = 970663
	last_log_term = 8
	last_snapshot_index = 967154
	last_snapshot_term = 7
	latest_configuration = [{Suffrage:Voter ID:SERVER_IP:8300 Address:SERVER_IP:8300}]
	latest_configuration_index = 1
	num_peers = 0
	protocol_version = 2
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 8
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 101
	max_procs = 8
	os = linux
	version = go1.8.1
serf_lan:
	encrypted = false
	event_queue = 1
	event_time = 8
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 687
	members = 15
	query_queue = 0
	query_time = 1

Server:

Same as client

Operating system and Environment details

Ubuntu 16.04.02

Description of the Issue (and unexpected/desired result)

A lot of log lines show the following; it should be fixed:
[WARN] Service name " consul-http" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.

The yamux keepalive logs an ERROR even with the client and server on the same subnet; maybe the timeout should be increased:
[ERR] yamux: keepalive failed: session shutdown

@slackpad
Contributor

Hi @mnuic

[WARN] Service name " consul-http" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.

That's not a built-in registration that Consul adds - it looks like something in your cluster is configured with a space in front of the name.
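
As a rough illustration (a hypothetical sketch, not Consul's actual validation code), the message's own wording ("all alpha-numerics and dashes") suggests a check along these lines:

    package main

    import (
        "fmt"
        "regexp"
    )

    // dnsSafe approximates the warning's description of valid characters:
    // alpha-numerics and dashes only.
    var dnsSafe = regexp.MustCompile(`^[a-zA-Z0-9\-]+$`)

    func main() {
        for _, name := range []string{"consul-http", " consul-http"} {
            if !dnsSafe.MatchString(name) {
                fmt.Printf("Service name %q will not be discoverable via DNS\n", name)
            }
        }
    }

The leading space in " consul-http" fails that check, so the thing to hunt for is the registration carrying the stray space.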

[ERR] yamux: keepalive failed: session shutdown

The timeout on that one is 30 seconds, which is pretty long. This is often the result of firewalls that track connections and close them when they go quiet, or of other network connectivity issues.
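
Under the hood these messages come from the hashicorp/yamux library that multiplexes Consul's RPC connections. Here is a minimal sketch of the keepalive-related defaults, assuming the stock yamux API (Consul does not expose these as agent settings):

    package main

    import (
        "fmt"

        "github.com/hashicorp/yamux"
    )

    func main() {
        cfg := yamux.DefaultConfig()
        // A keepalive ping goes out every KeepAliveInterval. If no response
        // comes back within ConnectionWriteTimeout, the ping fails with
        // "i/o deadline reached"; a ping cut off by the session closing
        // underneath it fails with "session shutdown".
        fmt.Println(cfg.EnableKeepAlive)        // true
        fmt.Println(cfg.KeepAliveInterval)      // 30s
        fmt.Println(cfg.ConnectionWriteTimeout) // 10s
    }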

Hope that helps!

@mnuic
Author

mnuic commented May 25, 2017

Hi @slackpad

You were right; there was a space in a service definition in one of my clusters.

I understand the part about firewalls, but I have 2 hosts in the same subnet, no firewalls between them, and the iptables rules are fine. I disabled ufw on the servers and the behavior is the same: Consul reports "yamux keepalive failed". Tcpdump shows that both servers see each other, all ports are open, and I don't see anything that could block traffic or cause a timeout. If it's hitting the 30-second timeout, that is totally weird.

@slackpad
Contributor

I understand the part about firewalls, but I have 2 hosts in the same subnet, no firewalls between them, and the iptables rules are fine. I disabled ufw on the servers and the behavior is the same: Consul reports "yamux keepalive failed". Tcpdump shows that both servers see each other, all ports are open, and I don't see anything that could block traffic or cause a timeout. If it's hitting the 30-second timeout, that is totally weird.

That message could also come from a connection that failed on one of the Consul clients. I think you might also see it if an agent died or dropped off the network. Do you have agents coming and going?

@mnuic
Author

mnuic commented Jun 1, 2017

No, nothing in the logs for yesterday or for the last week. I found one connection drop for one agent in a different DC, and that's it. Is it possible to lower the log level for this? It doesn't seem to indicate any real problem that I can see for now.

@slackpad
Contributor

slackpad commented Jun 2, 2017

Yeah, we get a lot of people concerned about these, and they can occur for a number of reasons that aren't really important. I'll reopen this to track tweaking the log level.
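
In the meantime, for anyone embedding yamux directly: the library writes these lines to a configurable output, so they can be rerouted or discarded. A minimal sketch (this knob lives on yamux's Config, not on any Consul agent option):

    package main

    import (
        "io"

        "github.com/hashicorp/yamux"
    )

    func main() {
        cfg := yamux.DefaultConfig()
        // yamux writes its "[ERR] yamux: ..." messages to Config.LogOutput
        // (os.Stderr by default); io.Discard drops them entirely.
        cfg.LogOutput = io.Discard
        // cfg would then be passed to yamux.Client(conn, cfg) or
        // yamux.Server(conn, cfg) when the session is created.
        _ = cfg
    }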

@slackpad slackpad reopened this Jun 2, 2017
@slackpad slackpad added the type/enhancement and theme/operator-usability labels Jun 2, 2017
@roman-vynar

roman-vynar commented Jun 9, 2017

We are getting a lot of excessive logging too.

For example:

Today 10:25:06 PM <redacted> consul [err] ==> Newer Consul version available: 0.8.4 (currently running: 0.8.3)

This shouldn't be logged at the err level, I think.

Randomly lots of this:

 [ERR] yamux: keepalive failed: i/o deadline reached
WARN     2017/06/08 16:24:40 [WARN] memberlist: Refuting a suspect message (from: node3)
 ERR     2017/06/08 16:26:00 [ERR] memberlist: Failed fallback ping: write tcp 10.0.4.102:40256->10.0.4.123:8301: i/o timeout

It was not the case with Consul 0.7.4.

@mnuic
Author

mnuic commented Aug 21, 2017

Just installed Consul 0.9.2, and the yamux messages are still showing up (randomly, with no obvious cause, on multiple servers/clients):

x.y.z.q     2017/08/21 13:06:19 [ERR] yamux: keepalive failed: session shutdown

@slackpad could you please lower the log level for this message in the next release?

@kerneljack

@slackpad Same here, lots of these messages across 3 Consul servers. Could you please add more logging detail so it's at least clearer what it's actually trying to do, so we can debug properly instead of digging through tcpdumps? Or please change the logging so it doesn't show up in the logs at all. Thanks.

    2017/10/12 15:20:35 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 16:16:02 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 17:17:42 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 18:23:12 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 18:23:40 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 18:37:50 [ERR] yamux: keepalive failed: session shutdown

@slackpad slackpad added this to the Unplanned milestone Oct 17, 2017
@mnuic
Author

mnuic commented Apr 16, 2018

Is there any chance of seeing this resolved in the next release? We have a lot of Consul nodes, and our logs still show these yamux messages with no obvious cause. Or could you just lower the log level to info, maybe?

@mnuic
Author

mnuic commented May 16, 2018

Still happens on Consul 1.1.0:

2018/05/16 09:42:50 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 09:45:56 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 09:54:33 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:04:09 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:17:34 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:22:42 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:35:52 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:40:52 [ERR] yamux: keepalive failed: session shutdown

We have a lot of nodes. Can you change the log level to warning or debug?

@buckeyze

buckeyze commented Jun 4, 2018

We are getting these too on version 1.0.7+ent:

2018/06/03 13:24:45 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:06:13 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:20:46 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:32:23 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:55:41 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 15:01:01 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 16:03:18 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 16:13:08 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 17:48:55 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 22:15:15 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 23:37:13 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 00:18:51 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 01:41:23 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 01:44:54 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 03:46:37 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 04:02:29 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 05:33:16 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 05:49:39 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 06:31:31 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 06:59:07 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 08:25:20 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 09:00:30 [ERR] yamux: keepalive failed: session shutdown
