
Don't proper deregister and don't recover after consul service restart #3030

Closed
juise opened this issue May 10, 2017 · 2 comments
Labels
theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization type/bug Feature does not function as expected

Comments


juise commented May 10, 2017

consul version for both Client and Server

Client: Consul v0.8.1 Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Server: Consul v0.8.1 Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

consul info for both Client and Server

Client:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 2
	services = 2
build:
	prerelease =
	revision = 'e9ca44d
	version = 0.8.1
consul:
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 32
	goroutines = 42
	max_procs = 32
	os = linux
	version = go1.8.1
serf_lan:
	encrypted = true
	event_queue = 0
	event_time = 46
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 2875
	members = 136
	query_queue = 0
	query_time = 24

Server:

$ consul info
agent:
	check_monitors = 2
	check_ttls = 0
	checks = 15
	services = 27
build:
	prerelease =
	revision = 'e9ca44d
	version = 0.8.1
consul:
	bootstrap = true
	known_datacenters = 4
	leader = false
	leader_addr = 185.49.X.X:8300
	server = true
raft:
	applied_index = 34563295
	commit_index = 34563295
	fsm_pending = 0
	last_contact = 79.951787ms
	last_log_index = 34563295
	last_log_term = 3259
	last_snapshot_index = 34556134
	last_snapshot_term = 3259
	latest_configuration = [{Suffrage:Voter ID:185.49.X.X:8300 Address:185.49.X.X:8300} {Suffrage:Voter ID:185.49.X.X:8300 Address:185.49.X.X:8300} {Suffrage:Voter ID:185.49.X.X:8300 Address:185.49.X.X:8300} {Suffrage:Voter ID:185.49.X.X:8300 Address:185.49.X.X:8300} {Suffrage:Voter ID:185.49.X.X:8300 Address:185.49.X.X:8300}]
	latest_configuration_index = 32083055
	num_peers = 4
	protocol_version = 2
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 3259
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 740
	max_procs = 4
	os = linux
	version = go1.8.1
serf_lan:
	encrypted = true
	event_queue = 0
	event_time = 46
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 2875
	members = 136
	query_queue = 0
	query_time = 24
serf_wan:
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 27
	members = 8
	query_queue = 0
	query_time = 13

Operating system and Environment details

uname -a
Linux xxx.yyy 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the Issue (and unexpected/desired result)

  1. When the disk is full, the Consul client behaves incorrectly and does not fully deregister the service. The service is shown as alive (green), but only the Serf Health Status check is actually passing: https://gist.github.com/juise/f0d8e820f55d100e6952fd77a31e002d. The problem looks like "unexpected end of JSON input after server crash" #1221.
  2. After disk space is cleared, Consul still reports that the disk is full (see the logs in my gist). After restarting the Consul service, Consul does not start (see the logs in my gist).
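The error named in #1221 is the message Go's `encoding/json` returns when its input ends before a complete JSON value. A minimal sketch of how a write cut short by a full disk produces it, assuming the agent persists service definitions as JSON files and that a failed write leaves such a file truncated or empty; the `persistedService` type and its fields are hypothetical stand-ins, not Consul's real types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// persistedService is a hypothetical stand-in for an on-disk service
// definition; it is not Consul's actual struct.
type persistedService struct {
	ID      string `json:"ID"`
	Service string `json:"Service"`
}

// decodeService parses the raw bytes of a persisted service file.
func decodeService(data []byte) (persistedService, error) {
	var s persistedService
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	complete := []byte(`{"ID":"web-1","Service":"web"}`)
	truncated := complete[:12] // `{"ID":"web-1`: the write stopped early, e.g. on ENOSPC
	empty := []byte{}          // the file was created, but nothing was written

	for _, data := range [][]byte{complete, truncated, empty} {
		s, err := decodeService(data)
		if err != nil {
			// Both the truncated and the empty input fail with
			// "unexpected end of JSON input".
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Printf("decoded service %q (ID %q)\n", s.Service, s.ID)
	}
}
```

Only the complete input decodes; a reader that treats a decode failure as fatal would refuse to load the state, which is consistent with an agent that will not start until the damaged files are removed.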

Reproduction steps

How to reproduce the problem: fill the disk.
How to fix the problem: clear disk space and restart Consul; if it doesn't start, remove the *.tmp files.

Log Fragments or Link to gist

https://gist.github.com/juise/3e30b3a2429b2470b0353109c37de55c

juise changed the title from "Don't proper deregister and don'r recover after consul service restart" to "Don't proper deregister and don't recover after consul service restart" May 10, 2017
slackpad added the type/bug Feature does not function as expected label May 18, 2017
slackpad (Contributor) commented May 18, 2017

I'm thinking this is a duplicate of #2236 but we will double check.

juise (Author) commented May 18, 2017

@slackpad Please note that this issue is about abnormal behavior: when the disk was full and service X crashed without being properly deregistered, Consul showed service X as alive (green), even though only the Serf health check was passing.

slackpad added the theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization label May 25, 2017
Development

No branches or pull requests

3 participants