Consul 1.5.3 changes check status behavior when doing a consul reload? #7318
Comments
@pierresouchay, I'm currently testing in our development landscape, not sure how I can easily translate that to a simple test case, but I'll try :)

@lvets any news?

@pierresouchay See https://github.com/lvets/legendary-octo-potato for a quick and dirty test scenario with Fabio, Consul servers and a bunch of Consul agents. Took a bit of time to translate our production infrastructure into docker-compose.

Yes, I confirm the behavior from 1.5.2 to 1.5.3+ (still present in 1.7.1). Each time a reload is performed, the state becomes critical and the Output becomes empty.
…ervices

This fixes issue hashicorp#7318. Between versions 1.5.2 and 1.5.3, a regression was introduced regarding the health of services. A patch (hashicorp#6144) had been issued for health checks of nodes, but not for health checks of services.

What happened on a reload was:
1. save all health check statuses
2. clean up everything
3. add new services with their health checks

In step 3, the state of health checks was taken into account locally, but since everything was cleaned up in step 2, that state was lost. This PR introduces the snap parameter, so step 3 can use the information from step 1.
@pierresouchay Thank you for your help with this! Do you have an idea when the fix might make it into a release?
…7345)

This fixes issue #7318. Between versions 1.5.2 and 1.5.3, a regression was introduced regarding the health of services. A patch (#6144) had been issued for health checks of nodes, but not for health checks of services.

What happened on a reload was:
1. save all health check statuses
2. clean up everything
3. add new services with their health checks

In step 3, the state of health checks was taken into account locally, but since everything was cleaned up in step 2, that state was lost. This PR introduces the snap parameter, so step 3 can use the information from step 1.
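For readers less familiar with the agent internals, here is a minimal sketch of the snapshot-and-restore pattern the commit message describes. All type and function names below are hypothetical stand-ins, not Consul's actual internals; only the three-step flow (snapshot, clean up, re-add) is taken from the PR description.

```go
package main

import "fmt"

// CheckStatus is a hypothetical stand-in for a health check's state.
type CheckStatus struct {
	Status string // e.g. "passing", "critical"
	Output string
}

// Agent is a hypothetical stand-in for the local agent's check state.
type Agent struct {
	checks map[string]CheckStatus // keyed by check ID
}

// reload mirrors the three steps from the commit message:
// 1. snapshot all check statuses, 2. clean up everything,
// 3. re-add checks, seeding their state from the snapshot.
func (a *Agent) reload(newChecks []string) {
	// Step 1: snapshot current statuses before anything is removed.
	snap := make(map[string]CheckStatus, len(a.checks))
	for id, st := range a.checks {
		snap[id] = st
	}

	// Step 2: clean up local state.
	a.checks = make(map[string]CheckStatus)

	// Step 3: re-register checks. Without the snapshot they would all
	// start "critical" with empty output; with it, prior state is kept.
	for _, id := range newChecks {
		if prev, ok := snap[id]; ok {
			a.checks[id] = prev
		} else {
			a.checks[id] = CheckStatus{Status: "critical"}
		}
	}
}

func main() {
	a := &Agent{checks: map[string]CheckStatus{
		"web-check": {Status: "passing", Output: "HTTP 200 OK"},
	}}
	a.reload([]string{"web-check", "new-check"})
	fmt.Println(a.checks["web-check"].Status) // passing, not critical
}
```

The reported regression corresponds to skipping the snapshot for service checks: after step 2 there is nothing to restore from, so every service check comes back "critical" with an empty Output until it next runs.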
This ensures no regression on hashicorp#7318 and ensures that hashicorp#7446 cannot happen anymore.
Overview of the Issue
Between Consul 1.5.2 and Consul 1.5.3, the default behavior of node checks when doing `consul reload` changed. With Consul 1.5.2, checks for a specific node had "passing" as their status and stayed that way when doing a `consul reload`. With Consul 1.5.3 and later, "passing" checks go to "critical" when doing a `consul reload`, and only return to "passing" once the checks run again. I would've expected that the status of checks doesn't change when doing a `consul reload`. Additionally, because we're using Fabio, this also means that Fabio temporarily removes routes based on these checks when doing `consul reload`, effectively causing an outage.
Reproduction Steps
Steps to reproduce this issue, e.g.:
1. Use Consul 1.5.2.
2. Run `consul reload`.
3. Check the Fabio logs and/or `curl -s localhost:8500/v1/health/node/node` (see the polling sketch after this list): the check status doesn't change.
4. Use Consul 1.5.3.
5. Run `consul reload`.
6. Check the Fabio logs and/or `curl -s localhost:8500/v1/health/node/node`: Fabio routes are removed and the check status changes to "critical" until the checks run again.
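To make the comparison between 1.5.2 and 1.5.3 easier to script, a rough Go sketch like the one below can poll the same node health endpoint used in the curl command and print check statuses around a reload. The node name "node", the agent address localhost:8500, and which fields matter (CheckID, Status, Output) are assumptions based on the steps above; adjust them for your environment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// check holds the fields of interest from /v1/health/node/<node>.
type check struct {
	CheckID string
	Status  string
	Output  string
}

func dumpChecks(node string) {
	// Assumes the local agent's HTTP API on the default port 8500.
	resp, err := http.Get("http://localhost:8500/v1/health/node/" + node)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	var checks []check
	if err := json.NewDecoder(resp.Body).Decode(&checks); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, c := range checks {
		fmt.Printf("%s: %s (%q)\n", c.CheckID, c.Status, c.Output)
	}
}

func main() {
	// Poll every second; run `consul reload` in another terminal and
	// watch whether any check flips from "passing" to "critical".
	for {
		dumpChecks("node") // replace "node" with the actual node name
		fmt.Println("---")
		time.Sleep(time.Second)
	}
}
```

With the regression present, the output should show checks flipping to "critical" with an empty Output immediately after the reload; on 1.5.2 they stay "passing" throughout.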
Consul info for both Client and Server
Consul server: 1.6.0
Consul agent: 1.5.2 and 1.5.3.
Operating system and Environment details
OS: SLES 12 and Amazon Linux 2.
Log Fragments
I'm not 100% sure which logs to include: the Consul logs are the same between versions, and in Fabio I can see routes being removed and added again with Consul 1.5.3, but nothing with 1.5.2 (i.e. the routes stay).