-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX] Configuration reload does not discard Check's statuses for services #7345
[BUGFIX] Configuration reload does not discard Check's statuses for services #7345
Conversation
…ervices This fixes issue hashicorp#7318 Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health of services. A patch hashicorp#6144 had been issued for HealthChecks of nodes, but not for healthchecks of services. What happened when a reload was: 1. save all healthcheck statuses 2. cleanup everything 3. add new services with healthchecks In step 3, the state of healthchecks was taken into account locally, so at step 3, but since we cleaned up at step 2, state was lost. This PR introduces the snap parameter, so step 3 can use information from step 1
LGTM |
Hello @i0rek , Do you think you could have a look ? The patch is not that complex, would fix #7318 (thus a regression fix) and is really impacting large clusters (many notifiations when a reload is done on services having many instances and causing lots of instability on the service... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thanks for your work!
@@ -478,7 +478,7 @@ func (a *Agent) Start() error { | |||
a.serviceManager.Start() | |||
|
|||
// Load checks/services/metadata. | |||
if err := a.loadServices(c); err != nil { | |||
if err := a.loadServices(c, nil); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why we don't need to pass in a snapshot is that the agent just started and it doesn't have any information in memory about anything pretty much.
…7345) This fixes issue #7318 Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health of services. A patch #6144 had been issued for HealthChecks of nodes, but not for healthchecks of services. What happened when a reload was: 1. save all healthcheck statuses 2. cleanup everything 3. add new services with healthchecks In step 3, the state of healthchecks was taken into account locally, so at step 3, but since we cleaned up at step 2, state was lost. This PR introduces the snap parameter, so step 3 can use information from step 1
@pierresouchay Quick thank you for your swift follow up! :) |
This fixes issue #7318
Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health
of services. A patch #6144 had been issued for HealthChecks of nodes, but not for healthchecks
of services.
What happened when a reload was:
In step 3, the state of healthchecks was taken into account locally,
so at step 3, but since we cleaned up at step 2, state was lost.
This PR introduces the snap parameter, so step 3 can use information from step 1