[BUGFIX] Configuration reload does not discard Check's statuses for services #7345

pierresouchay · 2020-02-25T15:10:56Z

This fixes issue #7318

Between versions 1.5.2 and 1.5.3, a regression was introduced regarding the health
of services. A patch (#6144) had been issued for health checks of nodes, but not for health checks
of services.

What happened during a reload was:

1. save all healthcheck statuses
2. cleanup everything
3. add new services with healthchecks

Step 3 takes the local state of healthchecks into account, but since step 2 had
already cleaned everything up, that state was lost.

This PR introduces the snap parameter, so step 3 can use the information saved in step 1.
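
The three steps and the role of the snapshot can be sketched in Go as follows. This is a simplified illustration, not Consul's actual code: the `agent` type and its `snapshot`, `unload`, `loadServices`, and `reload` methods are hypothetical stand-ins, and the snapshot is assumed to be a plain map from check ID to status.

```go
package main

import "fmt"

// agent is a hypothetical, stripped-down stand-in for the real agent:
// it only tracks the last known status of each check by ID.
type agent struct {
	checks map[string]string
}

// snapshot copies the current check statuses (step 1).
func (a *agent) snapshot() map[string]string {
	snap := make(map[string]string, len(a.checks))
	for id, status := range a.checks {
		snap[id] = status
	}
	return snap
}

// unload drops all local state (step 2).
func (a *agent) unload() {
	a.checks = make(map[string]string)
}

// loadServices registers the given check IDs (step 3). Each check starts as
// "critical" unless snap holds a previous status for it.
func (a *agent) loadServices(checkIDs []string, snap map[string]string) {
	for _, id := range checkIDs {
		status := "critical"
		if prev, ok := snap[id]; ok {
			status = prev
		}
		a.checks[id] = status
	}
}

// reload runs the three steps in order, threading the snapshot through.
func (a *agent) reload(checkIDs []string) {
	snap := a.snapshot()
	a.unload()
	a.loadServices(checkIDs, snap)
}

func main() {
	a := &agent{checks: map[string]string{"service:web": "passing"}}
	a.reload([]string{"service:web", "service:db"})
	// The existing check keeps its status; the new one starts as critical.
	fmt.Println(a.checks["service:web"], a.checks["service:db"])
}
```

Without the `snap` argument, step 3 would only see the emptied state left by step 2, and `service:web` would fall back to critical until its next check run.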


pierresouchay · 2020-02-28T14:33:38Z

@rboyer This patch is a bit similar to what you've done in #6144 but extends it to checks of services to fix #7318. Some integration tests are not working, but the error is not related to this patch (I am trying to fix those unstable tests in #7350 but without success for now).

ShimmerGlass · 2020-02-29T17:04:39Z

LGTM

pierresouchay · 2020-03-06T10:21:33Z

Hello @i0rek,

Do you think you could have a look?

The patch is not that complex, would fix #7318 (and is thus a regression fix), and really impacts large clusters: a reload on services with many instances triggers many notifications and causes a lot of instability on the service.

hanshasselberg

Good catch! Thanks for your work!

hanshasselberg · 2020-03-09T11:58:52Z

agent/agent.go

@@ -478,7 +478,7 @@ func (a *Agent) Start() error {
 	a.serviceManager.Start()

 	// Load checks/services/metadata.
-	if err := a.loadServices(c); err != nil {
+	if err := a.loadServices(c, nil); err != nil {


The reason we don't need to pass in a snapshot here is that the agent has just started, so it holds pretty much no information in memory yet.
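
As a small illustration of why `nil` is a safe argument at startup: assuming the snapshot is a Go map keyed by check ID (an assumption made for this sketch; `initialStatus` is a hypothetical helper, not a Consul function), a lookup on a nil map reports "not found", so every check simply receives its default status.

```go
package main

import "fmt"

// initialStatus is a hypothetical helper: it returns the status recorded in
// snap for id, or def when snap has no entry. Reading from a nil map is
// legal in Go and reports ok == false, so a nil snap always yields def.
func initialStatus(snap map[string]string, id, def string) string {
	if prev, ok := snap[id]; ok {
		return prev
	}
	return def
}

func main() {
	// Startup: nothing is in memory yet, so no snapshot is passed.
	fmt.Println(initialStatus(nil, "service:web", "critical"))

	// Reload: the snapshot carries the previous status over.
	snap := map[string]string{"service:web": "passing"}
	fmt.Println(initialStatus(snap, "service:web", "critical"))
}
```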


lvets · 2020-06-05T16:35:12Z

@pierresouchay Quick thank you for your swift follow up! :)


pierresouchay mentioned this pull request Feb 25, 2020

Consul 1.5.3 changes check status behavior when doing a consul reload? #7318

Closed

pierresouchay added 2 commits February 25, 2020 17:22

Added unit test to ensure checks statuses are kept between reloads

2f0d20c

Fixed unit test

a0339e6

hanshasselberg approved these changes Mar 9, 2020

View reviewed changes

hanshasselberg merged commit 864f7ef into hashicorp:master Mar 9, 2020

rboyer mentioned this pull request Mar 10, 2020

flaky test: TestAgent_ReloadConfigAndKeepChecksStatus #7425

Closed

pierresouchay mentioned this pull request Mar 13, 2020

Watch getting triggered in consul reload #7446

Closed

gsolic mentioned this pull request Mar 23, 2020

Checks transition to critical state during reload #6914

Closed

rboyer mentioned this pull request Sep 24, 2020

agent: when enable_central_service_config is enabled ensure agent reload doesn't revert check state to critical #8747

Merged

rboyer added a commit that referenced this pull request Sep 24, 2020

agent: when enable_central_service_config is enabled ensure agent rel…

7eef25d

…oad doesn't revert check state to critical (#8747) Likely introduced when #7345 landed.

hashicorp-ci pushed a commit that referenced this pull request Sep 24, 2020

agent: when enable_central_service_config is enabled ensure agent rel…

e05c30d

…oad doesn't revert check state to critical (#8747) Likely introduced when #7345 landed.
