Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Configuration reload does not discard Check's statuses for services #7345

Conversation

pierresouchay
Copy link
Contributor

This fixes issue #7318

Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health
of services. A patch #6144 had been issued for HealthChecks of nodes, but not for healthchecks
of services.

What happened when a reload was:

  1. save all healthcheck statuses
  2. cleanup everything
  3. add new services with healthchecks

In step 3, the state of healthchecks was taken into account locally,
so at step 3, but since we cleaned up at step 2, state was lost.

This PR introduces the snap parameter, so step 3 can use information from step 1

…ervices

This fixes issue hashicorp#7318

Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health
of services. A patch hashicorp#6144 had been issued for HealthChecks of nodes, but not for healthchecks
of services.

What happened when a reload was:

1. save all healthcheck statuses
2. cleanup everything
3. add new services with healthchecks

In step 3, the state of healthchecks was taken into account locally,
so at step 3, but since we cleaned up at step 2, state was lost.

This PR introduces the snap parameter, so step 3 can use information from step 1
@pierresouchay
Copy link
Contributor Author

@rboyer This patch is a bit similar to what you've done in #6144 but extends it to checks of services to fix #7318. Some integration tests are not working, but the error is not related to this patch (I am trying to fix those unstable tests in #7350 but without success for now).

@ShimmerGlass
Copy link
Contributor

LGTM

@pierresouchay
Copy link
Contributor Author

Hello @i0rek ,

Do you think you could have a look ?

The patch is not that complex, would fix #7318 (thus a regression fix) and is really impacting large clusters (many notifiations when a reload is done on services having many instances and causing lots of instability on the service...

Copy link
Member

@hanshasselberg hanshasselberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks for your work!

@@ -478,7 +478,7 @@ func (a *Agent) Start() error {
a.serviceManager.Start()

// Load checks/services/metadata.
if err := a.loadServices(c); err != nil {
if err := a.loadServices(c, nil); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why we don't need to pass in a snapshot is that the agent just started and it doesn't have any information in memory about anything pretty much.

@hanshasselberg hanshasselberg merged commit 864f7ef into hashicorp:master Mar 9, 2020
freddygv pushed a commit that referenced this pull request Mar 12, 2020
…7345)

This fixes issue #7318

Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health
of services. A patch #6144 had been issued for HealthChecks of nodes, but not for healthchecks
of services.

What happened when a reload was:

1. save all healthcheck statuses
2. cleanup everything
3. add new services with healthchecks

In step 3, the state of healthchecks was taken into account locally,
so at step 3, but since we cleaned up at step 2, state was lost.

This PR introduces the snap parameter, so step 3 can use information from step 1
@lvets
Copy link

lvets commented Jun 5, 2020

@pierresouchay Quick thank you for your swift follow up! :)

rboyer added a commit that referenced this pull request Sep 24, 2020
…oad doesn't revert check state to critical (#8747)

Likely introduced when #7345 landed.
hashicorp-ci pushed a commit that referenced this pull request Sep 24, 2020
…oad doesn't revert check state to critical (#8747)

Likely introduced when #7345 landed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants