Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mute consul debug messages #567

Merged
merged 7 commits into from
Dec 11, 2015
Merged

mute consul debug messages #567

merged 7 commits into from
Dec 11, 2015

Conversation

diptanu
Copy link
Contributor

@diptanu diptanu commented Dec 11, 2015

This mutes the consul related debug messages when the Consul Agent is not available.

@@ -336,3 +338,9 @@ func (c *ConsulService) makeCheck(service *structs.Service, check *structs.Servi
}
return cr
}

func (c *ConsulService) printLogMessage(message string, v ...interface{}) {
if _, ok := c.node.Attributes["consul.version"]; ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird that we're suppressing log messages when Consul is not present, but we're still running through all of the register / deregister logic. I feel like we should have shortcut or no-op'd much earlier. Can we skip over the consul logic completely if Consul is not running?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbednarski If a user has defined a service block in his/her Task, then even if the Consul agent is not running we start tracking the service and try to keep syncing it with Consul. The current logic allows consul agent to be unavailable but when it comes back up again, it syncs all the service and check definitions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't follow. There are two scenarios:

  1. Consul failed temporarily. Let's suppose consul initially is missing but will show up. Nomad starts first. Then Consul starts 20 seconds later. The second fingerprinter pass will enable the consul functionality. From here on any failure to sync a service with Consul is likely an operational problem, so we should show the logs.
  2. Consul is not running and never will be. In this case the fingerprinter detects Consul is not there, and in fact Consul never shows up. Why would we ever call NewConsulService at all, or why would it not immediately no-op? Showing an info log saying we're registering a service when we in fact will never register the service is spurious, and the entire feature seems extraneous.

If we have to have the service registered permanently when the task starts just in case consul shows up some day, it seems like we should still be able to do this check much earlier and skip all of this additional complexity with respect to filtering the logs and retrying something that's never going to succeed.

For example in performSync if we can't connect to consul we can just skip the entire run, log an ERR message that we can't connect, and move on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we just eat all the logs it solves this problem? Do you agree @cbednarski. Because then it just becomes us building the internal state such that if Consul comes back we can do the diff and register/deregister the necessary services and we'd never tell the user we are taking an action against Consul when we aren't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with suppressing the logs is Nomad is actually doing a lot of work that suddenly becomes invisible, but we probably shouldn't be doing the work in the first place. Hiding the logs is a smell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @cbednarski on this one. I think we should skip the entire register loop if Consul was fingerprinted as missing. It doesn't make much sense otherwise to loop over every task, service, and check, and attempt a REST call we know will fail anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanuber Discussed more about this internally. The fingerprinter currently updates every 15s, we would continue to do precise registration and de-registration whenever a new service comes up or goes away. We would not sync with Consul every 5s if we can't get the list of services and checks.

@dadgar
Copy link
Contributor

dadgar commented Dec 11, 2015

LGTM

@@ -12,15 +12,21 @@ import (
"github.com/hashicorp/nomad/nomad/structs"
)

const (
consulAvailable = "consulavailable"
consulUnavailable = "consulunavailable"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just change these to available and unavailable? That way you could just compare against the last state, and log a message saying something like consul status changed to %s.

@@ -12,16 +12,21 @@ import (
"github.com/hashicorp/nomad/nomad/structs"
)

const (
available = "available"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be consulAvailable/consulUnavailable as they are constants available throughout the fingerprint package.

diptanu added a commit that referenced this pull request Dec 11, 2015
@diptanu diptanu merged commit 32c0177 into master Dec 11, 2015
@diptanu diptanu deleted the r-mute-consul-debug-messages branch December 11, 2015 21:58
@github-actions
Copy link

github-actions bot commented May 1, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants