-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mute consul debug messages #567
Conversation
@@ -336,3 +338,9 @@ func (c *ConsulService) makeCheck(service *structs.Service, check *structs.Servi | |||
} | |||
return cr | |||
} | |||
|
|||
func (c *ConsulService) printLogMessage(message string, v ...interface{}) { | |||
if _, ok := c.node.Attributes["consul.version"]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird that we're suppressing log messages when Consul is not present, but we're still running through all of the register / deregister logic. I feel like we should have shortcut or no-op'd much earlier. Can we skip over the consul logic completely if Consul is not running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbednarski If a user has defined a service block in his/her Task, then even if the Consul agent is not running we start tracking the service and try to keep syncing it with Consul. The current logic allows consul agent to be unavailable but when it comes back up again, it syncs all the service and check definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't follow. There are two scenarios:
- Consul failed temporarily. Let's suppose consul initially is missing but will show up. Nomad starts first. Then Consul starts 20 seconds later. The second fingerprinter pass will enable the consul functionality. From here on any failure to sync a service with Consul is likely an operational problem, so we should show the logs.
- Consul is not running and never will be. In this case the fingerprinter detects Consul is not there, and in fact Consul never shows up. Why would we ever call
NewConsulService
at all, or why would it not immediately no-op? Showing an info log saying we're registering a service when we in fact will never register the service is spurious, and the entire feature seems extraneous.
If we have to have the service registered permanently when the task starts just in case consul shows up some day, it seems like we should still be able to do this check much earlier and skip all of this additional complexity with respect to filtering the logs and retrying something that's never going to succeed.
For example in performSync
if we can't connect to consul we can just skip the entire run, log an ERR
message that we can't connect, and move on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we just eat all the logs it solves this problem? Do you agree @cbednarski. Because then it just becomes us building the internal state such that if Consul comes back we can do the diff and register/deregister the necessary services and we'd never tell the user we are taking an action against Consul when we aren't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with suppressing the logs is Nomad is actually doing a lot of work that suddenly becomes invisible, but we probably shouldn't be doing the work in the first place. Hiding the logs is a smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @cbednarski on this one. I think we should skip the entire register loop if Consul was fingerprinted as missing. It doesn't make much sense otherwise to loop over every task, service, and check, and attempt a REST call we know will fail anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanuber Discussed more about this internally. The fingerprinter currently updates every 15s, we would continue to do precise registration and de-registration whenever a new service comes up or goes away. We would not sync with Consul every 5s if we can't get the list of services and checks.
LGTM |
@@ -12,15 +12,21 @@ import ( | |||
"github.com/hashicorp/nomad/nomad/structs" | |||
) | |||
|
|||
const ( | |||
consulAvailable = "consulavailable" | |||
consulUnavailable = "consulunavailable" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just change these to available
and unavailable
? That way you could just compare against the last state, and log a message saying something like consul status changed to %s
.
@@ -12,16 +12,21 @@ import ( | |||
"github.com/hashicorp/nomad/nomad/structs" | |||
) | |||
|
|||
const ( | |||
available = "available" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be consulAvailable/consulUnavailable as they are constants available throughout the fingerprint package.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
This mutes the consul related debug messages when the Consul Agent is not available.