-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add graceful handling of malformed persisted service/check files. #3821
Conversation
Previously a change was made to make the file writing atomic, but that wasn't enough to cover something like an OS crash so we needed something here to handle the situation more gracefully. Fixes #1221.
agent/agent.go
Outdated
// it for convenience and log about it. | ||
if len(buf) == 0 { | ||
a.logger.Printf("[WARN] Removing leftover empty service file %q", file) | ||
if err := os.Remove(file); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should attempt to delete the file, just warn and skip it. Leaving it around might help operators see what's going on by looking at timestamps, etc.
agent/agent.go
Outdated
// Decode the check | ||
var p persistedCheck | ||
if err := json.Unmarshal(buf, &p); err != nil { | ||
return fmt.Errorf("Failed decoding check file %q: %s", file, err) | ||
a.logger.Printf("[WARN] Failed decoding check file %q: %s", file, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was torn on this one, but since these are internal to Consul then we ought to skip these if we skip empty files, since OS crashes could cause this, too. I'd change all the logging in here to ERR
level though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I have a question regarding this change. I could see from the diff that we're ignoring the check and service files, if they're malformed. In that case when consul starts, how does it know about the ignored checks and services? Is there an expectation that the services & checks should be registered with consul again? |
Previously a change was made to make the file writing atomic,
but that wasn't enough to cover something like an OS crash so we
needed something here to handle the situation more gracefully.
Fixes #1221.