Display more information about check being not properly added when it fails #4405
@pearkes It would really help to see, from the Consul server, which agent the error is coming from. Instead of displaying for instance This kind of detail currently makes it very hard to operate Consul at large scale, and this kind of hint would really help when investigating issues.
This looks great @pierresouchay what do you think about the name?
agent/consul/state/catalog.go
Outdated
@@ -266,6 +266,17 @@ func (s *Store) EnsureRegistration(idx uint64, req *structs.RegisterRequest) err
	return nil
}

func (s *Store) ensureCheckIsCorrect(tx *memdb.Txn, idx uint64, node string, check *structs.HealthCheck) error {
Super nitpicky, but I find this name a bit misleading: "ensureCheckIsCorrect" sounds like it would only perform validation, but it actually modifies the state and updates/inserts the check if it's OK.
Names are hard, and at the risk of sounding like a Java class, something like ensureCheckIfNodeMatches would at least avoid that issue?
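To illustrate the naming concern, here is a minimal sketch of a function that both validates and mutates state, which is why a validation-only name can mislead. Everything in it is hypothetical: the `store` and `healthCheck` types are simplified stand-ins, not Consul's `Store` or `structs.HealthCheck`, and the node comparison is only an assumption about what "node matches" means.

```go
package main

import "fmt"

// healthCheck is a hypothetical, trimmed-down stand-in for structs.HealthCheck.
type healthCheck struct {
	Node    string
	CheckID string
	Status  string
}

// store is a hypothetical in-memory stand-in for the state store.
type store struct {
	checks map[string]healthCheck
}

// ensureCheckIfNodeMatches validates that the check belongs to the given node
// and, only if it does, inserts or updates the check in the store.
// The name signals that state is modified, not merely validated.
func (s *store) ensureCheckIfNodeMatches(node string, hc healthCheck) error {
	if hc.Node != node {
		return fmt.Errorf("check %q belongs to node %q, expected node %q",
			hc.CheckID, hc.Node, node)
	}
	// Upsert: the key combines node and check ID (an assumption of this sketch).
	s.checks[hc.Node+"/"+hc.CheckID] = hc
	return nil
}

func main() {
	s := &store{checks: map[string]healthCheck{}}

	// Matching node: the check is stored.
	err := s.ensureCheckIfNodeMatches("node-1", healthCheck{Node: "node-1", CheckID: "web", Status: "passing"})
	fmt.Println(err, len(s.checks))

	// Mismatching node: an error is returned and the store is left untouched.
	err = s.ensureCheckIfNodeMatches("node-1", healthCheck{Node: "node-2", CheckID: "db", Status: "passing"})
	fmt.Println(err, len(s.checks))
}
```

The point of the sketch is only that the second half of the function writes to the store, so a caller reading a name like "IsCorrect" would not expect the side effect.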
@banks DONE
@banks Regarding the name, it is a bit complex: as far as I know it is not mandatory, and the registration of a check might fail at the service level, check level, or node level (because all of them could be used).
When dealing with service registration, we would probably like to know that the service registration indeed failed (because the ACL was indeed failing for the check); for the node level, that's something else. I do not actually know how to deal with all those cases properly for now, so I did not add the check name. But you are right, it might be even better. If you have an idea on how to deal with all of this at this level, I am open to suggestions :-)
Having the node source is already a good step, as it might help administrators identify where the request is coming from, but this is probably just a first step.
@banks Ok, I will update the name shortly
… fails
It follows an incident where we had lots of error messages:
[WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration
That seems related to Consul failing to restart on the respective agents.
Having node information as well as service information would help diagnose the issue.
92147f8 to cb05557
Cool!