Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display more information about check being not properly added when it fails #4405

Merged

Conversation

pierresouchay
Copy link
Contributor

It follows an incident where we add lots of error messages:

[WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration

That seems related to Consul failing to restart on respective agents.

Having Node information as well as service information would help diagnose the issue.

@pierresouchay
Copy link
Contributor Author

@pearkes It would really help to see from Consul server on which agent the error is coming from.

Instead of displaying for instance failed inserting check: ACL not found, you would have log such as :
failed inserting check: ACL not found on node my-node-001 => which really help diagnosing the issues on the servers.

This kind of details make very hard to operate Consul on large scale for now, and this kind of tips would really help investigation the issues

@pierresouchay
Copy link
Contributor Author

@mkeeler @banks Anyone for a review on this simple PR?

Would just help a bit Consul administrators solving ACLs issues faster

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @pierresouchay what do you think about the name?

@@ -266,6 +266,17 @@ func (s *Store) EnsureRegistration(idx uint64, req *structs.RegisterRequest) err
return nil
}

func (s *Store) ensureCheckIsCorrect(tx *memdb.Txn, idx uint64, node string, check *structs.HealthCheck) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nitpicky but this name I find a bit misleading "ensureCheckIsCorrect" sounds like it would only perform validation but it actually modifies the state and updates/inserts the check if it's OK.

Names are hard, and at risk of sounding like a Java class, something like ensureCheckIfNodeMatches would at least avoid that issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@banks DONE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@banks For the name, it is a bit complex because: it is not mandatory as far as I know, the registration of check might fail at Service Level, Check Level or Node Level (because all of them could be used).

When dealing with service registration, we probably would like to know that the service registration indeed failed (because the ACL indeded was failing for the check), for node level, that's something else. I actually do not know how to deal with all those cases properly for now, thus I did not add the check name. But you are right, it might be event better, if you have an idea on how dealing with all of this at this level, I am listening to suggestions :-)

Having the node source is already a good step as it might help administrator identify where the request is coming from, but this is probably just a first step

@pierresouchay
Copy link
Contributor Author

@banks Ok, I will update the name shortly

… fails

It follows an incident where we add lots of error messages:

  [WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration

That seems related to Consul failing to restart on respective agents.

Having Node information as well as service information would help diagnose the issue.
@pierresouchay pierresouchay force-pushed the more_info_on_check_registration_failure branch from 92147f8 to cb05557 Compare August 14, 2018 16:25
Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@banks banks merged commit 0d6de25 into hashicorp:master Aug 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants