The consul.catalog.nodes_up should reflect the # of nodes actually up #2018

Closed

mtougeron
Contributor

consul.catalog.nodes_up should reflect the number of nodes that are actually up, so the check now looks at the serfHealth check for each service node. Strictly speaking, I think consul.catalog.nodes_up should count the nodes passing their health checks, but for backwards compatibility I left it counting only the nodes in the service that are running.

I also added three new metrics, consul.catalog.nodes_passing, consul.catalog.nodes_warning, and consul.catalog.nodes_critical, so that you can trend the number of nodes in each state for the service.
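
As a rough illustration of how these four numbers relate, here is a minimal sketch and not the code from this PR. It assumes entries shaped like Consul health results (a "Checks" list whose items carry "CheckID" and "Status"), and it assumes a node's overall state is the worst status among its checks. The function name count_node_states and the sample data are hypothetical.

```python
# Severity order used to pick a node's overall state (an assumption of this sketch).
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def count_node_states(entries):
    """Count nodes that are up and nodes in each health state for one service."""
    counts = {
        "nodes_up": 0,
        "nodes_passing": 0,
        "nodes_warning": 0,
        "nodes_critical": 0,
    }
    for entry in entries:
        checks = entry.get("Checks") or []
        # A node counts as "up" when its serfHealth check is passing.
        if any(
            c.get("CheckID") == "serfHealth" and c.get("Status") == "passing"
            for c in checks
        ):
            counts["nodes_up"] += 1
        # The node's overall state is the worst status among its known checks.
        statuses = [c.get("Status") for c in checks if c.get("Status") in SEVERITY]
        if statuses:
            worst = max(statuses, key=SEVERITY.get)
            counts["nodes_" + worst] += 1
    return counts


# Hypothetical sample data for a service running on three nodes.
sample_entries = [
    {"Node": {"Node": "a"}, "Checks": [
        {"CheckID": "serfHealth", "Status": "passing"},
        {"CheckID": "service:web", "Status": "passing"},
    ]},
    {"Node": {"Node": "b"}, "Checks": [
        {"CheckID": "serfHealth", "Status": "passing"},
        {"CheckID": "service:web", "Status": "critical"},
    ]},
    {"Node": {"Node": "c"}, "Checks": [
        {"CheckID": "serfHealth", "Status": "critical"},
    ]},
]

sample_counts = count_node_states(sample_entries)
```

With this sample, node b is up (serfHealth passing) but counted as critical, which is the gap between nodes_up and nodes_passing described above.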

Planned future enhancement: change the service checks to use the /v1/health/service/<service> endpoint instead of /v1/health/state/<state>, because /v1/health/state/<state> only returns results for services that have a Check defined. As a result, services whose health depends only on whether the node is up or down currently don't get proper service checks from the agent.
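
For reference, querying the per-service endpoint could look roughly like the sketch below. It is an illustration only, not the agent's implementation: it uses the Python 3 standard library, the helper names health_service_url and fetch_service_health are hypothetical, and the base URL assumes a local Consul agent on its default HTTP port.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def health_service_url(base_url, service):
    """Build the /v1/health/service/<service> URL for a Consul agent."""
    return "{}/v1/health/service/{}".format(
        base_url.rstrip("/"), quote(service, safe="")
    )


def fetch_service_health(base_url, service, timeout=5):
    """Return the decoded JSON list of node entries for the given service."""
    with urlopen(health_service_url(base_url, service), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed local agent address; "web" is a placeholder service name.
url = health_service_url("http://localhost:8500/", "web")
```

Each entry in the response carries the node, the service, and the list of checks for that node, so node-level checks such as serfHealth are included even when the service itself has no Check defined.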

Mike Tougeron added 2 commits October 30, 2015 11:30
… so it is now checking the serfHealth for each service node. Also track the actual status of the nodes in the service.
@irabinovitch
Contributor

Hi Mike, thanks for the continued PRs on the agent! It looks like this PR might have caused the mock tests for this check to start failing.

test_get_nodes_with_service (tests.checks.mock.test_consul.TestCheckConsul) ... Exception 'str' object has no attribute 'get' during check
Traceback (most recent call last):
  File "/home/travis/build/DataDog/dd-agent/tests/checks/common.py", line 164, in run_check
    self.check.check(copy.deepcopy(instance))
  File "/home/travis/build/DataDog/dd-agent/checks.d/consul.py", line 264, in check
    node_id = node.get('Node') or None
AttributeError: 'str' object has no attribute 'get'
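
The error means that node on line 264 of consul.py was a plain string rather than a dict. A minimal reproduction of that kind of mismatch follows. The mock data is hypothetical and is not the contents of the real test mock; node_ids is an illustrative helper that mirrors only the failing expression.

```python
def node_ids(nodes):
    """Collect node names using the same expression as the failing line."""
    return [node.get('Node') or None for node in nodes]


# Hypothetical mock that yields bare node names (strings).
string_mock = ["node-1", "node-2"]
# Hypothetical mock that yields dicts with a 'Node' key.
dict_mock = [{"Node": "node-1"}, {"Node": "node-2"}]

try:
    node_ids(string_mock)
    error = None
except AttributeError as exc:
    # Strings have no .get(), which matches the AttributeError in the traceback.
    error = str(exc)

ids = node_ids(dict_mock)
```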

I think the API key failure may be a flaky test. @remh, thoughts?

@mtougeron
Contributor Author

Actually, I think the problem with this one is that I didn't update the mock. Fixing that now.

@mtougeron
Contributor Author

@irabinovitch okay, now I think it is due to the problem with the unit tests. :/

@mtougeron
Contributor Author

@irabinovitch Can you help me out here? It looks like the unit tests are failing for something totally unrelated to my changes.

@olivielpeau
Member

Thanks for looking into this, @mtougeron. The tests that were failing were flaky; I've restarted them and they pass now.

@irabinovitch
Contributor

Thanks @olivielpeau @mtougeron. Apologies for the delay in responding, been on the road a bit.

@mtougeron
Contributor Author

Ya! Thanks! Now to get it merged. :)

@remh

remh commented Nov 19, 2015

@talwai can you look at this one?

@remh remh added this to the 5.7.0 milestone Nov 19, 2015
@talwai
Contributor

talwai commented Dec 4, 2015

Looks good, @mtougeron, thanks once again. I'm going to add some cosmetic changes on top of your work, and we can merge it in for our next bugfix release.

@mtougeron
Contributor Author

Cool, sounds good! Thanks.

@mtougeron
Contributor Author

These changes were merged via #2130

@mtougeron mtougeron closed this Dec 7, 2015
@mtougeron mtougeron deleted the feature-consul-service-status branch December 7, 2015 18:38
5 participants