Add /health page #76

Closed
garethrees opened this issue May 9, 2018 · 4 comments

@garethrees
Member

garethrees commented May 9, 2018

Consider signs that the app is having issues.

I expect we want to check that:

  • There are no delivered submissions with empty references
  • There are no submissions that have been unqueued for ages
  • There are no submissions that have been queued for ages
  • There's no old data that should have been cleaned up by the to-be-written cleanup task
  • There are no jobs in the Sidekiq "Dead Jobs" queue

We can then add this to the test_urls vhost config.
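
As a rough illustration of the checks listed above, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `Submission` fields, the state names, the one-hour `MAX_AGE` threshold standing in for "ages", and the `health_problems` helper. The stale-data check is left out because the cleanup task it depends on has not been written yet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical threshold for "ages"; the issue does not define one.
MAX_AGE = timedelta(hours=1)


@dataclass
class Submission:
    state: str                 # assumed states: "unqueued", "queued", "delivered"
    reference: Optional[str]   # reference recorded once delivered, if any
    updated_at: datetime       # when the submission last changed state


def health_problems(submissions: List[Submission], dead_jobs: int, now: datetime) -> List[str]:
    """Return a list of failed checks; an empty list means healthy."""
    problems = []
    # Delivered submissions should always carry a reference.
    if any(s.state == "delivered" and not s.reference for s in submissions):
        problems.append("delivered submissions with empty references")
    # Submissions should not sit unqueued or queued beyond the threshold.
    for state in ("unqueued", "queued"):
        if any(s.state == state and now - s.updated_at > MAX_AGE for s in submissions):
            problems.append(f"submissions {state} for longer than {MAX_AGE}")
    # dead_jobs is assumed to be the size of the Sidekiq "Dead Jobs" queue.
    if dead_jobs > 0:
        problems.append(f"{dead_jobs} jobs in the dead queue")
    return problems
```

A /health page could then return a success status when the list is empty and a failure status, with the list as the body, otherwise.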

@sagepe
Member

sagepe commented May 11, 2018

These data could also be exposed via the Prometheus endpoint (#56), although an aggregated set of data from across the service would be useful for service-wide metrics. If you do expose service-wide metrics at /health, consider doing so using Prometheus's data format (there would be nothing to stop us parsing this inside a Nagios check, too).
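
To make the suggestion concrete, here is a small Python sketch of rendering gauges in the Prometheus text exposition format and reading them back, as a Nagios check might. The function names and the metric name used in the usage note are hypothetical, and the parser is deliberately minimal: it only handles samples without labels or timestamps.

```python
from typing import Dict, Tuple


def render_metrics(metrics: Dict[str, Tuple[str, float]]) -> str:
    """Render {name: (help text, value)} as Prometheus text-format gauges."""
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def parse_metrics(text: str) -> Dict[str, float]:
    """Read unlabelled samples back out, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples
```

For example, `render_metrics({"app_dead_jobs": ("Jobs in the dead queue", 2)})` yields a `# HELP` line, a `# TYPE` line and the sample `app_dead_jobs 2`; a check could then compare `parse_metrics(...)["app_dead_jobs"]` against its warning and critical thresholds.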

@garethrees
Member Author

Example with Prometheus formatting: https://api.mapumental.com/health/metrics

@sagepe
Member

sagepe commented May 11, 2018

@garethrees
Member Author

We also want a check for "old" queued submissions with no reference where there are no jobs scheduled (or retry jobs scheduled), to guard against #89.
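
One possible shape for that check, sketched in Python under stated assumptions: submissions are plain dicts with hypothetical `id`, `state`, `reference` and `queued_at` keys, the caller has already collected the IDs of submissions that still have a scheduled or retry job, and "old" means older than a made-up one-hour default.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set


def orphaned_submissions(
    submissions: List[Dict],
    ids_with_pending_jobs: Set[int],
    now: datetime,
    max_age: timedelta = timedelta(hours=1),  # hypothetical threshold for "old"
) -> List[int]:
    """IDs of old queued submissions with no reference and no scheduled or retry job."""
    return [
        s["id"]
        for s in submissions
        if s["state"] == "queued"
        and not s["reference"]
        and now - s["queued_at"] > max_age
        and s["id"] not in ids_with_pending_jobs
    ]
```

Any ID returned here is a submission that looks queued but that nothing is going to pick up, which is the situation the check is meant to flag.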
