Initialize facilities job error metric #3398

rtravitz · 2019-10-07T18:29:05Z

Description of change

This PR helps address #3016

Facilities alerts in Prometheus currently fire indefinitely once a metric increments in StatsD, and they only stop when the vets-api-worker instance is redeployed. A PR exists in the devops repo to address this by alerting only when the failure counter for the most recent five-minute window is greater than the failure counter for the preceding five-minute window. However, that does not capture the case where the metric goes from not existing to existing the first time a failure occurs. To solve that case, we can either do some contortions in PromQL to check for the absence of the metric, or we can initialize it to zero in vets-api. This PR does the latter.
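A rough sketch of why the zero initialization matters, not the code in this PR. `InMemoryCounters`, `initialize_error_metrics`, `alert?`, and the metric keys are all hypothetical stand-ins; in vets-api the client would presumably be the real StatsD client (for example `StatsD.increment(key, 0)`), and the window comparison would live in the Prometheus alert rule rather than in Ruby.

```ruby
# In-memory stand-in for a StatsD client (hypothetical, for illustration only).
# A counter does not exist until it has been emitted at least once, which
# mirrors how the metric is absent downstream before the first failure.
class InMemoryCounters
  def initialize
    @counters = {}
  end

  # Same shape as a counter increment: add `value` to `key`.
  def increment(key, value = 1)
    @counters[key] = @counters.fetch(key, 0) + value
  end

  # Returns nil when the counter has never been emitted.
  def read(key)
    @counters[key]
  end
end

# Hypothetical metric names; the real keys are defined in vets-api.
FACILITIES_ERROR_KEYS = %w[
  facilities.job.example_a.error
  facilities.job.example_b.error
].freeze

# Emit each error counter with a value of zero so it exists before any failure.
def initialize_error_metrics(client, keys)
  keys.each { |key| client.increment(key, 0) }
end

# Simplified model of the alert condition: fire only when the counter grew
# between the previous window and the current one. A nil sample means the
# metric did not exist in that window, so there is nothing to compare.
def alert?(previous, current)
  return false if previous.nil? || current.nil?

  current > previous
end

key = FACILITIES_ERROR_KEYS.first

# Without initialization: the first failure has no earlier sample to compare to.
uninitialized = InMemoryCounters.new
before = uninitialized.read(key)
uninitialized.increment(key)
missed = alert?(before, uninitialized.read(key))

# With initialization: the counter starts at zero, so the first failure shows up.
initialized = InMemoryCounters.new
initialize_error_metrics(initialized, FACILITIES_ERROR_KEYS)
before = initialized.read(key)
initialized.increment(key)
caught = alert?(before, initialized.read(key))

puts "first failure detected without init: #{missed}"
puts "first failure detected with init: #{caught}"
```

Initializing in the application keeps the alert expression a plain comparison of two windows, at the cost of emitting a zero-valued counter at startup.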

Testing done

Checked that the Facilities_.* error metric is initialized to zero in StatsD when Sidekiq starts up

Acceptance Criteria (Definition of Done)

Unique to this PR

StatsD error metric is initialized to zero

Applies to all PRs

Appropriate logging
Swagger docs have been updated, if applicable
Provide link to originating GitHub issue, or connected to it via ZenHub
Does not contain any sensitive information (i.e. PII/credentials/internal URLs/etc., in logging, hardcoded, or in specs)
Provide which alerts would indicate a problem with this functionality (if applicable)

Initialize facilities job error metric

973fda5

rtravitz requested review from a team as code owners October 7, 2019 18:29

Fix linting error

6d0b99e

rtravitz requested review from omgitsbillryan, LindseySaari, kfrz, nfasulo, MrBilnon and edmkitty October 7, 2019 18:33

nfasulo approved these changes Oct 7, 2019


va-vfs-bot had a problem deploying to rt/prometheus-facilities-alerts/master October 7, 2019 18:42 Error

Merge branch 'master' into rt/prometheus-facilities-alerts

7e59ec8

va-vfs-bot temporarily deployed to rt/prometheus-facilities-alerts/master October 7, 2019 19:46 Inactive

Merge branch 'master' into rt/prometheus-facilities-alerts

48006d7

rtravitz merged commit 0f790f8 into master Oct 7, 2019

va-vfs-bot deployed to rt/prometheus-facilities-alerts/master October 7, 2019 20:13

rtravitz deleted the rt/prometheus-facilities-alerts branch October 7, 2019 20:14
