You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is the job still needed?
Maybe. In this Job, the status of the service is being reported with the key: api.external_service.#{service.name}.up. But in Datadog, we access the status of an external service via vets_api.statsd.api_external_service.up. So it seems like we don't use it anymore. I don't see any metrics starting with api in Datadog. We have this config in statsd-exporter-mapping.conf in the devops repo but... not sure if that's in use (file hasn't been updated in 3 years).
Does the job need exception handling added?
Yes.
Any other modifications needed?
The line StatsD.gauge("api.external_service.#{service.name}.up", up) in the job might need to be updated to the newer, vets_api.statsd....
Add a comment to ExportBreakerStatus line in periodic_jobs.rb to document what it does ("Send breaker status to statsd").
Add sidekiq_options(retry: false). This job runs every minute, so it doesn't make sense to have it retry. (This job doesn't specify any retries meaning it hits the sidekiq retry default of 25.)
User Story
As a vets-api owner, I want to understand the purpose and functionality of the ExportBreakerStatus job, so that it can be refactored to add error handling and make it useful.
Issue Description
The ExportBreakerStatus Sidekiq job runs every minute via cron and reports an external service's status ("up" or "down") to StatsD under the key api.external_service.<service_name>.up. However, after investigation (detailed above and in this Slack thread), it is unclear if this job is still used:
Metrics starting with api are not visible in Datadog.
Current external service statuses are accessed in Datadog using vets_api.statsd.api_external_service.up, which suggests a different mechanism is in use.
Additionally, the job lacks exception handling and retries are enabled by default, which is unnecessary for a job running every minute.
Tasks
Investigate whether metrics from ExportBreakerStatus (api.external_service.<service_name>.up) are being forwarded to StatsD and Datadog or are being dropped.
Determine if the statsd-exporter-mapping.conf file is still in use and relevant for metric key translation.
Confirm whether vets_api.statsd.api_external_service.up metrics are being generated elsewhere or if this job needs to be updated to use this key format.
Add exception handling to the job.
Update the job configuration to add sidekiq_options retry: false.
Add a comment to the ExportBreakerStatus line in periodic_jobs.rb to document its purpose.
Based on findings:
Update the job to use vets_api.statsd... if necessary.
If the job is redundant, deprecate it.
Acceptance Criteria
The connection between Breakers, this job, and external service monitoring metrics is understood and validated.
If the job is still relevant, metrics from it are being forwarded to Datadog and monitored effectively.
If the job is redundant, it is safely deprecated, and its removal does not impact existing monitoring functionality.
Exception handling is added to the job.
Retries are disabled for the job.
Validation
Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.
The text was updated successfully, but these errors were encountered:
Refinement note: this feels big and could be split into two tickets. One for discovery and one for the work. Or not. 😄
Initial investigation
What does the job do?
This runs every minute. It reports a status of "up" or "down" to statsd. Originally added in department-of-veterans-affairs/vets-api#1358.
Is the job still needed?
Maybe. In this Job, the status of the service is being reported with the key:
api.external_service.#{service.name}.up
. But in Datadog, we access the status of an external service viavets_api.statsd.api_external_service.up
. So it seems like we don't use it anymore. I don't see any metrics starting withapi
in Datadog. We have this config in statsd-exporter-mapping.conf in the devops repo but... not sure if that's in use (file hasn't been updated in 3 years).Does the job need exception handling added?
Yes.
Any other modifications needed?
StatsD.gauge("api.external_service.#{service.name}.up", up)
in the job might need to be updated to the newer,vets_api.statsd...
.ExportBreakerStatus
line in periodic_jobs.rb to document what it does ("Send breaker status to statsd").sidekiq_options(retry: false)
. This job runs every minute, so it doesn't make sense to have it retry. (This job doesn't specify any retries meaning it hits the sidekiq retry default of 25.)User Story
As a vets-api owner, I want to understand the purpose and functionality of the
ExportBreakerStatus
job, so that it can be refactored to add error handling and make it useful.Issue Description
The
ExportBreakerStatus
Sidekiq job runs every minute via cron and reports an external service's status ("up" or "down") to StatsD under the keyapi.external_service.<service_name>.up
. However, after investigation (detailed above and in this Slack thread), it is unclear if this job is still used:api
are not visible in Datadog.vets_api.statsd.api_external_service.up
, which suggests a different mechanism is in use.api.external_service
, but is that file still relevant?This raises several questions:
vets_api.statsd.api_external_service.up
?Additionally, the job lacks exception handling and retries are enabled by default, which is unnecessary for a job running every minute.
Tasks
ExportBreakerStatus
(api.external_service.<service_name>.up
) are being forwarded to StatsD and Datadog or are being dropped.statsd-exporter-mapping.conf
file is still in use and relevant for metric key translation.vets_api.statsd.api_external_service.up
metrics are being generated elsewhere or if this job needs to be updated to use this key format.sidekiq_options retry: false
.ExportBreakerStatus
line inperiodic_jobs.rb
to document its purpose.vets_api.statsd...
if necessary.Acceptance Criteria
Validation
Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.
The text was updated successfully, but these errors were encountered: