Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch error during maintenance window request #19482

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rmtolmach
Copy link
Contributor

Summary

  • This work is behind a feature toggle (flipper): NO
  • (Summarize the changes that have been made to the platform)
    • This code is called by the Sidekiq job, PollMaintenanceWindows which runs every 3 minutes in each env. I'm adding some error handling so it fails loudly. The PagerDuty API documentation says a 400 is returned if invalid arguments are sent, which happens if a service is deleted from pd but not from the list of service IDs in the manifest.
  • (If bug, how to reproduce)
    • See Testing done section below
  • (What is the solution, why is this the solution?)
    • Catch the error in maintenance_client. Part 2 of this work is to make a Datadog Log monitor to catch this error and alert on it. The service IDs are sent as an array to pd. Originally I thought I could modify the code to ignore bad service IDs, but since they're sent as a batch, this isn't possible. We could send each one individually, but that would be a lot of API calls.
  • (Which team do you work for, does your team own the maintenance of this component?)
    • Platform Product team

Related issue(s)

Testing done

  • New code is covered by unit tests

  • Describe what the old behavior was prior to the change
    The Sidekiq job, PagerDuty::PollMaintenanceWindows.new.perform would fail silently if a bad PagerDuty service ID was passed in.

  • Describe the steps required to verify your changes are working as expected. Exclusively stating 'Specs run' is NOT acceptable as appropriate testing

    • replace pagerduty_api_token in config/settings.yml with the real deal from Parameter Store
    • in the rails console, run PagerDuty::PollMaintenanceWindows.new.perform
    • You should see the "Invalid arguments sent to PagerDuty..." error because at least one of the service IDs in settings.yml is invalid. If you comment out all of the services except the first (carma) and re-start the rails console, and run the job again, it will pass (return => [] and no error)

bad id

irb(main):001> PagerDuty::PollMaintenanceWindows.new.perform
2024-11-15 13:31:31.872560 E [43129:11540 maintenance_client.rb:46] Rails -- Invalid arguments sent to PagerDuty. One of the following Service IDs: ["P6XLE0T", "PXXXXXX"] is bad.
2024-11-15 13:31:31.907429 D [43129:11540 log_subscriber.rb:168] ActiveRecord::Base --   MaintenanceWindow Load (33.1ms)  SELECT "maintenance_windows".* FROM "maintenance_windows" WHERE (end_time > '2024-11-15 18:31:31.873516')
=> []

valid id

irb(main):001> PagerDuty::PollMaintenanceWindows.new.perform
2024-11-15 13:29:25.621566 D [34387:11540 log_subscriber.rb:168] ActiveRecord::Base --   MaintenanceWindow Load (29.0ms)  SELECT "maintenance_windows".* FROM "maintenance_windows" WHERE (end_time > '2024-11-15 18:29:25.589456')
=> []

Screenshots

Note: Optional

What areas of the site does it impact?

Vets-api maintenance windows.

Acceptance criteria

  • I fixed|updated|added unit tests and integration tests for each feature (if applicable).
  • No error nor warning in the console.
  • Events are being sent to the appropriate logging solution
  • No sensitive information (i.e. PII/credentials/internal URLs/etc.) is captured in logging, hardcoded, or specs
  • Feature/bug has a monitor built into Datadog (if applicable)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants