[LH Doc Upload Migration] Fix Lighthouse Upload Failure Metrics Logging #19466
…load provider Documents submitted via our Lighthouse API client that fail to return a success response raise custom exceptions defined in the Lighthouse::ServiceException class. As such, the previous strategy of capturing and logging the Lighthouse API response directly in LighthouseSupplementalDocumentUploadProvider will never work: these exceptions will be raised by our existing Lighthouse API client before we have a chance to inspect the response here. Updates the provider to rescue any of the possible API exceptions raised in this case, log the upload as a failure, and re-raise the exception to maintain the current behavior.
expect(StatsD).to receive(:increment).with(
  'my_stats_metric_prefix.lighthouse_supplemental_document_upload_provider.upload_failure'
)
describe 'service exceptions' do
I haven't yet been able to get all of these tests passing with the Rails logger call in place. Some pass and some do not, which is confusing since they are all just custom exception classes. The metrics increment and the re-raise behavior do seem to work for all of them.
We may not want to use the Rails logger for capturing exceptions, as this information is already logged elsewhere (you can see the custom exceptions showing up in the "catch all" widget on our migration dashboard such as this one).
The only purpose of logging the exception here would be to have a unified system of logging events in the upload providers, since this logging matches how we log attempts and successes. The metrics are logged in addition and are more helpful for aggregating data.
One option to get this working is to only increment the metric and not worry about a redundant call to the Rails logger.
…e-response-metrics
log_upload_failure(e)
raise e
end

handle_lighthouse_response(api_response)
Rename hand success response
Closing this as we're going to take a slightly different approach and I'd rather just start a clean branch.
WORKING PR: I am out for a week and wasn't able to reliably accomplish the level of test coverage I was hoping for, so I am handing this off to @ajones446 in case we end up completing it this week. It is lower priority than some of the other todo items remaining in the migration
Summary
No, errors for the upload jobs that are turned on for Lighthouse in production are not getting logged the way we want them to be for our team's internal stats keeping. However, there are dashboards in place that log these errors, so we are aware of them. They just don't neatly increment the StatsD metric we will be using to quickly reference the breakdown of attempts, successes and failures when uploading documents to Lighthouse.
Documents submitted via our Lighthouse API client that fail to return a success response raise custom exceptions defined in the Lighthouse::ServiceException class. As such, the previous strategy of capturing and logging the Lighthouse API response directly in LighthouseSupplementalDocumentUploadProvider will never work: these exceptions will be raised by our existing Lighthouse API client before we have a chance to inspect the response here.
Updates the provider to rescue any of the possible API exceptions raised in this case, log the upload as a failure, and re-raise the exception to maintain the current behavior.
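For readers skimming the description, here is a minimal, self-contained sketch of the rescue, count, and re-raise pattern described above. It is not the code from this PR: `FakeStatsD`, `FakeBadRequest`, `FakeGatewayTimeout`, `SketchUploadProvider`, and `submit_upload_document` are hypothetical stand-ins, and only the metric key format and the `log_upload_failure` / `LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES` names are taken from this page.

```ruby
# Hypothetical stand-in for the StatsD client; it only counts increments.
module FakeStatsD
  def self.counts
    @counts ||= Hash.new(0)
  end

  def self.increment(key)
    counts[key] += 1
  end
end

# Hypothetical exception classes standing in for the ones the API client raises.
class FakeBadRequest < StandardError; end
class FakeGatewayTimeout < StandardError; end

class SketchUploadProvider
  # Only these classes are treated as a failure response from Lighthouse.
  LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES = [FakeBadRequest, FakeGatewayTimeout].freeze

  # client is any object responding to #call (an assumption for this sketch).
  def initialize(statsd_metric_prefix, client)
    @statsd_metric_prefix = statsd_metric_prefix
    @client = client
  end

  def submit_upload_document(document)
    @client.call(document)
  rescue *LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES => e
    # Count the failure, then re-raise so callers see the same exception as before.
    log_upload_failure(e)
    raise e
  end

  private

  def log_upload_failure(_error)
    FakeStatsD.increment(
      "#{@statsd_metric_prefix}.lighthouse_supplemental_document_upload_provider.upload_failure"
    )
  end
end
```

Exceptions outside the list pass through uncounted, which keeps the metric specific to failure responses from Lighthouse.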
Disability Benefits Team 2; we own the maintenance.
Related issue(s)
Testing done
NOTE: tests are not yet passing, further work is required
Acceptance criteria
Requested Feedback
This was just the first approach I thought of for this problem. It's not ideal that we have layers and layers of redundant logging and exception handling across our API client codebase, but this is the cleanest way I can think of to increment our failure metrics for the Lighthouse migration, given the custom client exceptions paradigm defined in this exception handler class.
My main reservations about the approach I took are:
Needing to maintain an array of exception classes in LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES that matches the potential exceptions raised by our service exception code. I don't think we should rescue and log just any exception here, as this metric is meant to indicate a failure response from Lighthouse explicitly. So we may not have much choice; it just feels dirty, and the provider knows too much about the exception classes.
The associated testing approach, which loops through these exceptions and uses RSpec's shared examples. I hate the shared example DSL; it's confusing to read and kind of clunky. But it probably makes sense here, given that we have to test explicit exception handling for specific exceptions and there is a whole list of them.
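To make the second reservation concrete, the per-exception coverage boils down to running the same two checks (metric counted, exception re-raised) once per class in the list. The sketch below shows that loop in plain Ruby rather than with RSpec shared examples, so it is a swapped-in illustration and not the spec from this PR. Every name in it (`FakeBadRequest`, `FakeForbidden`, `EXCEPTION_CLASSES`, `upload_with_metrics`, `check_all`) is hypothetical.

```ruby
# Hypothetical exception classes standing in for the client's custom exceptions.
class FakeBadRequest < StandardError; end
class FakeForbidden < StandardError; end

EXCEPTION_CLASSES = [FakeBadRequest, FakeForbidden].freeze

# Hypothetical unit under test: count listed exceptions, then re-raise them.
def upload_with_metrics(counter)
  yield
rescue *EXCEPTION_CLASSES => e
  counter[:upload_failure] += 1
  raise e
end

# Runs the same two checks for every class and reports pass/fail per class name.
def check_all(classes)
  classes.map do |klass|
    counter = Hash.new(0)
    reraised = begin
      upload_with_metrics(counter) { raise klass, 'simulated failure' }
      false
    rescue klass
      true
    end
    [klass.name, reraised && counter[:upload_failure] == 1]
  end.to_h
end

RESULTS = check_all(EXCEPTION_CLASSES)
```

Adding a class to the list automatically adds it to the checks, which is the same property the shared examples provide.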