
[LH Doc Upload Migration] Fix Lighthouse Upload Failure Metrics Logging #19466

Conversation


@NB28VT NB28VT commented Nov 14, 2024

WORKING PR: I am out for a week and wasn't able to reach the level of test coverage I was hoping for, so I am handing this off to @ajones446 in case we end up completing it this week. It is lower priority than some of the other to-do items remaining in the migration.

Summary

  • This work is behind a feature toggle (flipper):

No. Errors for the upload jobs that are turned on for Lighthouse in production are not being logged the way we want for our team's internal stats keeping. There are dashboards in place that log these errors, so we are aware of them; they just don't neatly increment the StatsD metric we will be using to quickly reference the breakdown of attempts, successes, and failures when uploading documents to Lighthouse.

  • (Summarize the changes that have been made to the platform)

Documents submitted via our Lighthouse API client that fail to return a success response raise custom exceptions defined in the Lighthouse::ServiceException class. As such, the previous strategy of capturing and logging the Lighthouse API response directly in LighthouseSupplementalDocumentUploadProvider will never work - these exceptions are raised by our existing Lighthouse API client before we have a chance to inspect the response here.

  • (What is the solution, why is this the solution?)

Updates the provider to rescue any of the possible API exceptions raised in this case, properly log the upload as a failure, and re-raise the exception to maintain the current behavior.

  • (Which team do you work for, does your team own the maintenance of this component?)

Disability benefits team 2; we own the maintenance.
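The rescue-and-re-raise strategy described above can be sketched roughly as follows. This is a minimal, hypothetical sketch: the StatsD wrapper, the exception classes, and the provider internals are stand-ins inferred from the PR description, not the actual vets-api code.

```ruby
require 'logger'

# Stand-in exception hierarchy; the real classes live in Lighthouse::ServiceException.
module Lighthouse
  class ServiceException < StandardError; end
  class Timeout < ServiceException; end
end

class LighthouseSupplementalDocumentUploadProvider
  STATSD_PREFIX = 'my_stats_metric_prefix.lighthouse_supplemental_document_upload_provider'

  # Exceptions the Lighthouse service exception code can raise. Kept in one
  # place so the rescue below stays in sync with that code (the maintenance
  # concern raised under "Requested Feedback").
  LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES = [
    Lighthouse::ServiceException,
    Lighthouse::Timeout
  ].freeze

  def initialize(client, statsd, logger = Logger.new($stdout))
    @client = client
    @statsd = statsd
    @logger = logger
  end

  def submit_upload_document(document)
    @statsd.increment("#{STATSD_PREFIX}.upload_attempt")
    api_response = @client.upload(document)
    handle_success_response(api_response)
  rescue *LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES => e
    log_upload_failure(e)
    raise e # re-raise so existing behavior (job retries, error reporting) is unchanged
  end

  private

  def handle_success_response(api_response)
    @statsd.increment("#{STATSD_PREFIX}.upload_success")
    api_response
  end

  def log_upload_failure(exception)
    @statsd.increment("#{STATSD_PREFIX}.upload_failure")
    @logger.error("Lighthouse upload failed: #{exception.class}")
  end
end
```

Because the exceptions are raised by the API client itself, the rescue around the client call is where the provider can still observe a failed upload before the exception propagates.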

Related issue(s)

  • More information on the issue is described in this ticket

Testing done

NOTE: tests are not yet passing, further work is required

  • New code is covered by unit tests
  • Describe what the old behavior was prior to the change
  • Describe the steps required to verify your changes are working as expected. Exclusively stating 'Specs run' is NOT acceptable as appropriate testing
  • If this work is behind a flipper:
    • Tests need to be written for both the flipper on and flipper off scenarios. Docs.
    • What is the testing plan for rolling out the feature?

Acceptance criteria

  • I fixed|updated|added unit tests and integration tests for each feature (if applicable).
  • No error nor warning in the console.
  • Events are being sent to the appropriate logging solution
  • Documentation has been updated (link to documentation)
  • No sensitive information (i.e. PII/credentials/internal URLs/etc.) is captured in logging, hardcoded, or specs
  • Feature/bug has a monitor built into Datadog (if applicable)
  • If app impacted requires authentication, did you login to a local build and verify all authenticated routes work as expected
  • I added a screenshot of the developed feature

Requested Feedback

This was just the first approach I thought of for this problem. It's not ideal that we have layers and layers of redundant logging and exception handling across our API client codebase, but this is the cleanest way I can think of to increment our failure metrics for the Lighthouse migration, given the custom client exceptions paradigm defined in this exception handler class.

My main reservations about the approach I took are:

  1. Needing to maintain an array of exception classes in LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES that matches the potential exceptions raised by our service exception code. I don't think we should rescue and log just any exception here, as this metric is meant to indicate a failure response from Lighthouse specifically. So we may not have much choice; it just feels dirty, and the provider knows too much about the exception classes.

  2. The associated testing approach, which loops through these exceptions and uses RSpec's shared examples. I hate the shared example DSL; it's confusing to read and kind of clunky. But it probably makes sense here, given we have to test explicit exception handling for a whole list of specific exceptions.
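In the real suite this would be RSpec `shared_examples` iterated with `it_behaves_like`, but the looping idea reduces to the following plain-Ruby sketch (all names here are hypothetical stand-ins, not the actual vets-api code):

```ruby
# Plain-Ruby illustration of what the shared examples assert for each
# exception class: the failure metric is incremented and the error re-raised.
module Lighthouse
  class ServiceException < StandardError; end
  class Timeout < ServiceException; end
  class RateLimited < ServiceException; end
end

# Hypothetical list mirroring LIGHTHOUSE_RESPONSE_EXCEPTION_CLASSES.
EXCEPTION_CLASSES = [Lighthouse::Timeout, Lighthouse::RateLimited].freeze

# Toy stand-in for the provider's rescue block.
def upload_with_metrics(metrics)
  yield
rescue *EXCEPTION_CLASSES => e
  metrics << 'upload_failure'
  raise e
end

# The "shared example": identical assertions, run once per exception class.
EXCEPTION_CLASSES.each do |exception_class|
  metrics = []
  re_raised = false
  begin
    upload_with_metrics(metrics) { raise exception_class }
  rescue exception_class
    re_raised = true
  end
  raise "#{exception_class} was not re-raised" unless re_raised
  raise "#{exception_class} did not increment the metric" unless metrics == ['upload_failure']
end
```

The shared-example DSL adds indirection, but it keeps the per-class assertions in one place, which matches the concern above about testing a whole list of exceptions.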

expect(StatsD).to receive(:increment).with(
'my_stats_metric_prefix.lighthouse_supplemental_document_upload_provider.upload_failure'
)
describe 'service exceptions' do

I haven't yet been able to get all of these tests passing with the Rails logger logging; some work and some do not, which is confusing since they are all just custom exception classes. The metrics increment and the re-raising behavior do seem to work for all of them.

We may not want to use the Rails logger for capturing exceptions, as this information is already logged elsewhere (you can see the custom exceptions showing up in the "catch all" widget on our migration dashboard such as this one)

The purpose of logging the exception here would just be to have a unified system of logging events in the upload providers: this logging matches how we log attempts and successes, in addition to the metrics, which are more helpful for aggregating data.


One option to get this working is to just increment the metric and not worry about a redundant call to the Rails logger.

log_upload_failure(e)
raise e
end

handle_lighthouse_response(api_response)

Rename to reflect that this handles the success response


NB28VT commented Nov 25, 2024

Closing this as we're going to take a slightly different approach, and I'd rather just start a clean branch.
