Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry mechanism for Azure service bus #6276

Conversation

svedbod
Copy link

@svedbod svedbod commented Nov 25, 2024

Summary

This PR refactors the retry logic in AbstractAzureSiriUpdater into a reusable method, executeWithRetry, which is now applied consistently for retries. The updated method includes handling of retriable and non-retriable exceptions to ensure retries occur only when appropriate. Accompanying unit tests validate the behavior and correctness of this refactored retry mechanism.

Issue

Motivation:

  • The previous implementation had retry logic specific to historical data initialization, which was not reusable or consistently applied to other processes like starting the Azure Service Bus event processor.
  • Failures in these critical processes due to transient issues could disrupt message processing and require manual intervention.

How the Code Works:

  • executeWithRetry encapsulates retry logic with an exponential backoff strategy, capping retries at a maximum interval of 60 seconds.
  • It integrates a shouldRetry method to differentiate between retriable and non-retriable exceptions, improving robustness and avoiding unnecessary retries.
  • This approach ensures consistent error handling across all relevant operations.

Technical Considerations:

  • Refactoring into a reusable method improves maintainability, consistency, and testability.
  • The retry mechanism enhances fault tolerance by handling transient failures gracefully.
  • Logging at each retry attempt provides better observability and debugging insights.

This PR is linked to Skånetrafiken issue #90238

Unit Tests

  • Unit tests in AbstractAzureSiriUpdaterTest validate:

    1. Correct execution of the retry logic across different scenarios.
    2. Proper handling of backoff intervals, including sustained retries at the maximum interval (60 seconds).
    3. Handling of retriable and non-retriable exceptions using the shouldRetry method.
    4. Mocking of sleep to simulate retry behavior without introducing delays during testing.
  • Manual verification confirmed the retry mechanism works in practical scenarios, though it is challenging to simulate all possible exceptions live.

  • All unit tests pass.

Documentation

  • Code-level documentation explains the design and purpose of executeWithRetry and its reliance on shouldRetry for exception classification.
  • Javadoc has been added for the reusable method and its integrations.
  • No updates to configuration documentation were necessary as no new configuration options were introduced.

Changelog

Added:

  • Refactored retry logic in AbstractAzureSiriUpdater into a reusable method (executeWithRetry) for consistent application across critical operations.
  • Enhanced exception handling with a shouldRetry method to distinguish retriable and non-retriable failures.
  • Unit tests in AbstractAzureSiriUpdaterTest to validate the retry logic and backoff behavior.

Bumping the Serialization Version ID

No changes were made to the graph serialization format; therefore, no serialization version bump is necessary.

@leonardehrenfried leonardehrenfried changed the title added retry mechanism for Azure service bus Added retry mechanism for Azure service bus Nov 26, 2024
@leonardehrenfried leonardehrenfried changed the title Added retry mechanism for Azure service bus Add retry mechanism for Azure service bus Nov 26, 2024
Copy link

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 37.71930% with 71 lines in your changes missing coverage. Please review.

Project coverage is 69.78%. Comparing base (1477884) to head (718b738).
Report is 162 commits behind head on dev-2.x.

Files with missing lines Patch % Lines
...t/siri/updater/azure/AbstractAzureSiriUpdater.java 36.60% 68 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             dev-2.x    #6276      +/-   ##
=============================================
+ Coverage      69.74%   69.78%   +0.03%     
- Complexity     17729    17760      +31     
=============================================
  Files           2010     2011       +1     
  Lines          75918    76015      +97     
  Branches        7775     7791      +16     
=============================================
+ Hits           52948    53044      +96     
- Misses         20260    20262       +2     
+ Partials        2710     2709       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vpaturet
Copy link
Contributor

As mentioned during the developer meeting, you can have a look to org.opentripplanner.framework.retry.OtpRetry that covers at least part of the required features for retry.

Copy link
Member

@leonardehrenfried leonardehrenfried left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-sandbox changes look fine.

@habrahamsson-skanetrafiken habrahamsson-skanetrafiken merged commit 51f004c into opentripplanner:dev-2.x Dec 10, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants