Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circuit breaker not handling exception at end points even when ExceptionsAllowedBeforeBreaking is set. #1550

Closed
anuviswan opened this issue Jan 6, 2022 · 12 comments · Fixed by #1753
Assignees
Labels
bug Identified as a potential bug merged Issue has been merged to dev and is waiting for the next release QoS Ocelot feature: Quality of Service aka Polly
Milestone

Comments

@anuviswan
Copy link

anuviswan commented Jan 6, 2022

Expected Behavior / New Feature

As per the documentation of QoS functionality of Ocelot, it uses Polly to achieve Circuit Breaker among the other resilience features. From what I could understand, the following configuration should have triggered the circuit breaker after an endpoint raised exception more than 2 times.

"QoSOptions": {
   "ExceptionsAllowedBeforeBreaking": 2,
   "DurationOfBreak": 10000,
   "TimeoutValue": 2000
}

Actual Behavior / Motivation for New Feature

However, it looked like the exception raised by the end points isn't handled by the circuit breaker. Please note that the Timeout works perfectly and triggers the circuit breaker as expected. However the case wasn't same with the ExceptionsAllowedBeforeBreaking.

Steps to Reproduce the Problem

  1. Create a route in the gateway with the above QoS configuration
  2. The specified Endpoint should raise an exception
  3. Raise the request for end point via the gateway.

The code sample in the following link demonstrates the issue
Demo Code

Similar problem is mentioned in the following Stackoverflow question too

Specifications

  • Ocelot: 17.0.1
  • Ocelot.Provider.Polly: 17.0.1
@ks1990cn
Copy link
Contributor

Hi,

Thanks for reporting this bug @anuviswan.
@anuviswan Do you want to propose any solution or anything else you want to add on?

I am able to replicate and started looking into it. @raman-m

@ks1990cn
Copy link
Contributor

ks1990cn commented Oct 23, 2023

Raised bug to Polly, App-vNext/Polly#1716.

We are in touch there. Thanks

@anuviswan @raman-m

@raman-m
Copy link
Member

raman-m commented Oct 23, 2023

@ks1990cn Tanmay,
Are you sure it's a bug of Polly when posting on Polly boards?
I'm not sure... and the bug can belong to Ocelot, because of ugly Exception-to-Error design, see the HttpExceptionToErrorMapper class.

@ggnaegi Welcome! To have a fun! 😉

@raman-m raman-m self-assigned this Oct 23, 2023
@raman-m raman-m added bug Identified as a potential bug needs validation Issue has not been replicated or verified yet needs feedback Issue is waiting on feedback before acceptance medium effort Likely a few days of development effort small effort Likely less than a day of development effort. labels Oct 23, 2023
@raman-m
Copy link
Member

raman-m commented Oct 23, 2023

@anuviswan Hey, Anu!
Thanks for reporting this bug!

Version: Ocelot - 17.0.1, Ocelot.Provider.Polly 17.0.1

Could you confirm please, that the bug exists in 19.x and 20.0 versions?

@ks1990cn
Copy link
Contributor

ks1990cn commented Oct 23, 2023

@raman-m I've tried my own projects and created HttpClient call. After multiple errors from downstream I was unable to find circuitbreaker is working even in my case.
Its not about exception code, exception code is not changing at all in any way. You can see my GIF attached to Polly board.
Ocelot code looks fine to me.

Anyone else also can verify, I will retro inspect my things again.

@ks1990cn
Copy link
Contributor

@ks1990cn Tanmay, Are you sure it's a bug of Polly when posting on Polly boards? I'm not sure... and the bug can belong to Ocelot, because of ugly Exception-to-Error design, see the HttpExceptionToErrorMapper class.

@ggnaegi Welcome! To have a fun! 😉

We don't get code to mapper part even, here is my debugger point
image

@ggnaegi
Copy link
Member

ggnaegi commented Oct 23, 2023

@raman-m @ks1990cn @anuviswan Guys, I thought I was writing you rubbish. But, we have a bigger problem here:
First, as I wrote before, we shouldn't create a new CircuitBreaker per request / route. It will never work. I was suddenly unsure because in another project we had retry policies aswell.

I was able to make this acceptance test pass:

[Fact]
        public void should_open_circuit_breaker_after_two_exceptions()
        {
            var port = RandomPortFinder.GetRandomPort();

            var configuration = new FileConfiguration
            {
                Routes = new List<FileRoute>
                {
                    new()
                    {
                        DownstreamPathTemplate = "/",
                        DownstreamScheme = "http",
                        DownstreamHostAndPorts = new List<FileHostAndPort>
                        {
                            new()
                            {
                                Host = "localhost",
                                Port = port,
                            },
                        },
                        UpstreamPathTemplate = "/",
                        UpstreamHttpMethod = new List<string> { "Get" },
                        QoSOptions = new FileQoSOptions
                        {
                            ExceptionsAllowedBeforeBreaking = 2,
                            TimeoutValue = 5000,
                            DurationOfBreak = 1000,
                        },
                    },
                },
            };

            this.Given(x => x.GivenThereIsABrokenServiceRunningOn($"http://localhost:{port}"))
                .And(x => _steps.GivenThereIsAConfiguration(configuration))
                .And(x => _steps.GivenOcelotIsRunningWithPolly())
                .And(x => _steps.WhenIGetUrlOnTheApiGateway("/"))
                .And(x => _steps.WhenIGetUrlOnTheApiGateway("/"))
                .And(x => _steps.WhenIGetUrlOnTheApiGateway("/"))
                .Then(x => _steps.ThenTheStatusCodeShouldBe(HttpStatusCode.ServiceUnavailable))
                .BDDfy();
        }

But the code is quite ugly...

Second, in PollyCircuitBreakingDelegatingHandler, no exceptions are thrown here when the client return status 500. I was able to make it work with this terrible bit of code:

var policyResult = await policy.ExecuteAsync(async () =>
            {
                var result = await base.SendAsync(request, cancellationToken);
                if (result.StatusCode == HttpStatusCode.InternalServerError)
                {
                    throw new HttpRequestException("Service Unavailable");
                }

                return result;
            });

            return policyResult;

Then, after two tries, the Circuit Breaker kicks in, otherwise never...

Summary:

  • We shouldn't create a Circuit Breaker per route / request, only maintain one Circuit Breaker per route
  • The Circuit Breaker is Expecting an Exception to kick in, a response with status 500 is not an Exception

@ks1990cn is right when he says that the circuit breaker is not working, but it's not a bug in polly, but a bug in ocelot.

@ks1990cn
Copy link
Contributor

ks1990cn commented Oct 24, 2023

@ggnaegi commented on Oct 23

Correct I agree on this, I made configuration issue too on my personal project. https://www.youtube.com/watch?v=z6YcU0PW_9E&t=1s helped me to configure it properly. I was doing same mistake. Closed bug on Polly.

@raman-m raman-m added QoS Ocelot feature: Quality of Service aka Polly accepted Bug or feature would be accepted as a PR or is being worked on and removed medium effort Likely a few days of development effort small effort Likely less than a day of development effort. needs validation Issue has not been replicated or verified yet needs feedback Issue is waiting on feedback before acceptance labels Oct 25, 2023
@raman-m
Copy link
Member

raman-m commented Oct 25, 2023

@raman-m
Copy link
Member

raman-m commented Oct 30, 2023

@ggnaegi commented on Oct 23

  • But, we have a bigger problem here:
  • the circuit breaker is not working, but it's not a bug in Polly, but a bug in ocelot.

Well... Has this problem already addressed in PR #1753, so is it included in PR?
Sorry, I ask strange questions sometimes

@ggnaegi
Copy link
Member

ggnaegi commented Oct 30, 2023

@raman-m commented on Oct 30

Yes it's part of the PR

@raman-m
Copy link
Member

raman-m commented Nov 3, 2023

@anuviswan Hey Anu,
Could you check & test the bug fix we've prepared as a gift for you, plz? 😉

raman-m added a commit that referenced this issue Nov 4, 2023
…g issue (#1753)

* When using the QoS option "ExceptionsAllowedBeforeBreaking" the circuit breaker never opens the circuit.

* merge issue, PortFinder

* some code improvements, using httpresponsemessage status codes as a base for circuit breaker

* Adding more unit tests, and trying to mitigate the test issues with the method "GivenThereIsAPossiblyBrokenServiceRunningOn"

* fixing some test issues

* setting timeout value to 5000 to avoid side effects

* again timing issues

* timing issues again

* ok, first one ok

* Revert "ok, first one ok"

This reverts commit 2e4a673.

* inline method

* putting back logging for http request exception

* removing logger configuration, back to default

* adding a bit more tests to check the policy wrap

* Removing TimeoutStrategy from parameters, it's set by default to pessimistic, at least one policy will be returned, so using First() in circuit breaker and removing the branch Policy == null from delegating handler.

* Fix StyleCop warnings

* Format parameters

* Sort usings

* since we might have two policies wrapped,  timeout and circuit breaker, we can't use the name CircuitBreaker for polly qos provider, it's not right. Using PollyPolicyWrapper and AsnycPollyPolicy instead.

* modifying circuit breaker delegating handler name, usin Polly policies instead

* renaming CircuitBreakerFactory to PolicyWrapperFactory in tests

* DRY for FileConfiguration, using FileConfigurationFactory

* Add copy constructor

* Refactor setup

* Use expression body for method

* Fix acceptance test

* IDE1006 Naming rule violation: These words must begin with upper case characters

* CA1816 Change ReturnsErrorTests.Dispose() to call GC.SuppressFinalize(object)

* Sort usings

* Use expression body for method

* Return back named arguments

---------

Co-authored-by: raman-m <dotnet044@gmail.com>
@raman-m raman-m removed the accepted Bug or feature would be accepted as a PR or is being worked on label Nov 4, 2023
@raman-m raman-m added the merged Issue has been merged to dev and is waiting for the next release label Nov 4, 2023
ibnuda pushed a commit to ibnuda/Ocelot that referenced this issue Nov 8, 2023
…ionsAllowedBeforeBreaking issue (ThreeMammals#1753)

* When using the QoS option "ExceptionsAllowedBeforeBreaking" the circuit breaker never opens the circuit.

* merge issue, PortFinder

* some code improvements, using httpresponsemessage status codes as a base for circuit breaker

* Adding more unit tests, and trying to mitigate the test issues with the method "GivenThereIsAPossiblyBrokenServiceRunningOn"

* fixing some test issues

* setting timeout value to 5000 to avoid side effects

* again timing issues

* timing issues again

* ok, first one ok

* Revert "ok, first one ok"

This reverts commit 2e4a673.

* inline method

* putting back logging for http request exception

* removing logger configuration, back to default

* adding a bit more tests to check the policy wrap

* Removing TimeoutStrategy from parameters, it's set by default to pessimistic, at least one policy will be returned, so using First() in circuit breaker and removing the branch Policy == null from delegating handler.

* Fix StyleCop warnings

* Format parameters

* Sort usings

* since we might have two policies wrapped,  timeout and circuit breaker, we can't use the name CircuitBreaker for polly qos provider, it's not right. Using PollyPolicyWrapper and AsnycPollyPolicy instead.

* modifying circuit breaker delegating handler name, usin Polly policies instead

* renaming CircuitBreakerFactory to PolicyWrapperFactory in tests

* DRY for FileConfiguration, using FileConfigurationFactory

* Add copy constructor

* Refactor setup

* Use expression body for method

* Fix acceptance test

* IDE1006 Naming rule violation: These words must begin with upper case characters

* CA1816 Change ReturnsErrorTests.Dispose() to call GC.SuppressFinalize(object)

* Sort usings

* Use expression body for method

* Return back named arguments

---------

Co-authored-by: raman-m <dotnet044@gmail.com>
@raman-m raman-m added the Oct'23 label Nov 10, 2023
@raman-m raman-m added this to the October'23 milestone Nov 18, 2023
raman-m added a commit that referenced this issue Nov 28, 2023
* #1712 Bump to Polly 8.0 (#1714)

* #1712 Migrate to Polly 8.0

* code review post merge

* post PR

* #1712 Migrate to Polly 8.0

* code review post merge

* Update src/Ocelot.Provider.Polly/PollyQoSProvider.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* namespaces

* Refactor QoS provider

* Refactor AddPolly extension

* Remove single quote because semicolon ends sentence

---------

Co-authored-by: Ray <rmessie@traceparts.com>
Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* Cache by header value: a new Header property in (File)CacheOptions configuration of a route (#1172)

@EngRajabi, Mohsen Rajabi (7):
      add header to file cache option
      fix private set
      fix
      <none>
      <none>
      fix build fail
      fix: fix review comment. add unit test for change

@raman-m, Raman Maksimchuk (1):
      Update caching.rst

@raman-m (7):
      Fix errors
      Fix errors
      Fix styling warnings
      Refactor tests
      Add Delimiter
      Refactor generator
      Add unit tests

* Find available port in integration tests (#1173)

* use random ports in integration tests

* Remove and Sort Usings

* Code modern look

* code review

* code review: fix messages

* code review: fix some messages

* code review: Use simple `using` statement

* Add Ocelot.Testing project

---------

Co-authored-by: raman-m <dotnet044@gmail.com>

* #952 #1174 Merge query strings without duplicate values (#1182)

* Fix issue  #952 and #1174

* Fix compiling errors

* Fix warnings

* Fix errors

* Remove and Sort Usings

* CA1845 Use span-based 'string.Concat' and 'AsSpan' instead of 'Substring'.
Use 'AsSpan' with 'string.Concat'

* IDE1006 Naming rule violation: These words must begin with upper case characters: {should_*}.
Fix name violation

* Add namespace

* Fix build errors

* Test class should match the name of tested class

* Simplify too long class names, and they should match

* Move to the parent folder which was empty

* Fix warnings

* Process dictionaries using LINQ to Objects approach

* Fix code review issues from @RaynaldM

* Remove tiny private helper with one reference

* Fix warning & messages

* Define theory instead of 2 facts

* Add unit test for issue #952

* Add additional unit test for #952 to keep param

* Add tests for issue #1174

* Remove unnecessary parameter

* Copy routing.rst from released version

* Refactor the middleware body for query params

* Update routing.rst: Describe query string user scenarios

---------

Co-authored-by: Stjepan Majdak <stjepan.majdak@a1.hr>
Co-authored-by: raman-m <dotnet044@gmail.com>

* #1550 #1706 Addressing the QoS options ExceptionsAllowedBeforeBreaking issue (#1753)

* When using the QoS option "ExceptionsAllowedBeforeBreaking" the circuit breaker never opens the circuit.

* merge issue, PortFinder

* some code improvements, using httpresponsemessage status codes as a base for circuit breaker

* Adding more unit tests, and trying to mitigate the test issues with the method "GivenThereIsAPossiblyBrokenServiceRunningOn"

* fixing some test issues

* setting timeout value to 5000 to avoid side effects

* again timing issues

* timing issues again

* ok, first one ok

* Revert "ok, first one ok"

This reverts commit 2e4a673.

* inline method

* putting back logging for http request exception

* removing logger configuration, back to default

* adding a bit more tests to check the policy wrap

* Removing TimeoutStrategy from parameters, it's set by default to pessimistic, at least one policy will be returned, so using First() in circuit breaker and removing the branch Policy == null from delegating handler.

* Fix StyleCop warnings

* Format parameters

* Sort usings

* since we might have two policies wrapped,  timeout and circuit breaker, we can't use the name CircuitBreaker for polly qos provider, it's not right. Using PollyPolicyWrapper and AsnycPollyPolicy instead.

* modifying circuit breaker delegating handler name, usin Polly policies instead

* renaming CircuitBreakerFactory to PolicyWrapperFactory in tests

* DRY for FileConfiguration, using FileConfigurationFactory

* Add copy constructor

* Refactor setup

* Use expression body for method

* Fix acceptance test

* IDE1006 Naming rule violation: These words must begin with upper case characters

* CA1816 Change ReturnsErrorTests.Dispose() to call GC.SuppressFinalize(object)

* Sort usings

* Use expression body for method

* Return back named arguments

---------

Co-authored-by: raman-m <dotnet044@gmail.com>

* #1179 Add missing documentation for Secured WebSocket #1180

* Add "WebSocket Secure" and "SSL Errors" sections (#1180)

Co-authored-by: raman-m <dotnet044@gmail.com>

* Resolve issues with projects after auto-merging. Format Document

* #1744 Avoid calls to 'Logger.Log' if LogLevel not enabled in appsettings.json (#1745)

* changing string parameter for IOcelotLogger function to Func<string>, modifying asp dot net logger, only one main method and verifying if LogLevel is enabled. If log level isn't enabled, then return.

    pick 847dac7 changing string parameter for IOcelotLogger function to Func<string>, modifying asp dot net logger, only one main method and verifying if LogLevel is enabled. If log level isn't enabled, then return.
    pick d7a8397 adding back the logger methods with string as parameter, avoiding calling the factory when plain string are used.
    pick d413201 simplify method calls

* adding back the logger methods with string as parameter, avoiding calling the factory when plain string are used.

* simplify method calls

* adding unit test case, If minimum log level not set then no logs are written

* adding logging benchmark

* code cleanup in steps and naming issues fixes

   pick c4f6dc9 adding loglevel acceptance tests, verifying that the logs are returned according to the minimum log level set in appsettings
   pick 478f139 enhanced unit tests, verifying 1) that the log method is only called when log level enabled 2) that the string function is only invoked when log level enabled

* adding loglevel acceptance tests, verifying that the logs are returned according to the minimum log level set in appsettings

* enhanced unit tests, verifying 1) that the log method is only called when log level enabled 2) that the string function is only invoked when log level enabled

* weird issue with the merge.

* adding comment

* Update src/Ocelot/ServiceDiscovery/ServiceDiscoveryProviderFactory.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* Update src/Ocelot/Claims/Middleware/ClaimsToClaimsMiddleware.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* Update src/Ocelot/Configuration/Repository/FileConfigurationPoller.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* Update src/Ocelot/DownstreamRouteFinder/Finder/DownstreamRouteProviderFactory.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* Update src/Ocelot/Logging/AspDotNetLogger.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* Update test/Ocelot.AcceptanceTests/LogLevelTests.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* Update src/Ocelot/Configuration/Repository/FileConfigurationPoller.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* As mentioned, using OcelotLogger instead of AspDotNeLogger as default logger name

* Some code refactoring and usage of factories in LogLevelTests

* Update src/Ocelot/Claims/Middleware/ClaimsToClaimsMiddleware.cs

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* using overrided method WriteLog for strings, some changes as requested,

* code changes after review 2

    pick ad0e060 Update test/Ocelot.UnitTests/Middleware/OcelotPiplineBuilderTests.cs

* checking test cases

* adding ms logger benchmarks with console provider. Unfortunately, benchmark.net doesn't support "quiet" mode yet.

* 2 small adjustments

* Adding multi targets support for serilog

* Fix warnings

* Review new logger

* Fix unit tests

* The last change but not least

* Update logging.rst: Add draft

* Update logging.rst: Add RequestId section

* Update logging.rst: "Best Practices" section

* Update logging.rst: "Top Logging Performance?" subsection

* Update logging.rst: Rewrite "Request ID" section

* Update requestid.rst: Review and up to date

* Update logging.rst: "Run Benchmarks" section

---------

Co-authored-by: Raman Maksimchuk <dotnet044@gmail.com>

* #1783 Less logs for circuit breakers (Polly exceptions) (#1786)

* #1783 More accurate logs for circuit breakers (and other "polly" exceptions)
Remove try/catch in PollyPoliciesDelegatingHandler and add a more generic AddPolly<T> to be able to use a specific PollyQoSProvider

* fix should_be_invalid_re_route_using_downstream_http_version UT

* fix remarks on PR

* arrange code

* fix UT

* merge with release/net8 branch

* switch benchmark to Net8

* Fix warnings

* Final review

---------

Co-authored-by: Ray <rmessie@traceparts.com>
Co-authored-by: raman-m <dotnet044@gmail.com>

* Revert #1172 feature (#1807)

* Revert #1172

* Remove Header

* Take actual version of caching.rst and remove Header info

* Release 22.0 | +semver: breaking

---------

Co-authored-by: Raynald Messié <redbird_project@yahoo.fr>
Co-authored-by: Ray <rmessie@traceparts.com>
Co-authored-by: Mohsen Rajabi <m.kabir8895@gmail.com>
Co-authored-by: jlukawska <56401969+jlukawska@users.noreply.github.com>
Co-authored-by: Stjepan <majdak.stjepan@gmail.com>
Co-authored-by: Stjepan Majdak <stjepan.majdak@a1.hr>
Co-authored-by: Guillaume Gnaegi <58469901+ggnaegi@users.noreply.github.com>
Co-authored-by: Samuel Poirier <sam9291p@gmail.com>
raman-m added a commit to ibnuda/Ocelot that referenced this issue Feb 20, 2024
…ionsAllowedBeforeBreaking issue (ThreeMammals#1753)

* When using the QoS option "ExceptionsAllowedBeforeBreaking" the circuit breaker never opens the circuit.

* merge issue, PortFinder

* some code improvements, using httpresponsemessage status codes as a base for circuit breaker

* Adding more unit tests, and trying to mitigate the test issues with the method "GivenThereIsAPossiblyBrokenServiceRunningOn"

* fixing some test issues

* setting timeout value to 5000 to avoid side effects

* again timing issues

* timing issues again

* ok, first one ok

* Revert "ok, first one ok"

This reverts commit 2e4a673.

* inline method

* putting back logging for http request exception

* removing logger configuration, back to default

* adding a bit more tests to check the policy wrap

* Removing TimeoutStrategy from parameters, it's set by default to pessimistic, at least one policy will be returned, so using First() in circuit breaker and removing the branch Policy == null from delegating handler.

* Fix StyleCop warnings

* Format parameters

* Sort usings

* since we might have two policies wrapped,  timeout and circuit breaker, we can't use the name CircuitBreaker for polly qos provider, it's not right. Using PollyPolicyWrapper and AsnycPollyPolicy instead.

* modifying circuit breaker delegating handler name, usin Polly policies instead

* renaming CircuitBreakerFactory to PolicyWrapperFactory in tests

* DRY for FileConfiguration, using FileConfigurationFactory

* Add copy constructor

* Refactor setup

* Use expression body for method

* Fix acceptance test

* IDE1006 Naming rule violation: These words must begin with upper case characters

* CA1816 Change ReturnsErrorTests.Dispose() to call GC.SuppressFinalize(object)

* Sort usings

* Use expression body for method

* Return back named arguments

---------

Co-authored-by: raman-m <dotnet044@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Identified as a potential bug merged Issue has been merged to dev and is waiting for the next release QoS Ocelot feature: Quality of Service aka Polly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants