
Rest controller blocked if retry policy enabled #590

Closed
vladimir-narkevich-idf opened this issue Apr 21, 2023 · 7 comments

Hi, I am running into an interesting issue. I have a simple REST controller that calls an external service through a Feign client. When I enable the retry policy, I get a successful response from the Feign client, but my REST controller is then blocked for about two minutes; when I disable the retry policy, I get the response without any blocking.
My code looks like this:

```kotlin
interface TestFeign {
  @PostMapping("/test")
  fun test(@RequestBody request: Test): Mono<String>
}

@Configuration
@EnableReactiveFeignClients(
  clients = [
    TestFeign::class
  ]
)
class ConnectorConfig {

  @Bean
  fun reactiveOptions(httpSettings: HttpSettings): ReactiveOptions =
    WebReactiveOptions.Builder()
      .setReadTimeoutMillis(Duration.parse(httpSettings.readTimeout).toMillis())
      .setWriteTimeoutMillis(Duration.parse(httpSettings.writeTimeout).toMillis())
      .setConnectTimeoutMillis(Duration.parse(httpSettings.connectTimeout).toMillis())
      .build()

  @Bean
  fun reactiveRetryPolicy(retryPolicySettings: RetryPolicySettings): ReactiveRetryPolicy =
    FilteredReactiveRetryPolicy.notRetryOn(
      BasicReactiveRetryPolicy.retryWithBackoff(
        retryPolicySettings.maxAttempts,
        Duration.parse(retryPolicySettings.duration).toMillis()
      ),
      JsonMappingException::class.java,
      JsonParseException::class.java,
      FeignException.UnsupportedMediaType::class.java,
      FeignException.MethodNotAllowed::class.java
    )
}

@RestController
class HealthController(private val testFeign: TestFeign) {

  @GetMapping("/system/healty")
  fun healty() = testFeign.test(request)
}
```
vladimir-narkevich-idf (Author) commented Apr 24, 2023

I investigated this case in a little more detail; the problem started with Spring Boot 2.7.6.

@andrericos

Same issue here.

mukeshj13 commented Jun 1, 2023

Same issue for me on Spring Boot 2.7.10.

mukeshj13 commented Jun 19, 2023

After some investigation, I found the issue. It lies in the FluxZip handling in SimpleReactiveRetryPolicy and RetryPublisherHttpClient, which zip the retry-signal publisher with an unbounded Flux.range publisher. This wouldn't cause any issues with older versions of reactor-core, but reactor-core 3.4.25 added discard support for FluxZip and MonoZip.

So during the time the application appears blocked, the CPU is busy discarding all the published integers (1 to Integer.MAX_VALUE) from the ZipInner subscriber queue.

Long story short: if you are using SimpleReactiveRetryPolicy, you will hit this issue with newer versions of reactor.
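To make the failure mode concrete, here is a minimal sketch of the pattern described above, assuming reactor-core 3.4.25+; the class and method names are made up for illustration and this is not the library's actual source:

```java
import java.time.Duration;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

class UnboundedZipSketch {

    // Sketch of the problematic shape (simplified): the retry companion zips the
    // error signals with an effectively unbounded range. The range values feed the
    // zip's inner queue, so when the zip is cancelled, reactor-core 3.4.25+ discards
    // the remaining integers one by one, which is the CPU burn that shows up as a
    // "blocked" controller.
    static Retry unboundedCompanion() {
        return Retry.from(errors -> errors
                .zipWith(Flux.range(1, Integer.MAX_VALUE), (signal, index) -> index)
                .concatMap(attempt -> Mono.delay(Duration.ofMillis(100L * attempt))));
    }
}
```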

As a workaround until this is resolved, you can override the SimpleReactiveRetryPolicy retry() method to limit the range to a number your retry config will never exceed:

```java
@Override
public Retry retry() {
  return Retry.from(errors -> errors
      .zipWith(Flux.range(1, MAX_ALLOWED_RETRIES + 1), (signal, index) -> {
        long delay = retryDelay(signal.failure(), index);
        if (delay >= 0) {
          return Tuples.of(delay, signal);
        } else {
          throw Exceptions.propagate(signal.failure());
        }
      }).concatMap(
          tuple2 -> tuple2.getT1() > 0
              ? Mono.delay(Duration.ofMillis(tuple2.getT1()), scheduler)
                  .map(time -> tuple2.getT2().failure())
              : Mono.just(tuple2.getT2().failure())));
}
```

To make this work, you'll also need to override RetryPublisherHttpClient.wrapWithOutOfRetriesLog and do the same thing there. This can be tricky since it is a private method in reactivefeign; I had to bring the entire package into my project to modify it.

```java
private Retry wrapWithOutOfRetriesLog(ReactiveHttpRequest request) {
    return new Retry() {
        @Override
        public Publisher<?> generateCompanion(Flux<RetrySignal> retrySignals) {
            return Flux.<Object>from(retry.generateCompanion(retrySignals))
                    .onErrorResume(throwable -> Mono.just(new OutOfRetriesWrapper(throwable, request)))
                    .zipWith(Flux.range(1, MAX_ALLOWED_RETRIES + 1), (object, index) -> {
                        if (object instanceof OutOfRetriesWrapper) {
                            OutOfRetriesWrapper wrapper = (OutOfRetriesWrapper) object;
                            if (index == 1) {
                                throw Exceptions.propagate(wrapper.getCause());
                            } else {
                                logger.debug("[{}]---> USED ALL RETRIES", feignMethodKey, wrapper.getCause());
                                throw Exceptions.propagate(
                                        exceptionPropagationPolicy == ExceptionPropagationPolicy.UNWRAP
                                                ? wrapper.getCause()
                                                : new OutOfRetriesException(wrapper.getCause(), request));
                            }
                        } else {
                            return object;
                        }
                    });
        }
    };
}
```

This requires a change in Playtika reactive feign to either allow a configurable retry range or add support for delegating the discard handling to the queue holder. Hope this helps.

OlPochuiev added a commit to bizzabo/feign-reactive that referenced this issue Aug 16, 2023
mukeshj13 commented Sep 14, 2023

This puts a cap on the max number of retries; I'd rather prefer a fix like this one: #607
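For reference, here is a sketch of an uncapped alternative. It only illustrates the general direction (drive the delay from the RetrySignal's own counter so no range, and no zip, is involved); the class and the retryDelay helper below are hypothetical and not necessarily what #607 does:

```java
import java.time.Duration;

import reactor.core.Exceptions;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

class NoRangeRetrySketch {

    // Hypothetical stand-in for the retryDelay(error, attempt) contract used in the
    // workaround above: >= 0 means "retry after this many millis", < 0 means "give up".
    static long retryDelay(Throwable error, long attempt) {
        return attempt <= 3 ? 100L * attempt : -1L;
    }

    // Companion built directly from the retry signals. RetrySignal.totalRetries()
    // already carries the attempt counter, so no Flux.range (and therefore nothing
    // for FluxZip to discard on cancellation) is needed.
    static Retry noRangeCompanion() {
        return Retry.from(signals -> signals.concatMap(signal -> {
            long attempt = signal.totalRetries() + 1; // 1-based attempt index
            long delay = retryDelay(signal.failure(), attempt);
            if (delay < 0) {
                throw Exceptions.propagate(signal.failure());
            }
            return delay > 0
                    ? Mono.delay(Duration.ofMillis(delay)).thenReturn(attempt)
                    : Mono.just(attempt);
        }));
    }
}
```

Because the companion never zips with a second publisher, there is no buffered range left to discard when the subscription is cancelled.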

@dimaradchenko

I noticed that Flux.range(1, MAX_RETRIES) should actually be Flux.range(1, MAX_RETRIES + 1). This is because the first element is consumed when the initial call fails, and only then do the retries start.

@mukeshj13

@dimaradchenko thanks for pointing that out, I've updated the pull request.
