HTTP 500 errors should be retried #504

Closed
symphony-elias opened this issue Apr 21, 2021 · 3 comments · Fixed by #509
Labels: good first issue (Good for newcomers), [type] bug (Something isn't working)
Milestone: 2.1.6

Comments

@symphony-elias
Contributor

Bug Report

During maintenance windows, when the pod, the Key Manager (KM) or the Agent is down, HTTP calls can fail with HTTP 500. Such calls need to be retried.
For instance, on one of our bots, we got:

2021-04-20 18:26:26.546  INFO 1 --- [_DatafeedThread] c.s.b.c.s.datafeed.impl.DatafeedLoopV1   : Recreate a new datafeed and try again
2021-04-20 18:26:26.684  INFO 1 --- [_DatafeedThread] .s.b.c.r.r.Resilience4jRetryWithRecovery : Retry in 64.0s...
2021-04-20 22:01:45.235  INFO 1 --- [_DatafeedThread] c.s.b.c.s.datafeed.impl.DatafeedLoopV1   : Recreate a new datafeed and try again
2021-04-20 22:01:45.389 ERROR 1 --- [_DatafeedThread] c.s.b.s.s.DatafeedAsyncLauncherService   : An API error has been received while starting the Datafeed loop in a separate thread, please check error below:

com.symphony.bdk.http.api.ApiException: {"code":500,"message":"Received an error when calling a pod endpoint"}
        at com.symphony.bdk.http.jersey2.ApiClientJersey2.invokeAPI(ApiClientJersey2.java:168) ~[symphony-bdk-http-jersey2-2.1.3.jar:2.1.3]
        at com.symphony.bdk.gen.api.DatafeedApi.v4DatafeedCreatePostWithHttpInfo(DatafeedApi.java:479) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.gen.api.DatafeedApi.v4DatafeedCreatePost(DatafeedApi.java:414) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.service.datafeed.impl.DatafeedLoopV1.createDatafeedAndPersist(DatafeedLoopV1.java:144) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.retry.RetryWithRecovery.executeOnce(RetryWithRecovery.java:105) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at io.github.resilience4j.retry.Retry.lambda$decorateCheckedSupplier$3f69f149$1(Retry.java:137) ~[resilience4j-retry-1.6.1.jar:1.6.1]
        at io.github.resilience4j.retry.Retry.executeCheckedSupplier(Retry.java:419) ~[resilience4j-retry-1.6.1.jar:1.6.1]
        at com.symphony.bdk.core.retry.resilience4j.Resilience4jRetryWithRecovery.execute(Resilience4jRetryWithRecovery.java:65) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.service.datafeed.impl.DatafeedLoopV1.createDatafeed(DatafeedLoopV1.java:139) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.service.datafeed.impl.DatafeedLoopV1.recreateDatafeed(DatafeedLoopV1.java:127) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.retry.RecoveryStrategy.runRecovery(RecoveryStrategy.java:48) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.retry.RetryWithRecovery.handleRecovery(RetryWithRecovery.java:159) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.retry.RetryWithRecovery.executeOnce(RetryWithRecovery.java:112) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at io.github.resilience4j.retry.Retry.lambda$decorateCheckedSupplier$3f69f149$1(Retry.java:137) ~[resilience4j-retry-1.6.1.jar:1.6.1]
        at io.github.resilience4j.retry.Retry.executeCheckedSupplier(Retry.java:419) ~[resilience4j-retry-1.6.1.jar:1.6.1]
        at com.symphony.bdk.core.retry.resilience4j.Resilience4jRetryWithRecovery.execute(Resilience4jRetryWithRecovery.java:65) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.service.datafeed.impl.DatafeedLoopV1.readDatafeed(DatafeedLoopV1.java:121) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.core.service.datafeed.impl.DatafeedLoopV1.start(DatafeedLoopV1.java:86) ~[symphony-bdk-core-2.1.3.jar:2.1.3]
        at com.symphony.bdk.spring.service.DatafeedAsyncLauncherService.uncheckedStart(DatafeedAsyncLauncherService.java:88) ~[symphony-bdk-core-spring-boot-starter-2.1.3.jar:2.1.3]
        at com.symphony.bdk.http.api.tracing.MDCUtils$MdcRunnable.run(MDCUtils.java:59) ~[symphony-bdk-http-api-2.1.3.jar:2.1.3]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]

Expected Result:

HTTP 500 errors should be retried, see: https://github.com/finos/symphony-bdk-java/blob/main/symphony-bdk-http/symphony-bdk-http-api/src/main/java/com/symphony/bdk/http/api/ApiException.java#L81

Actual Result:

HTTP 500 errors make the bot fail: the exception above stops the Datafeed loop.
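The gap between the expected and the actual result can be sketched as a status-code predicate. This is a hypothetical illustration, not the BDK's code: `RetryPolicySketch` is an invented name, and the "before fix" variant assumes, based on the fix's commit message, that other 5xx statuses were already retried while 500 was deliberately excluded.

```java
/**
 * Hypothetical retry predicate illustrating issue #504; not the BDK's actual code.
 */
final class RetryPolicySketch {

    private RetryPolicySketch() {
    }

    /** True for 5xx statuses, i.e. the pod, Key Manager or Agent failed on its side. */
    static boolean isServerError(int status) {
        return status >= 500 && status <= 599;
    }

    /** Assumed pre-fix behaviour: retry server errors, except 500 which was let through. */
    static boolean shouldRetryBeforeFix(int status) {
        return isServerError(status) && status != 500;
    }

    /** Behaviour requested by the issue: 500 is retried like any other server error. */
    static boolean shouldRetryAfterFix(int status) {
        return isServerError(status);
    }
}
```

With this predicate, `shouldRetryBeforeFix(500)` is `false`, so the exception propagates and stops the loop, whereas `shouldRetryAfterFix(500)` is `true`.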

@symphony-elias symphony-elias added [type] bug Something isn't working good first issue Good for newcomers labels Apr 21, 2021
@SivaTharun

@symphony-elias I am new to open source contribution. Can I start working on this issue?

@thibauult
Member

No problem @SivaTharun, your contribution will be very welcome!

Do you need any help before starting?

@SivaTharun

Thanks @symphony-thibault for following up. I just need to know how to reproduce the above scenario: could you point me to a unit test that asserts the 500 error from the DatafeedAsyncLauncherService class, or to a resource for reproducing the above exception?

@thibauult thibauult added this to the 2.1.6 milestone Apr 29, 2021
thibauult added a commit to thibauult/symphony-bdk-java that referenced this issue Apr 29, 2021
thibauult added a commit to thibauult/symphony-bdk-java that referenced this issue Apr 30, 2021
thibauult added a commit that referenced this issue Apr 30, 2021
We recently noticed that the DatafeedLoop was crashing with `500` errors returned by the Agent. We initially decided not to retry on this specific status. Now we do, as the problem has actually been raised by some customers.
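A minimal sketch of the behaviour this commit describes, assuming a plain Java retry loop with exponential backoff in place of the BDK's Resilience4j-based retry: `HttpStatusException` and `BackoffRetry` are hypothetical names, and the attempt count and delays are illustrative. A call failing with `500` is retried until it succeeds or attempts run out, while non-5xx errors are rethrown immediately.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an API exception carrying the HTTP status code. */
final class HttpStatusException extends RuntimeException {
    final int status;

    HttpStatusException(int status, String message) {
        super(message);
        this.status = status;
    }
}

/** Minimal retry loop with exponential backoff; a sketch, not the BDK's implementation. */
final class BackoffRetry {

    private BackoffRetry() {
    }

    /**
     * Runs {@code action}, retrying on 5xx errors (500 included) up to {@code maxAttempts}
     * attempts in total. The wait starts at {@code initialDelayMs} and is multiplied by
     * {@code multiplier} after each failed attempt.
     */
    static <T> T call(Supplier<T> action, int maxAttempts, long initialDelayMs, double multiplier) {
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (HttpStatusException e) {
                boolean serverError = e.status >= 500 && e.status <= 599;
                // Non-5xx errors are not retried; server errors are, until attempts run out.
                if (!serverError || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw e; // give up and surface the last HTTP error
                }
                delayMs = (long) (delayMs * multiplier);
            }
        }
    }
}
```

In a unit test, the supplier can act as a stubbed endpoint that throws a 500 for its first few calls, which is one way to reproduce the scenario discussed in this issue without a real maintenance window.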

3 participants