RetryingChannel retries 429/503s #350
Conversation
Resolved review threads on dialogue-core/src/main/java/com/palantir/dialogue/core/RetryingChannel.java (three threads, two marked outdated).
retryOrFail(() -> throwable);
}

private void retryOrFail(Supplier<Throwable> throwable) {

I think these failures should always be logged at info or warn level, so we can do away with the supplier. For composition reasons, these failure types probably ought to be encoded by a different channel.
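A minimal sketch of the suggestion above, assuming a hypothetical retryOrFail that always logs eagerly at warn level, which removes the need to defer Throwable construction behind a Supplier. Names, the retry limit, and the logging policy are illustrative, not the actual Dialogue implementation.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only: log every failure eagerly at warn level, so no
// Supplier<Throwable> indirection is needed to delay building the Throwable.
final class RetrySketch {
    private static final Logger log = Logger.getLogger(RetrySketch.class.getName());
    private static final int MAX_RETRIES = 3; // hypothetical limit for the sketch

    private int attempts = 0;

    /** Returns true if a retry was scheduled, false if the failure is terminal. */
    boolean retryOrFail(Throwable throwable) {
        attempts++;
        if (attempts <= MAX_RETRIES) {
            // Per the review comment: failures are always logged, never hidden
            // behind lazy construction.
            log.log(Level.WARNING, "Retrying after failure, attempt " + attempts, throwable);
            return true;
        }
        log.log(Level.WARNING, "Exhausted retries", throwable);
        return false;
    }
}
```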
    return delegate.execute(endpoint, request);
};
FutureCallback<Response> retryer = new RetryingCallback(callSupplier, future);
Futures.addCallback(callSupplier.apply(0), retryer, DIRECT_EXECUTOR);

Suggested change:
- Futures.addCallback(callSupplier.apply(0), retryer, DIRECT_EXECUTOR);
+ return DialogueFutures.addDirectCallback(callSupplier.apply(0), retryer);
If it's OK with you I think I'd actually prefer to just stick with the vanilla guava - I find it kinda reassuring that there's no magic going on under the hood
It just seems like extra boilerplate 🤷. It feels off to have a utility exactly for this and then not use it.
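For illustration only, using CompletableFuture from the JDK rather than Guava so the sketch is self-contained: the trade-off debated above is between invoking the library primitive inline at each call site versus wrapping the same one-liner in a named helper (here a made-up addDirectCallback, loosely analogous to the DialogueFutures utility mentioned in the suggestion).

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

final class DirectCallbackSketch {
    private DirectCallbackSketch() {}

    // "Vanilla" style: attach the callback inline at the call site. On an
    // already-completed future the callback runs immediately on the calling
    // thread, which is the same behaviour as Guava's direct executor.
    static <T> CompletableFuture<T> vanilla(
            CompletableFuture<T> future, BiConsumer<T, Throwable> callback) {
        future.whenComplete(callback);
        return future;
    }

    // Utility style: the identical one-liner behind a named helper. The debate
    // above is purely about readability, not behaviour.
    static <T> CompletableFuture<T> addDirectCallback(
            CompletableFuture<T> future, BiConsumer<T, Throwable> callback) {
        return vanilla(future, callback);
    }
}
```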
public void onSuccess(Response result) {
    // this condition should really match the BlacklistingChannel so that we don't hit the same host twice in
    // a row
    if (result.code() == 503 || result.code() == 500) {
Why not retry 429s as well?
I think I'd like to land that in lock-step PR with a change to the BlacklistingChannel, as otherwise you might retry on the same host immediately!
this is why, per my comment, the location of this check is likely wrong
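One way to keep the RetryingChannel and BlacklistingChannel from drifting apart, as the thread above worries about, would be a single shared predicate for retryable codes. A hypothetical sketch; per the PR title, 429 and 503 are retried, while retrying 500 is still under discussion:

```java
// Hypothetical shared predicate: if both RetryingChannel and BlacklistingChannel
// consult the same method, a retried request will not be sent straight back to a
// host that was just blacklisted for the same response code.
final class StatusCodes {
    private StatusCodes() {}

    static boolean isRetryable(int code) {
        // Per the PR title: 429 (Too Many Requests) and 503 (Service Unavailable).
        // Whether to also retry 500 is deliberately left out, pending discussion.
        return code == 429 || code == 503;
    }
}
```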
Resolved review thread (outdated) on simulation/src/test/java/com/palantir/dialogue/core/SimulationTest.java.
Force-pushed from 4e2b200 to 5e73ced.
Force-pushed from 6ad8987 to 28fdcd2.
public void onSuccess(Response response) {
    // this condition should really match the BlacklistingChannel so that we don't hit the same host twice in
    // a row
    if (response.code() == 503 || response.code() == 500) {
Thinking a bit more carefully here, I think we should possibly just match c-j-r's old behaviour to minimize disruption on the rollout:
- 1xx and 2xx are considered successful.
- The QoS handler catches 308, 429, and 503, which are eligible for retry.
- Everything else goes straight into the error pipe.

We can debate the 'retry 500s' thing separately.
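The proposed c-j-r-compatible mapping above can be sketched as a three-way classification. This is illustrative only; the Disposition enum and method names are made up for the sketch and are not the actual Dialogue API.

```java
// Illustrative three-way classification of response codes, mirroring the
// proposed c-j-r-compatible behaviour described in the comment above.
final class ResponseClassifier {
    enum Disposition { SUCCESS, RETRY, ERROR }

    private ResponseClassifier() {}

    static Disposition classify(int code) {
        if (code >= 100 && code < 300) {
            return Disposition.SUCCESS; // 1xx and 2xx are considered successful
        }
        if (code == 308 || code == 429 || code == 503) {
            return Disposition.RETRY; // QoS codes eligible for retry
        }
        return Disposition.ERROR; // everything else goes straight to the error pipe
    }
}
```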
Force-pushed from a51e621 to 109f1fe.
👍
Would love to merge the simulation charts PR (#348) before this one, as we'll be able to see the difference in the graphs!
Before this PR
A single blip of brokenness would be passed on to clients, resulting in user-facing impact.
After this PR
==COMMIT_MSG==
RetryingChannel retries 429/503s
==COMMIT_MSG==
Possible downsides?
This might have weird effects if a request is not idempotent (e.g. "sendEmail") and the server breaks halfway through servicing the request (i.e. multiple emails could get sent).

In a future PR, I think we should consider retrying 500s too.
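One way to guard against the non-idempotent "sendEmail" case described above is to only retry requests that are safe to repeat, for instance keyed on HTTP method. This is a sketch under that assumption; a real service might instead want an explicit per-endpoint idempotency flag rather than method-based inference.

```java
import java.util.Set;

// Sketch: gate retries on request idempotency so that e.g. a "sendEmail" POST
// that failed mid-flight is not replayed, sending the email twice.
final class IdempotencySketch {
    // Idempotent methods per RFC 7231; a real system might consult
    // per-endpoint metadata instead of the HTTP method alone.
    private static final Set<String> IDEMPOTENT_METHODS =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    private IdempotencySketch() {}

    static boolean mayRetry(String httpMethod, int responseCode) {
        // Retryable codes per the PR title: 429 and 503.
        boolean retryableCode = responseCode == 429 || responseCode == 503;
        return retryableCode && IDEMPOTENT_METHODS.contains(httpMethod);
    }
}
```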