
RetryingChannel retries 429/503s #350

Merged (9 commits) on Feb 17, 2020

Conversation

@iamdanfox (Contributor) commented Feb 17, 2020

Would love to merge the simulation charts PR (#348) before this, as we'll be able to see the difference in the graphs!

GRAPHS

Before this PR

A single blip of brokenness would be passed on to clients, resulting in user-facing impact. (Simulation chart omitted.)

After this PR

(Simulation chart omitted.)

==COMMIT_MSG==
RetryingChannel retries 429/503s
==COMMIT_MSG==

Possible downsides?

This might have weird effects if a request is not idempotent (e.g. "sendEmail") and the server breaks halfway through servicing the request (i.e. multiple emails could get sent).

In a future PR, I think we should consider retrying 500s too.
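
For context, here is a minimal sketch of the behaviour being added, written against the Dialogue Channel/Endpoint/Request/Response types visible in the diff snippets below. The class name and the maxRetries parameter are hypothetical, not the actual implementation:

import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;
import com.google.common.util.concurrent.SettableFuture;

// Sketch only: wrap a delegate Channel and re-issue the request when the response
// comes back as 429 or 503, up to a hypothetical maxRetries limit.
final class RetryOn429And503Sketch implements Channel {
    private final Channel delegate;
    private final int maxRetries;

    RetryOn429And503Sketch(Channel delegate, int maxRetries) {
        this.delegate = delegate;
        this.maxRetries = maxRetries;
    }

    @Override
    public ListenableFuture<Response> execute(Endpoint endpoint, Request request) {
        SettableFuture<Response> result = SettableFuture.create();
        attempt(endpoint, request, result, 0);
        return result;
    }

    private void attempt(Endpoint endpoint, Request request, SettableFuture<Response> result, int attemptsSoFar) {
        Futures.addCallback(delegate.execute(endpoint, request), new FutureCallback<Response>() {
            @Override
            public void onSuccess(Response response) {
                // 429/503 mean "try again"; anything else is handed straight back to the caller.
                if ((response.code() == 429 || response.code() == 503) && attemptsSoFar < maxRetries) {
                    attempt(endpoint, request, result, attemptsSoFar + 1);
                } else {
                    result.set(response);
                }
            }

            @Override
            public void onFailure(Throwable throwable) {
                result.setException(throwable);
            }
        }, MoreExecutors.directExecutor());
    }
}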

@changelog-app

changelog-app bot commented Feb 17, 2020

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

RetryingChannel retries 500/503s

Check the box to generate changelog(s)

  • Generate changelog entry

retryOrFail(() -> throwable);
}

private void retryOrFail(Supplier<Throwable> throwable) {
Contributor:

I think these failures should always be logged at info or warn level, so we can do away with the supplier

@markelliot (Contributor):

for composition reasons, these failure types probably ought to be encoded by a different channel
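
A sketch of the composition being suggested, under the same assumptions as the sketch in the description (hypothetical class name, not Dialogue source): a separate channel encodes the relevant status codes as failed futures, so a retrying channel layered on top only has to react to exceptions.

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

// Sketch only: turn the retryable status codes into failed futures here, so the retry
// policy elsewhere never needs to inspect status codes itself.
final class FailureEncodingChannelSketch implements Channel {
    private final Channel delegate;

    FailureEncodingChannelSketch(Channel delegate) {
        this.delegate = delegate;
    }

    @Override
    public ListenableFuture<Response> execute(Endpoint endpoint, Request request) {
        return Futures.transformAsync(
                delegate.execute(endpoint, request),
                response -> {
                    if (response.code() == 429 || response.code() == 503) {
                        return Futures.immediateFailedFuture(
                                new RuntimeException("Retryable failure: " + response.code()));
                    }
                    return Futures.immediateFuture(response);
                },
                MoreExecutors.directExecutor());
    }
}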

return delegate.execute(endpoint, request);
};
FutureCallback<Response> retryer = new RetryingCallback(callSupplier, future);
Futures.addCallback(callSupplier.apply(0), retryer, DIRECT_EXECUTOR);
Contributor:

Suggested change
Futures.addCallback(callSupplier.apply(0), retryer, DIRECT_EXECUTOR);
return DialogueFutures.addDirectCallback(callSupplier.apply(0), retryer);

Contributor (Author):

If it's OK with you, I think I'd actually prefer to just stick with vanilla Guava - I find it kinda reassuring that there's no magic going on under the hood.

Contributor:

It just seems like extra boilerplate 🤷‍♂; it feels off to have a utility exactly for this and then not use it.
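
For reference, the helper under discussion presumably amounts to a thin wrapper along these lines (an assumption based on the suggested change above, not the actual DialogueFutures source):

// Assumed shape of the helper: a shortcut over Futures.addCallback with a direct
// executor, returning the input future so the call can be used inline as a return
// expression.
static <T> ListenableFuture<T> addDirectCallback(ListenableFuture<T> future, FutureCallback<T> callback) {
    Futures.addCallback(future, callback, MoreExecutors.directExecutor());
    return future;
}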

public void onSuccess(Response result) {
// this condition should really match the BlacklistingChannel so that we don't hit the same host twice in
// a row
if (result.code() == 503 || result.code() == 500) {
@ferozco (Contributor) Feb 17, 2020:

Why not retry 429s as well?

Contributor (Author):

I think I'd like to land that in a lock-step PR with a change to the BlacklistingChannel, as otherwise you might retry on the same host immediately!

Contributor:

this is why, per my comment, the location of this check is likely wrong

@iamdanfox (Contributor, Author) commented Feb 17, 2020

public void onSuccess(Response response) {
// this condition should really match the BlacklistingChannel so that we don't hit the same host twice in
// a row
if (response.code() == 503 || response.code() == 500) {

Thinking a bit more carefully here, I think we should possibly just match c-j-r's old behaviour to minimize disruption on the rollout:

  • 1xx and 2xx are considered successful.
  • The QosHandler catches 308, 429 and 503, which are eligible for retry.
  • Everything else goes straight into the error pipe.

We can debate the 'retry 500s' thing separately.
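
A sketch of that classification as plain helper methods (hypothetical names, matching the bullets above rather than any conjure-java-runtime code):

static boolean isSuccessful(int code) {
    return code >= 100 && code < 300;   // 1xx and 2xx pass straight through
}

static boolean isRetryable(int code) {
    return code == 308 || code == 429 || code == 503;   // eligible for retry
}

static boolean isError(int code) {
    return !isSuccessful(code) && !isRetryable(code);   // straight into the error pipe
}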

@iamdanfox iamdanfox changed the title RetryingChannel retries 500/503s RetryingChannel retries 429/503s Feb 17, 2020
@ferozco (Contributor) commented Feb 17, 2020

👍
