
RetryingChannel retries 429/503s #350

Merged (9 commits) on Feb 17, 2020

Conversation

@iamdanfox (Contributor) commented Feb 17, 2020

Would love to merge the simulation charts PR (#348) before this, as we'll be able to see the difference in the graphs!

GRAPHS

Before this PR

A single blip of brokenness would be passed on to clients, resulting in user-facing impact. (Simulation chart omitted.)

After this PR

(Simulation chart omitted.)

==COMMIT_MSG==
RetryingChannel retries 429/503s
==COMMIT_MSG==

Possible downsides?

This might have weird effects if a request is not idempotent (e.g. "sendEmail") and the server breaks halfway through servicing the request (i.e. multiple emails could get sent).

In a future PR, I think we should consider retrying 500s too.
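
For context, here is a minimal sketch of the behaviour being added, written against the Dialogue Channel/Endpoint/Request/Response types visible in the diff snippets below. The class name and the maxRetries parameter are hypothetical, not the actual implementation:

import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;
import com.google.common.util.concurrent.SettableFuture;

// Sketch only: wrap a delegate Channel and re-issue the request when the response
// comes back as 429 or 503, up to a hypothetical maxRetries limit.
final class RetryOn429And503Sketch implements Channel {
    private final Channel delegate;
    private final int maxRetries;

    RetryOn429And503Sketch(Channel delegate, int maxRetries) {
        this.delegate = delegate;
        this.maxRetries = maxRetries;
    }

    @Override
    public ListenableFuture<Response> execute(Endpoint endpoint, Request request) {
        SettableFuture<Response> result = SettableFuture.create();
        attempt(endpoint, request, result, 0);
        return result;
    }

    private void attempt(Endpoint endpoint, Request request, SettableFuture<Response> result, int attemptsSoFar) {
        Futures.addCallback(delegate.execute(endpoint, request), new FutureCallback<Response>() {
            @Override
            public void onSuccess(Response response) {
                // 429/503 mean "try again"; anything else is handed straight back to the caller.
                if ((response.code() == 429 || response.code() == 503) && attemptsSoFar < maxRetries) {
                    attempt(endpoint, request, result, attemptsSoFar + 1);
                } else {
                    result.set(response);
                }
            }

            @Override
            public void onFailure(Throwable throwable) {
                result.setException(throwable);
            }
        }, MoreExecutors.directExecutor());
    }
}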

@changelog-app

changelog-app bot commented Feb 17, 2020

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

RetryingChannel retries 500/503s

Check the box to generate changelog(s)

  • Generate changelog entry

retryOrFail(() -> throwable);
}

private void retryOrFail(Supplier<Throwable> throwable) {
Contributor:

I think these failures should always be logged at info or warn level, so we can do away with the supplier

@markelliot (Contributor):

for composition reasons, these failure types probably ought to be encoded by a different channel
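
A sketch of the composition being suggested, under the same assumptions as the sketch in the description (hypothetical class name, not Dialogue source): a separate channel encodes the relevant status codes as failed futures, so a retrying channel layered on top only has to react to exceptions.

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

// Sketch only: turn the retryable status codes into failed futures here, so the retry
// policy elsewhere never needs to inspect status codes itself.
final class FailureEncodingChannelSketch implements Channel {
    private final Channel delegate;

    FailureEncodingChannelSketch(Channel delegate) {
        this.delegate = delegate;
    }

    @Override
    public ListenableFuture<Response> execute(Endpoint endpoint, Request request) {
        return Futures.transformAsync(
                delegate.execute(endpoint, request),
                response -> {
                    if (response.code() == 429 || response.code() == 503) {
                        return Futures.immediateFailedFuture(
                                new RuntimeException("Retryable failure: " + response.code()));
                    }
                    return Futures.immediateFuture(response);
                },
                MoreExecutors.directExecutor());
    }
}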

return delegate.execute(endpoint, request);
};
FutureCallback<Response> retryer = new RetryingCallback(callSupplier, future);
Futures.addCallback(callSupplier.apply(0), retryer, DIRECT_EXECUTOR);
Contributor:

Suggested change
Futures.addCallback(callSupplier.apply(0), retryer, DIRECT_EXECUTOR);
return DialogueFutures.addDirectCallback(callSupplier.apply(0), retryer);

Contributor (Author):

If it's OK with you, I think I'd actually prefer to just stick with vanilla Guava - I find it kinda reassuring that there's no magic going on under the hood.

Contributor:

It just seems like extra boilerplate 🤷‍♂; it feels off to have a utility exactly for this and then not use it.
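
For reference, the helper under discussion presumably amounts to a thin wrapper along these lines (an assumption based on the suggested change above, not the actual DialogueFutures source):

// Assumed shape of the helper: a shortcut over Futures.addCallback with a direct
// executor, returning the input future so the call can be used inline as a return
// expression.
static <T> ListenableFuture<T> addDirectCallback(ListenableFuture<T> future, FutureCallback<T> callback) {
    Futures.addCallback(future, callback, MoreExecutors.directExecutor());
    return future;
}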

public void onSuccess(Response result) {
// this condition should really match the BlacklistingChannel so that we don't hit the same host twice in
// a row
if (result.code() == 503 || result.code() == 500) {
@ferozco (Contributor) Feb 17, 2020:

Why not retry 429s as well?

Contributor (Author):

I think I'd like to land that in a lock-step PR with a change to the BlacklistingChannel, as otherwise you might retry on the same host immediately!

Contributor:

this is why, per my comment, the location of this check is likely wrong

@iamdanfox (Contributor, Author) commented Feb 17, 2020

public void onSuccess(Response response) {
// this condition should really match the BlacklistingChannel so that we don't hit the same host twice in
// a row
if (response.code() == 503 || response.code() == 500) {

Thinking a bit more carefully here, I think we should possibly just match c-j-r's old behaviour to minimize disruption on the rollout:

  • 1xx and 2xx are considered successful.
  • The QosHandler catches 308, 429 and 503, which are eligible for retry.
  • Everything else goes straight into the error pipe.

We can debate the 'retry 500s' thing separately.
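
A sketch of that classification as plain helper methods (hypothetical names, matching the bullets above rather than any conjure-java-runtime code):

static boolean isSuccessful(int code) {
    return code >= 100 && code < 300;   // 1xx and 2xx pass straight through
}

static boolean isRetryable(int code) {
    return code == 308 || code == 429 || code == 503;   // eligible for retry
}

static boolean isError(int code) {
    return !isSuccessful(code) && !isRetryable(code);   // straight into the error pipe
}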

@iamdanfox iamdanfox changed the title RetryingChannel retries 500/503s RetryingChannel retries 429/503s Feb 17, 2020
@ferozco (Contributor) commented Feb 17, 2020

👍
