Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: allow retry of streaming/incomplete requests #10725

Merged
merged 14 commits into from
Apr 20, 2020

Conversation

ggreenway
Copy link
Contributor

@ggreenway ggreenway commented Apr 9, 2020

Signed-off-by: Greg Greenway ggreenway@apple.com

Risk Level: Medium
Testing: New integration tests
Docs Changes:
Release Notes: added
Fixes: #10211

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Contributor Author

@mattklein123 We discussed this change briefly a few months ago, and I finally got around to it. Can you take a quick skim and see if I missed anything obvious, or any other high-level concerns about this?

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I would have thought this would be way more complicated. Would definitely like you to test the heck out of this but looks good so far!

@@ -681,13 +682,31 @@ Http::FilterDataStatus Filter::decodeData(Buffer::Instance& data, bool end_strea
retry_state_.reset();
buffering = false;
active_shadow_policies_.clear();

// If we had to abandon buffering and there's no request in progress, abort the request and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a bit of commenting of how we might get here? is this just during the retry-timer interval?
Also cleanup -> clean up?

ASSERT(upstream_requests_.size() == 1);
upstream_requests_.front()->encodeMetadata(std::move(metadata_map_ptr));
if (!upstream_requests_.empty()) {
// TODO: buffer this if there's no request? It doesn't look like the upstream request has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

router.h: TODO(soya3129): Save metadata for retry, redirect and shadowing case.
Fine to double it here since it added confusion for you

@@ -869,7 +892,7 @@ void Filter::onSoftPerTryTimeout(UpstreamRequest& upstream_request) {
RetryStatus retry_status =
retry_state_->shouldHedgeRetryPerTryTimeout([this]() -> void { doRetry(); });

if (retry_status == RetryStatus::Yes && setupRetry()) {
if (retry_status == RetryStatus::Yes) {
setupRetry();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, did we double count stats here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so; I found this while investigating this a few months ago. See the Fixed issue in description.

@alyssawilk
Copy link
Contributor

Also cc @AndresGuedez as I think this was on one of your milestones

@ggreenway
Copy link
Contributor Author

Huh, I would have thought this would be way more complicated.

Yeah, me too. I got this far and realized it seemed to mostly-work, and said to myself "That's it!?!"

Would definitely like you to test the heck out of this but looks good so far!

Agreed, I just wanted to make sure I wasn't way off on the wrong path before writing extensive tests; I've never touched this area of the codebase before.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that this wasn't too hard in the end. Yes this looks good to me at a high level so let's circle back when the tests are done? Thank you!

/wait

…uest

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway ggreenway marked this pull request as ready for review April 15, 2020 20:55
@mattklein123
Copy link
Member

FYI looks like legit CI failure.

/wait

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Contributor Author

/azp run envoy-presubmit

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay tests! This PR really came out cleanly. I owe a pass on the unit tests, but I had one question on the overall flow I think I'd like us to sort out first.

config_.stats_.rq_retry_skipped_request_not_complete_.inc();
return false;
}
void Filter::setupRetry() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should rename, or just do this inline - it's not really setting up anything or performing the retry.

// Overflow the request buffer limit. Because the retry base interval is infinity, no
// request will be in progress. This will cause the request to be aborted and an error
// to be returned to the client.
std::string data2(2048, 'b');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see this for HTTP/2, where we do flow control via stream window, but only stop acking when we're at the limit, but for HTTP/1, we generally readDisable when we don't want more data. I'd think that if we're in a case where we were waiting on upstream, we'd want to readDisable (which would immediately cause data to cease) and we'd get the retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, and briefly looked into it, but I think it will be much more complicated. The problematic case is when the router has an upstream request, but we don't yet know whether it will need to be retried. The upstream may require more data than we can buffer to determine if it will return a 5xx or not.

Given that, we could stop flow control in the router when we do not have an upstream request. But then it's racy whether this behavior is in effect, because there are (at least) 3 states: no upstream request at all, an upstream request on a connection that isn't established yet, and an upstream request where we're sending the request currently.

So I think all that means it is out-of-scope for this PR. But worth pursuing at some point.

EXPECT_EQ(host_address_, host->address());
expectResponseTimerCreate();

Http::TestRequestHeaderMapImpl headers{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should add this to the base test class, since so many tests use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of variations on the request headers to configure different things in these tests.

I thought about writing a base class or helper function for the several similar tests I'm adding in this PR, but it was becoming more confusing to follow what each test did, even though it reduced duplication. So I'm inclined to leave it as-is.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
alyssawilk
alyssawilk previously approved these changes Apr 16, 2020
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (I'd still mildly prefer setupRetry be renamed but it's a minor nit and I'm out tomorrow =P)

@ggreenway
Copy link
Contributor Author

LGTM (I'd still mildly prefer setupRetry be renamed but it's a minor nit and I'm out tomorrow =P)

I'll push a change with it removed in a few minutes (waiting for tests to run). I also found one other minor cleanup where a comment referring to setupRetry() was outdated/wrong.

Move the debug log statement into doRetry(); it's more
accurate there anyways.

Minor cleanup of one function to put all stats-related code in a
block together.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Contributor Author

ggreenway added 2 commits 11 minutes ago

github is showing one of the commits plus a merge; there's another commit that it isn't showing here for some reason (but does show up correctly in the "commits" tab).

@ggreenway
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #10725 (comment) was created by @ggreenway.

see: more, trace.

@ggreenway ggreenway merged commit 37e9a52 into envoyproxy:master Apr 20, 2020
penguingao pushed a commit to penguingao/envoy that referenced this pull request Apr 22, 2020
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: pengg <pengg@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

setupRetry() is incorrectly called twice in router::Filter::onSoftPerTryTimeout()
4 participants