Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[thrift][bugfix]try to fix thrift request overflow crash. #12890

Merged
merged 7 commits into from
Sep 10, 2020
Merged

[thrift][bugfix]try to fix thrift request overflow crash. #12890

merged 7 commits into from
Sep 10, 2020

Conversation

pyrl247
Copy link
Contributor

@pyrl247 pyrl247 commented Aug 31, 2020

try to fix thrift request overflow crash.When request overflow ,the OnPoolFailure will set host to null,cause the crash in this case.

Signed-off-by: Guang Yang pyrl247@gmail.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

@pyrl247 pyrl247 requested a review from zuercher as a code owner August 31, 2020 14:00
PoolFailure will set host to null,cause the crush in this case.

Signed-off-by: Guang Yang <pyrl247@gmail.com>
@zuercher
Copy link
Member

zuercher commented Sep 1, 2020

Thanks. You’ll need to fix the code formatting (if you click through to the details of the failed build you can eventually drill down to the log file that shows what’s formatted incorrectly, or you can install clang-format and use the tools/code_format/check_format.py directly).

Also, let’s add a test to validate the fix.

Signed-off-by: Guang Yang <pyrl247@gmail.com>
@pyrl247
Copy link
Contributor Author

pyrl247 commented Sep 1, 2020

I didn't found where to add the test case,Could you please tell me where to add it ?

@rgs1
Copy link
Member

rgs1 commented Sep 1, 2020

Thanks for the fix @pyrl247 !

Here's a similar fix with test cases added: #6549.

Let me know if that helps.

@rgs1
Copy link
Member

rgs1 commented Sep 1, 2020

For the test case, I think you can probably use this existing one as an example:

TEST_F(ThriftRouterTest, PoolOverflowFailure) {
  initializeRouter();

  startRequest(MessageType::Call);

  EXPECT_CALL(callbacks_, sendLocalReply(_, _))
      .WillOnce(Invoke([&](const DirectResponse& response, bool end_stream) -> void {
        auto& app_ex = dynamic_cast<const AppException&>(response);
        EXPECT_EQ(AppExceptionType::InternalError, app_ex.type_);
        EXPECT_THAT(app_ex.what(), ContainsRegex(".*too many connections.*"));
        EXPECT_TRUE(end_stream);
      }));
  context_.cluster_manager_.tcp_conn_pool_.poolFailure(ConnectionPool::PoolFailureReason::Overflow);
}

and add something like:

  EXPECT_EQ(FilterStatus::StopIteration, router_->messageBegin(metadata_));

at the end... (which should crash without your fix).

I think something along those lines would work.

AppExceptionType::InternalError,
fmt::format("too many connections to '{}'", upstream_host_->address()->asString())),
AppException(AppExceptionType::InternalError,
fmt::format("thrift upstream request: too many connections")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. Since you've removed the string interpolation, you don't need to call fmt::format here.
  2. There is another case below that uses upstream_host_ below. That should be modified as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the Timeout case? When it hits that case ,as far as I can see the upstream_host_ will keep the real one.Could you please check if it's right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should depend on that behavior even if it's true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is better to report more detail about the Exception.How about change like this
`

  parent_.callbacks_->sendLocalReply(
      AppException(
          AppExceptionType::InternalError,
          fmt::format("connection failure '{}'", (upstream_host_ != nullptr)
                                                     ? upstream_host_->address()->asString()
                                                     : "to upstream")),
      true);

`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@pyrl247
Copy link
Contributor Author

pyrl247 commented Sep 3, 2020

@rgs1 What about I change the test/mocks/tcp/mocks.cc the poolFailure function like this?It will trigger the crash.

  Callbacks* cb = callbacks_.front();
  callbacks_.pop_front();
  handles_.pop_front();
  if (reason == ConnectionPool::PoolFailureReason::Overflow) {
    cb->onPoolFailure(reason, nullptr);
  } else {
    cb->onPoolFailure(reason, host_);
  }
}

@rgs1
Copy link
Member

rgs1 commented Sep 3, 2020

@rgs1 What about I change the test/mocks/tcp/mocks.cc the poolFailure function like this?It will trigger the crash.

  Callbacks* cb = callbacks_.front();
  callbacks_.pop_front();
  handles_.pop_front();
  if (reason == ConnectionPool::PoolFailureReason::Overflow) {
    cb->onPoolFailure(reason, nullptr);
  } else {
    cb->onPoolFailure(reason, host_);
  }
}

That probably works too.

Signed-off-by: Guang Yang <pyrl247@gmail.com>
Signed-off-by: Guang Yang <pyrl247@gmail.com>
@pyrl247
Copy link
Contributor Author

pyrl247 commented Sep 6, 2020

@zuercher @rgs1 Please check if it work?

@@ -407,7 +407,8 @@ TEST_F(ThriftRouterTest, PoolOverflowFailure) {
EXPECT_THAT(app_ex.what(), ContainsRegex(".*too many connections.*"));
EXPECT_TRUE(end_stream);
}));
context_.cluster_manager_.tcp_conn_pool_.poolFailure(ConnectionPool::PoolFailureReason::Overflow);
context_.cluster_manager_.tcp_conn_pool_.poolFailureWithNullHost(
ConnectionPool::PoolFailureReason::Overflow);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verified that this crashes without your fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes,if i use the previous code ,it will crash in my ci like this .

[ RUN      ] ThriftRouterTest.PoolOverflowFailure
TestRandomGenerator running with seed -443243904
[2020-09-09 13:03:11.969][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x0
[2020-09-09 13:03:11.969][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2020-09-09 13:03:11.969][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:92] Envoy version: 0/1.16.0-dev/test/RELEASE/BoringSSL
[2020-09-09 13:03:11.969][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #0: __restore_rt [0x7faf6ee3a8a0]
[2020-09-09 13:03:11.975][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #1: Envoy::Tcp::ConnectionPool::MockInstance::poolFailure() [0x9b3dd3]
[2020-09-09 13:03:11.981][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: Envoy::Extensions::NetworkFilters::ThriftProxy::Router::ThriftRouterTest_PoolOverflowFailure_Test::TestBody() [0x788b44]
[2020-09-09 13:03:11.987][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #3: testing::internal::HandleExceptionsInMethodIfSupported<>() [0x1293a78]
[2020-09-09 13:03:11.992][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #4: testing::Test::Run() [0x12939a5]
[2020-09-09 13:03:11.998][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #5: testing::TestInfo::Run() [0x12947f0]
[2020-09-09 13:03:12.003][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #6: testing::TestSuite::Run() [0x1295177]
[2020-09-09 13:03:12.009][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #7: testing::internal::UnitTestImpl::RunAllTests() [0x12a2047]
[2020-09-09 13:03:12.015][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #8: testing::internal::HandleExceptionsInMethodIfSupported<>() [0x12a1968]
[2020-09-09 13:03:12.020][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #9: testing::UnitTest::Run() [0x12a17ef]
[2020-09-09 13:03:12.026][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #10: Envoy::TestRunner::RunTests() [0xbf1734]
[2020-09-09 13:03:12.031][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #11: main [0xbf092b]
[2020-09-09 13:03:12.032][5454][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #12: __libc_start_main [0x7faf6ea58b97]
================================================================================
INFO: Elapsed time: 1017.162s, Critical Path: 329.15s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

@@ -59,6 +59,7 @@ class MockInstance : public Instance {

Envoy::ConnectionPool::MockCancellable* newConnectionImpl(Callbacks& cb);
void poolFailure(PoolFailureReason reason);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably just add a bool host_null param and reuse poolFailure()

Copy link
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Guang Yang <pyrl247@gmail.com>
Copy link
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks. Just one last nit.

Signed-off-by: Guang Yang <pyrl247@gmail.com>
rgs1
rgs1 previously approved these changes Sep 9, 2020
Copy link
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I went ahead and fix the merge conflict on the docs.

@zuercher zuercher merged commit 329857f into envoyproxy:master Sep 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants