Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use local_info.node() instead of bootstrap.node() whenever possible #4120

Merged
merged 7 commits into from
Aug 20, 2018
Merged

Use local_info.node() instead of bootstrap.node() whenever possible #4120

merged 7 commits into from
Aug 20, 2018

Conversation

dio
Copy link
Member

@dio dio commented Aug 13, 2018

Description:
Use local_info.node() instead of bootstrap.node() whenever possible.

Risk Level: Low
Testing: Existing
Docs Changes: N/A
Release Notes: N/A
Fixes #4085

Signed-off-by: Dhi Aurrahman dio@tetrate.io

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@dio dio changed the title Use local_info.node() instead of bootstrap.node() when possible Use local_info.node() instead of bootstrap.node() whenever possible Aug 13, 2018
@mattklein123
Copy link
Member

Test please?

@mattklein123 mattklein123 self-assigned this Aug 13, 2018
@dio
Copy link
Member Author

dio commented Aug 13, 2018

@mattklein123 yes, sorry missing some commits. I'm on it.

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@ramaraochavali
Copy link
Contributor

WDYT about adding an ASSERT while creating DiscoveryRequest which validates the existence of these two fields in the request?

@dio
Copy link
Member Author

dio commented Aug 13, 2018

@mattklein123 I feel no confidence on the change for initializing LoadStatsReporter here: https://github.com/envoyproxy/envoy/pull/4120/files#diff-982df999d2cabd23afd6c25088035bf5R304 getting node() information from local_info instead of bootstrap; this is for consistency reason only (while LoadStatsReporter doesn't use id() or cluster() value).

And I haven't found a way to check the effect of those CLI flags directly. But, since it is being forced for xDS, via Utility::checkLocalInfo, I think the change here is fine.

@dio
Copy link
Member Author

dio commented Aug 13, 2018

@ramaraochavali yeah, I'm thinking the same, what do you think about 6c263d3? Do you have any suggestions to add similar checks in other places?

@@ -211,7 +211,7 @@ ClusterManagerImpl::ClusterManagerImpl(const envoy::config::bootstrap::v2::Boots
// Now setup ADS if needed, this might rely on a primary cluster.
if (bootstrap.dynamic_resources().has_ads_config()) {
ads_mux_.reset(new Config::GrpcMuxImpl(
bootstrap.node(),
local_info.node(),
Copy link
Contributor

@ramaraochavali ramaraochavali Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add method similar to Config::Utility::checkLocalInfo("ads", local_info)
https://github.com/envoyproxy/envoy/blob/master/source/common/config/grpc_mux_impl.cc#L18
and validate that node has right info?

Copy link
Member Author

@dio dio Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can. However currently, if we failed to supply the info, it firstly throws an error for "cds", which I think serves the same purpose.

[2018-08-13 17:32:35.821][2350767][critical][main] source/server/server.cc:78] error initializing configuration '/envoy/config-local.yaml': cds: setting --service-cluster and --service-node is required

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramaraochavali after an experiment, I think it is tempting to have Config::Utility::checkLocalInfo inside the GrpcMuxImpl's constructor (and testing it is easier, yeay!). However, since we want to pass in the local_info here, the implication is quite large (but mostly "replacing" node with local_info). I have the changeset ready but want to gather opinions about it.

cc. @htuch @mattklein123.

@@ -132,6 +132,10 @@ class AdsIntegrationTest : public AdsIntegrationBaseTest,
envoy::api::v2::DiscoveryRequest discovery_request;
VERIFY_ASSERTION(ads_stream_->waitForGrpcMessage(*dispatcher_, discovery_request));

ASSERT(discovery_request.has_node());
Copy link
Contributor

@ramaraochavali ramaraochavali Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think while these ASSERTs are good, they would just validate existing tests right? Can we add a test in grpc_mux_impl_test that validates bad Node passed to it along the lines of this test

TEST_F(LdsApiTest, BadLocalInfo) {

Copy link
Member Author

@dio dio Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Will try to read and add a test.

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@@ -164,6 +164,9 @@ class LoadStatsIntegrationTest : public HttpIntegrationTest,
return;
} else if (loadstats_request.cluster_stats_size() == 0) {
loadstats_request.CopyFrom(local_loadstats_request);
ASSERT(loadstats_request.has_node());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want ASSERT_TRUE to use the asserts from gtest.

@@ -301,7 +301,7 @@ ClusterManagerImpl::ClusterManagerImpl(const envoy::config::bootstrap::v2::Boots
if (cm_config.has_load_stats_config()) {
const auto& load_stats_config = cm_config.load_stats_config();
load_stats_reporter_.reset(
new LoadStatsReporter(bootstrap.node(), *this, stats,
new LoadStatsReporter(local_info.node(), *this, stats,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option, not saying you have to do this, is to treat the CLI flags as a bootstrap override and override the bootstrap proto early in server config, so that local_info and bootstrap are consistent. Or, maybe nuke node from bootstrap once you've loaded up local_info. Ideally we have a single source-of-truth, and it's hard to confuse with other potential sources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, it would be great to have a single source of truth here. From an interface perspective it seems better to have LocalInfo be the source of truth but I don't feel that strongly about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use local_info.

dio added 2 commits August 15, 2018 03:50
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@ramaraochavali
Copy link
Contributor

@dio Thanks for this. I like the validation in the constructor. Tests looks much cleaner. LGTM

*Protobuf::DescriptorPool::generated_pool()->FindMethodByName(
"envoy.service.discovery.v2.AggregatedDiscoveryService.StreamAggregatedResources"),
random_, time_source_),
EnvoyException, "ads: setting --service-cluster and --service-node is required");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: these error messages are misleading nowadays, since you can specify this same information in the bootstrap.

Copy link
Member Author

@dio dio Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some difficulties with crafting related test before, so I guess I need to read more on bootstrap initialization steps (server init).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we should revisit the thrown error message from Config::Utility::checkLocalInfo(_, _);.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the message, not sure about its clarity and correctness.

EXPECT_CALL(dispatcher_, createTimer_(_)).WillOnce(Invoke([this](Event::TimerCb timer_cb) {
timer_cb_ = timer_cb;
timer_ = new Event::MockTimer();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Member Author

@dio dio Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the newly introduced tests do not call setup(), and timer_ was previously initialized as new Event::MockTimer() at constructor, this makes sure no leak. Caught by LeakSanitizer in this build: https://circleci.com/gh/envoyproxy/envoy/84216.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah if we go with the other direction it will be reverted.

@@ -211,7 +211,7 @@ ClusterManagerImpl::ClusterManagerImpl(const envoy::config::bootstrap::v2::Boots
// Now setup ADS if needed, this might rely on a primary cluster.
if (bootstrap.dynamic_resources().has_ads_config()) {
ads_mux_.reset(new Config::GrpcMuxImpl(
bootstrap.node(),
local_info,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed opinions on this change. On the one hand, we make it more full proof, by forcing the LocalInfo type. On the other, we are now passing a strict superset of what we were passing before, and the new information is not used. My main thoughts on making LocalInfo canonical would be to just delete the node info from bootstrap very early in server init. But, this approach also works..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting POV, will try to explore that too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily the correct PoV; just sharing where my thinking was originally :)

dio added 2 commits August 19, 2018 09:16
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's ship this as it's a definite improvement, we can bikeshed about internals later on.

@htuch htuch merged commit ef937d7 into envoyproxy:master Aug 20, 2018
snowp added a commit to snowp/envoy that referenced this pull request Aug 20, 2018
* origin/master: (34 commits)
  fuzz: fixes oss-fuzz: 9895 (envoyproxy#4189)
  docs: added developer docs for pprof/tcmalloc testing (envoyproxy#4194)
  fix sometimes returns response body to HEAD requests (envoyproxy#3985)
  admin: fix config dump order (envoyproxy#4197)
  thrift: allow translation between downstream and upstream protocols (envoyproxy#4136)
  Use local_info.node() instead of bootstrap.node() whenever possible (envoyproxy#4120)
  upstream: allow extension protocol options to be used with http filters (envoyproxy#4165)
  [thrift_proxy] Replace local HeaderMap with Http::HeaderMap[Impl] (envoyproxy#4169)
  docs: improve gRPC-JSON filter doc (envoyproxy#4184)
  stats: refactoring MetricImpl without strings (envoyproxy#4190)
  fuzz: coverage profile generation instructions. (envoyproxy#4193)
  upstream: rebuild cluster when health check config is changed (envoyproxy#4075)
  build: use clang-6.0. (envoyproxy#4168)
  thrift_proxy: introduce header transport (envoyproxy#4082)
  tcp: allow connection pool callers to store protocol state (envoyproxy#4131)
  configs: match latest API changes (envoyproxy#4185)
  Fix wrong mock function name. (envoyproxy#4187)
  Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182)
  Converting envoy configs to V2 (envoyproxy#2957)
  Add timestamp to HealthCheckEvent definition (envoyproxy#4119)
  ...

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants