Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: Make AGGREGATED_DELTA_GRPC set is_aggregated on the Subscription #25055

Merged
merged 4 commits into from
Jan 27, 2023

Conversation

abeyad
Copy link
Contributor

@abeyad abeyad commented Jan 20, 2023

Currently, there's a bug when the ApiConfigSource is AGGREGATED_DELTA_GRPC and xdstp is used: The
GrpcCollectionSubscriptionImpl that gets created sets is_aggregated to false, instead of setting it to true. This causes the GrpcSubscriptionImpl instance to attempt to start the GrpcMux on start, instead of waiting to share a gRPC stream like ADS is supposed to.

Unfortunately, this behavior was masked by the integration tests, because we use a FakeUpstream in the integration tests that just contains a single xDS stream to which the Envoy instances connect to.

The only way to verify this behavior was to look at the logs when running the XdsTpAdsIntegrationTest: when is_aggregated was set to false (prior to this commit), the logs would contain GrpcStream messages saying the stream already exists. After changing is_aggregated to true, the logs no longer output stream "already exists", because we don't attempt to create multiple streams, which is the correct behavior.

Signed-off-by: Ali Beyad abeyad@google.com

Currently, there's a bug when the ApiConfigSource is
AGGREGATED_DELTA_GRPC and xdstp is used: The
GrpcCollectionSubscriptionImpl that gets created sets is_aggregated to
false, instead of setting it to true.  This causes the
GrpcSubscriptionImpl instance to attempt to start the GrpcMux on start,
instead of waiting to share a gRPC stream like ADS is supposed to.

Unfortunately, this behavior was masked by the integration tests,
because we use a FakeUpstream in the integration tests that just
contains a single xDS stream to which the Envoy instances connect to.

The only way to verify this behavior was to look at the logs when
running the XdsTpAdsIntegrationTest: when is_aggregated was set to false
(prior to this commit), the logs would contain GrpcStream messages
saying the stream already exists.  After changing is_aggregated to true,
the logs no longer output stream "already exists", because we don't
attempt to create multiple streams, which is the correct behavior.

Signed-off-by: Ali Beyad <abeyad@google.com>
Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.
Will it be possible to add a test for this (preventing regression)?
I wonder if EXPECT_LOG_NOT_CONTAINS can be used in this case.
/wait-any

Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad
Copy link
Contributor Author

abeyad commented Jan 20, 2023

Thanks, LGTM. Will it be possible to add a test for this (preventing regression)? I wonder if EXPECT_LOG_NOT_CONTAINS can be used in this case. /wait-any

Thanks for the suggestion! I was wondering how to test this and didn't know about EXPECT_LOG_NOT_CONTAINS. I added a check using it. PTAL.

@adisuissa
Copy link
Contributor

Thanks for the suggestion! I was wondering how to test this and didn't know about EXPECT_LOG_NOT_CONTAINS. I added a check using it. PTAL.

Not sure if that's the best test, because it will pass if the test doesn't output anything, but I can't think of any other way to test this.
Please just verify that the test fails if you change is_aggregated back to its old value, but other than that LGTM.
Thanks!

adisuissa
adisuissa previously approved these changes Jan 20, 2023
Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!
/assign @htuch
Assigning Harvey for a senior-maintainer stamp.

@abeyad
Copy link
Contributor Author

abeyad commented Jan 20, 2023

Please just verify that the test fails if you change is_aggregated back to its old value

Yup, the test fails when is_aggregated is set back to false.

Agree I wish there was a better way to test this; perhaps we could add a metric, but I'm not sure it makes sense to add a metric just for the purposes of test inspection, though we could do that if you prefer.

Thanks for reviewing!

@abeyad
Copy link
Contributor Author

abeyad commented Jan 20, 2023

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #25055 (comment) was created by @abeyad.

see: more, trace.

Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad
Copy link
Contributor Author

abeyad commented Jan 23, 2023

@htuch friendly ping on this, thanks!

AdsIntegrationTest::initialize();
// Verifies that GrpcStream::establishNewStream() doesn't already have a stream created. The
// ADS protocol does not try to create multiple streams for each resource type.
EXPECT_LOG_NOT_CONTAINS("warning", "already exists!", AdsIntegrationTest::initialize());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I think we can do better than rely on fragile log warnings. Can we use stats or the like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried looking for an existing stat to see if we could use that, but I don't think there is any. The is_aggregated variable controls https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_subscription_impl.cc#L50, which is a call to the grpc mux's start(). Different mux implementations have different methods, but they all effectively call GrpcStream::establishNewStream(): https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_stream.h#L60.

This log check is in the GrpcStream::establishNewStream() function. GrpcStream::establishNewStream() logs that we already have a stream and exits, preventing the creation of a new stream, so a safeguard already exists, but if is_aggregated was set to true, we wouldn't even get to this point.

We could introduce a new stat, though I'm not sure what the precedence is for adding a new stat for test inspection purposes only?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think counting the number of streams established for control plane purposes seems a useful thing in any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the streams_active (doc) counter can be used in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, even without my change, the number of streams doesn't increase, because there's a check in GrpcStream::establishNewStream() that returns early if the stream is not null: https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_stream.h#L63. There already exists a counter a little further down that increments the number of connected streams if the stream is created: https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_stream.h#L76. However, we don't get to that point due to the aforementioned check. So in reality, GrpcStream::establishNewStream() has an extra safeguard in place to prevent the creation of an extra stream, but to me it seems the intended behavior of is_aggregated in GrpcSubscriptionImpl is to prevent us from even getting to that point: https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_subscription_impl.cc#L50.

So this PR is not so much a bug fix as it is to bring the is_aggregated value in the creation of an ADS subscription to the correct semantics, so we don't need to rely on the check in GrpcStream::establishNewStream().

Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yeah. Still, I think relying on logs here is really fragile, there's nothing to stop this log message being moved etc.

Should this be a unit test instead where you can mock and provide a more precise test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good point, changed to a unit test, which is also much simpler as the plumbing was already in place to do that. I verified that the test fails without my change and the test passes with my change. PTAL

Signed-off-by: Ali Beyad <abeyad@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch merged commit 765072a into envoyproxy:main Jan 27, 2023
VishalDamgude pushed a commit to freshworks/envoy that referenced this pull request Feb 2, 2023
…envoyproxy#25055)

Currently, there's a bug when the ApiConfigSource is AGGREGATED_DELTA_GRPC and xdstp is used: The
GrpcCollectionSubscriptionImpl that gets created sets is_aggregated to false, instead of setting it to true. This causes the GrpcSubscriptionImpl instance to attempt to start the GrpcMux on start, instead of waiting to share a gRPC stream like ADS is supposed to.

Unfortunately, this behavior was masked by the integration tests, because we use a FakeUpstream in the integration tests that just contains a single xDS stream to which the Envoy instances connect to.

The only way to verify this behavior was to look at the logs when running the XdsTpAdsIntegrationTest: when is_aggregated was set to false (prior to this commit), the logs would contain GrpcStream messages saying the stream already exists. After changing is_aggregated to true, the logs no longer output stream "already exists", because we don't attempt to create multiple streams, which is the correct behavior.

Signed-off-by: Ali Beyad <abeyad@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants