-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xds: Make AGGREGATED_DELTA_GRPC set is_aggregated on the Subscription #25055
Conversation
Currently, there's a bug when the ApiConfigSource is AGGREGATED_DELTA_GRPC and xdstp is used: The GrpcCollectionSubscriptionImpl that gets created sets is_aggregated to false, instead of setting it to true. This causes the GrpcSubscriptionImpl instance to attempt to start the GrpcMux on start, instead of waiting to share a gRPC stream like ADS is supposed to. Unfortunately, this behavior was masked by the integration tests, because we use a FakeUpstream in the integration tests that just contains a single xDS stream to which the Envoy instances connect to. The only way to verify this behavior was to look at the logs when running the XdsTpAdsIntegrationTest: when is_aggregated was set to false (prior to this commit), the logs would contain GrpcStream messages saying the stream already exists. After changing is_aggregated to true, the logs no longer output stream "already exists", because we don't attempt to create multiple streams, which is the correct behavior. Signed-off-by: Ali Beyad <abeyad@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM.
Will it be possible to add a test for this (preventing regression)?
I wonder if EXPECT_LOG_NOT_CONTAINS
can be used in this case.
/wait-any
Signed-off-by: Ali Beyad <abeyad@google.com>
Thanks for the suggestion! I was wondering how to test this and didn't know about EXPECT_LOG_NOT_CONTAINS. I added a check using it. PTAL. |
Not sure if that's the best test, because it will pass if the test doesn't output anything, but I can't think of any other way to test this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
/assign @htuch
Assigning Harvey for a senior-maintainer stamp.
Yup, the test fails when is_aggregated is set back to false. Agree I wish there was a better way to test this; perhaps we could add a metric, but I'm not sure it makes sense to add a metric just for the purposes of test inspection, though we could do that if you prefer. Thanks for reviewing! |
/retest |
Retrying Azure Pipelines: |
Signed-off-by: Ali Beyad <abeyad@google.com>
@htuch friendly ping on this, thanks! |
AdsIntegrationTest::initialize(); | ||
// Verifies that GrpcStream::establishNewStream() doesn't already have a stream created. The | ||
// ADS protocol does not try to create multiple streams for each resource type. | ||
EXPECT_LOG_NOT_CONTAINS("warning", "already exists!", AdsIntegrationTest::initialize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I think we can do better than rely on fragile log warnings. Can we use stats or the like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried looking for an existing stat to see if we could use that, but I don't think there is any. The is_aggregated
variable controls https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_subscription_impl.cc#L50, which is a call to the grpc mux's start(). Different mux implementations have different methods, but they all effectively call GrpcStream::establishNewStream()
: https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_stream.h#L60.
This log check is in the GrpcStream::establishNewStream()
function. GrpcStream::establishNewStream()
logs that we already have a stream and exits, preventing the creation of a new stream, so a safeguard already exists, but if is_aggregated
was set to true, we wouldn't even get to this point.
We could introduce a new stat, though I'm not sure what the precedence is for adding a new stat for test inspection purposes only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think counting the number of streams established for control plane purposes seems a useful thing in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the streams_active
(doc) counter can be used in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that, even without my change, the number of streams doesn't increase, because there's a check in GrpcStream::establishNewStream() that returns early if the stream is not null: https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_stream.h#L63. There already exists a counter a little further down that increments the number of connected streams if the stream is created: https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_stream.h#L76. However, we don't get to that point due to the aforementioned check. So in reality, GrpcStream::establishNewStream() has an extra safeguard in place to prevent the creation of an extra stream, but to me it seems the intended behavior of is_aggregated
in GrpcSubscriptionImpl is to prevent us from even getting to that point: https://github.com/envoyproxy/envoy/blob/main/source/common/config/grpc_subscription_impl.cc#L50.
So this PR is not so much a bug fix as it is to bring the is_aggregated
value in the creation of an ADS subscription to the correct semantics, so we don't need to rely on the check in GrpcStream::establishNewStream()
.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yeah. Still, I think relying on logs here is really fragile, there's nothing to stop this log message being moved etc.
Should this be a unit test instead where you can mock and provide a more precise test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very good point, changed to a unit test, which is also much simpler as the plumbing was already in place to do that. I verified that the test fails without my change and the test passes with my change. PTAL
Signed-off-by: Ali Beyad <abeyad@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
…envoyproxy#25055) Currently, there's a bug when the ApiConfigSource is AGGREGATED_DELTA_GRPC and xdstp is used: The GrpcCollectionSubscriptionImpl that gets created sets is_aggregated to false, instead of setting it to true. This causes the GrpcSubscriptionImpl instance to attempt to start the GrpcMux on start, instead of waiting to share a gRPC stream like ADS is supposed to. Unfortunately, this behavior was masked by the integration tests, because we use a FakeUpstream in the integration tests that just contains a single xDS stream to which the Envoy instances connect to. The only way to verify this behavior was to look at the logs when running the XdsTpAdsIntegrationTest: when is_aggregated was set to false (prior to this commit), the logs would contain GrpcStream messages saying the stream already exists. After changing is_aggregated to true, the logs no longer output stream "already exists", because we don't attempt to create multiple streams, which is the correct behavior. Signed-off-by: Ali Beyad <abeyad@google.com>
Currently, there's a bug when the ApiConfigSource is AGGREGATED_DELTA_GRPC and xdstp is used: The
GrpcCollectionSubscriptionImpl that gets created sets is_aggregated to false, instead of setting it to true. This causes the GrpcSubscriptionImpl instance to attempt to start the GrpcMux on start, instead of waiting to share a gRPC stream like ADS is supposed to.
Unfortunately, this behavior was masked by the integration tests, because we use a FakeUpstream in the integration tests that just contains a single xDS stream to which the Envoy instances connect to.
The only way to verify this behavior was to look at the logs when running the XdsTpAdsIntegrationTest: when is_aggregated was set to false (prior to this commit), the logs would contain GrpcStream messages saying the stream already exists. After changing is_aggregated to true, the logs no longer output stream "already exists", because we don't attempt to create multiple streams, which is the correct behavior.
Signed-off-by: Ali Beyad abeyad@google.com