Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for explicit wildcard resource #16855

Merged
merged 59 commits into from
Oct 21, 2021
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
59 commits
Select commit Hold shift + click to select a range
719661c
docs: Add an explicit wildcard mode
krnowak Jun 7, 2021
9b84dfd
config: Implement explicit wildcard mode in delta gRPC
krnowak Jun 7, 2021
a5f4ed7
test: Add tests for explicit wildcar mode in delta gRPC
krnowak Jun 7, 2021
d78ec97
docs: Reword wildcard mode docs
krnowak Jun 7, 2021
2e8ba91
config: Fix build
krnowak Jun 7, 2021
f4ca629
config: Fix build
krnowak Jun 8, 2021
c008bcc
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Jul 2, 2021
3fa6a37
Drop the entry in version history
krnowak Jul 2, 2021
3914615
Always send an asterisk as a wildcard subscription
krnowak Jul 6, 2021
1d53769
Better comments, drop redundant tests
krnowak Jul 6, 2021
fa83949
Add constants for wildcards
krnowak Jul 6, 2021
6748f53
Test fixes
krnowak Jul 6, 2021
02e5370
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Jul 7, 2021
872e737
Revert "Test fixes"
krnowak Jul 8, 2021
502de45
Revert "Better comments, drop redundant tests"
krnowak Jul 8, 2021
4ba0fc4
Revert "Always send an asterisk as a wildcard subscription"
krnowak Jul 8, 2021
4e34600
New attempt at implementing new wildcard subscriptions
krnowak Jul 8, 2021
98c408b
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Jul 14, 2021
18e9f70
Formatting fixes
krnowak Jul 14, 2021
d6e273f
Update wildcard handling in xds delta subscription state
krnowak Jul 14, 2021
db884fe
Change Wildcard into constexpr string view
krnowak Jul 16, 2021
9339a0f
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Aug 10, 2021
cb44431
Rework legacy wildcard subscription handling
krnowak Aug 5, 2021
fcc53aa
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Aug 12, 2021
9a1855d
Fix typos
krnowak Aug 12, 2021
e71317f
Rename function to placate clang tidy
krnowak Aug 12, 2021
cb93839
Try to improve coverage
krnowak Aug 13, 2021
b7dee94
Factor out legacy wildcard checks to a separate function
krnowak Aug 13, 2021
61361b8
Document the tests
krnowak Aug 13, 2021
63bba65
Fix formatting
krnowak Aug 13, 2021
da12ed5
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Aug 31, 2021
e249133
Fix build after merge
krnowak Aug 31, 2021
88587b0
Get rid of containerContains
krnowak Aug 31, 2021
da99fc7
Fix formatting
krnowak Sep 1, 2021
d9f0803
Document the resource state machine
krnowak Sep 1, 2021
74532e9
Ignore resources that we did not request
krnowak Sep 3, 2021
3d55048
Add a test for ignoring superfluous resources
krnowak Sep 3, 2021
f8b758f
Fix formatting
krnowak Sep 3, 2021
fd76a69
Drop one assert
krnowak Sep 9, 2021
4d20de1
Try to preserve the legacy wildcard status on ineffective unsubscript…
krnowak Sep 9, 2021
ab65125
Expand a bit more on the ambiguous resource category in comments
krnowak Sep 17, 2021
c726d04
Update the comments in the xds_mux variant too
krnowak Sep 20, 2021
f9d846d
Constify some variables
krnowak Sep 28, 2021
bae8643
Drop unused member
krnowak Sep 28, 2021
d10ec21
Gate the explicit wildcard resource support
krnowak Oct 1, 2021
508fc9c
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Oct 1, 2021
726d655
Fixes
krnowak Oct 1, 2021
649af01
Build fixes
krnowak Oct 4, 2021
6f4724b
Test the old implementation too
krnowak Oct 4, 2021
7d78a00
docs: Add a note about xds changes
krnowak Oct 4, 2021
93ec187
Fix clang tidy
krnowak Oct 5, 2021
52a66ee
Run ADS integration tests together with old and new DSS
krnowak Oct 5, 2021
044bceb
Add DSS to dictionary
krnowak Oct 5, 2021
559e50a
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Oct 5, 2021
9a34a6b
Fix version history after merge
krnowak Oct 5, 2021
5e9a130
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Oct 19, 2021
36b872b
test: Fix redis ADS integration test params
krnowak Oct 19, 2021
5a2aac2
Merge remote-tracking branch 'origin/main' into krnowak/explicit-wild…
krnowak Oct 20, 2021
55dfdfa
Shard the redis integration test to avoid timeouts
krnowak Oct 20, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions source/common/config/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -400,6 +400,7 @@ envoy_cc_library(
hdrs = ["watch_map.h"],
deps = [
":decoded_resource_lib",
":utility_lib",
":xds_resource_lib",
"//envoy/config:subscription_interface",
"//source/common/common:assert_lib",
Expand Down
247 changes: 208 additions & 39 deletions source/common/config/delta_subscription_state.cc

Large diffs are not rendered by default.

89 changes: 72 additions & 17 deletions source/common/config/delta_subscription_state.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,59 @@ namespace Config {
// There can be multiple DeltaSubscriptionStates active. They will always all be
// blissfully unaware of each other's existence, even when their messages are
// being multiplexed together by ADS.
//
// There are two scenarios which affect how DeltaSubscriptionState manages the resources. First
// scenario is when we are subscribed to a wildcard resource, and other scenario is when we are not.
//
// Delta subscription state also divides the resources it cached into three categories: requested,
// wildcard and ambiguous.
//
// The "requested" category is for resources that we have explicitly asked for (either through the
// initial set of resources or through the on-demand mechanism). Resources in this category are in
// one of two states: "complete" and "waiting for server".
//
// "Complete" resources are resources about which the server sent us the information we need (for
// now - just resource version).
//
// The "waiting for server" state is either for resources that we have just requested, but we still
// didn't receive any version information from the server, or for the "complete" resources that,
// according to the server, are gone, but we are still interested in them - in such case we strip
// the information from the resource.
//
// The "wildcard" category is for resources that we are not explicitly interested in, but we are
// indirectly interested through the subscription to the wildcard resource.
//
// The "ambiguous" category is for resources that we stopped being interested in, but we may still
// be interested indirectly through the wildcard subscription - resources in these category are
// "waiting" for the config server to confirm their status.
//
// Please refer to drawings (non-wildcard-resource-state-machine.png and
// (wildcard-resource-state-machine.png) for visual depictions of the resource state machine.
//
// In the "no wildcard subscription" scenario all the cached resources should be in the "requested"
// category. Resources are added to the category upon the explicit request and dropped when we
// explicitly unsubscribe from it. Transitions between "complete" and "waiting for server" happen
// when we receive messages from the server - if a resource in the message is in "added resources"
// list (thus contains version information), the resource becomes "complete". If the resource in the
// message is in "removed resources" list, it changes into the "waiting for server" state. If a
// server sends us a resource that we didn't request, it's going to be ignored.
//
// In the "wildcard subscription" scenario, "requested" category is the same as in "no wildcard
// subscription" scenario, with one exception - the unsubscribed "complete" resource is not removed
// from the cache, but it's moved to the "ambiguous" resources instead. At this point we are waiting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little sub-optimal, because it means that if a client is subscribed to the wildcard and to another explicit resource, if it unsubscribes from that explicit resource, it cannot remove the resource from its cache unless the server explicitly sends a response with the resource in the removed_resources list. This could lead to the cache growing without bound.

Maybe we should add something to the xDS spec that requires the delta server to send the resource name in removed_resources in response to a client unsubscription, but only if the client is subscribed to the wildcard.

@htuch, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little sub-optimal, because it means that if a client is subscribed to the wildcard and to another explicit resource, if it unsubscribes from that explicit resource, it cannot remove the resource from its cache unless the server explicitly sends a response with the resource in the removed_resources list. This could lead to the cache growing without bound.

Maybe we should add something to the xDS spec that requires the delta server to send the resource name in removed_resources in response to a client unsubscription, but only if the client is subscribed to the wildcard.

@htuch, WDYT?

Yeah, my assumption was that in such case (we are subscribed to wildcard and just unsubscribed from some explicit resource) the server will send us a response with the explicit resource in added_resources if the explicit resource is also a part of the wildcard set (so we would move the resource from ambiguous to wildcard) or otherwise in removed_resources if the explicit resource is not a part of the wildcard set (so we would drop the resource from the cache).

But yeah, probably it's best to add such a wording to xDS spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the server will send us a response with the explicit resource in added_resources if the explicit resource is also a part of the wildcard set (so we would move the resource from ambiguous to wildcard)

As mentioned above -- why cache a resource if we receive a complete copy from the server anyway?

But yeah, probably it's best to add such a wording to xDS spec.

I don't think this is current behaviour and it needs to be documented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It seems inefficient to require the server to explicitly resend the resource if it is included in the wildcard, so I wasn't thinking it would do that. But I guess the client would need some signal to tell it to remove the resource from the "ambiguous" list in this case.

I've put together #17983 to update the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitri-d In the incremental protocol variant, the server does not need to resend any individual resource unless that resource has changed, so we won't necessarily receive a complete copy from the server anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realised: This change might break current server implementations b/c of difference in handling of ttl expirations vs. unsubscribes.

Could you expand on it, please? Right now I'm not sure if and how to address this. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the problem is with TTLs here. Can you say more about that?

With regard to the broader issue, I agree that the presence of "ambiguous" state is fairly ugly. But I think the root cause of the ugliness is actually the wildcard subscription semantics, which were not designed to coexist with per-resource subscriptions and do not provide any way for the client to know which returned resource is actually associated with the wildcard subscription. Given that inherent wire-level protocol problem, I don't see any choice other than to mandate a certain behavior for clients.

I will also note two things that I think make this more tolerable. First, the new xdstp: naming scheme replaces wildcard subscriptions with collection resources, and collection resources do not have the above problem: the wire protocol clearly indicates which returned resources are associated with which collection, so there is no ambiguity. This means that this problem will go away as we migrate to the new naming scheme.

Second, even for legacy clients and servers, this problem comes up only in a case where the client is subscribing to both the wildcard and to other resources on the same stream. In general, that will occur only when something like on-demand CDS is used, which means that clients that don't implement on-demand CDS will not need to worry about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, when a resource ttl expires, the resource gets deleted. It happens regardless of whether the resource is a part of wildcard subscription or not. IIRC the original conversation, the assumption was that the server will just resend a resource if the server missed an update.

With the changes in this PR, the client keeps the state for resources some of the time now. There isn't a way for the server to distinguish between unsubscribe request due to a ttl expiration from a regular unsubscribe based on data present in discovery request. As we expect different responses from the server (a full resource in case of ttl expiration, a resource name otherwise), the server is now required to distinguish between the two based on its internal state. which wasn't a requirement before. b/c of the above the changes in this PR would require a corresponding update in server implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the client actually generates an unsubscribe request when a TTL expires. A TTL is basically a server's way of saying "consider the resource to be removed at a particular time without my having to explicitly tell you that it's been removed". But a server indicating that a resource has been removed does that alter the fact that a client is subscribing to it. In effect, you can think of "does not exist" as just another possible value for a given resource; while a client is subscribed to a resource, the server is obligated to send updates whenever the value of that resource changes, so "resource previously did exist and now does not" and "resource previously did not exist and now does" are both just special cases of the resource's value changing.

A server that sends a resource with a TTL knows that the client will consider the resource to not exist when the TTL expires. If the server wants the client to continue using the resource after that, it needs to resend the resource with a new TTL value before the original TTL expires. But this should not be hard for the server to figure out, because the server knows that it sent the resource with a TTL in the first place. And I think this behavior is required independently of the change we're talking about here.

In the change we're talking about here, all we're saying is that a server that supports both wildcard and non-wildcard subscriptions on the same stream needs to know that if the client unsubscribes to a non-wildcard resource, the server needs to explicitly indicate whether that resource is still relevant to the client because of the wildcard or whether the client can remove the resource. I think this change is completely independent of TTL behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the client actually generates an unsubscribe request when a TTL expires.

Duh, indeed there's no unsubscribe request on ttl expiration. Ignore what I said.

// for the server to tell us that this resource should be either moved to the "wildcard" resources,
// or dropped. Resources in "wildcard" category are only added there or dropped from there by the
// server. Resources from both "wildcard" and "ambiguous" categories can become "requested"
// "complete" resources if we subscribe to them again.
//
// The delta subscription state transitions between the two scenarios depending on whether we are
// subscribed to wildcard resource or not. Nothing special happens when we transition from "no
// wildcard subscription" to "wildcard subscription" scenario, but when transitioning in the other
// direction, we drop all the resources in "wildcard" and "ambiguous" categories.
class DeltaSubscriptionState : public Logger::Loggable<Logger::Id::config> {
public:
DeltaSubscriptionState(std::string type_url, UntypedConfigUpdateCallbacks& watch_map,
const LocalInfo::LocalInfo& local_info, Event::Dispatcher& dispatcher,
const bool wildcard);
const LocalInfo::LocalInfo& local_info, Event::Dispatcher& dispatcher);

// Update which resources we're interested in subscribing to.
void updateSubscriptionInterest(const absl::flat_hash_set<std::string>& cur_added,
Expand Down Expand Up @@ -60,15 +108,21 @@ class DeltaSubscriptionState : public Logger::Loggable<Logger::Id::config> {

class ResourceState {
public:
ResourceState(const envoy::service::discovery::v3::Resource& resource)
: version_(resource.version()) {}

// Builds a ResourceState in the waitingForServer state.
ResourceState() = default;
// Builds a ResourceState with a specific version
ResourceState(absl::string_view version) : version_(version) {}
// Self-documenting alias of default constructor.
static ResourceState waitingForServer() { return ResourceState(); }
// Self-documenting alias of constructor with version.
static ResourceState withVersion(absl::string_view version) { return ResourceState(version); }

// If true, we currently have no version of this resource - we are waiting for the server to
// provide us with one.
bool waitingForServer() const { return version_ == absl::nullopt; }
bool isWaitingForServer() const { return version_ == absl::nullopt; }

void setAsWaitingForServer() { version_ = absl::nullopt; }
void setVersion(absl::string_view version) { version_ = std::string(version); }

// Must not be called if waitingForServer() == true.
std::string version() const {
Expand All @@ -80,36 +134,37 @@ class DeltaSubscriptionState : public Logger::Loggable<Logger::Id::config> {
absl::optional<std::string> version_;
};

// Use these helpers to ensure resource_state_ and resource_names_ get updated together.
void addResourceState(const envoy::service::discovery::v3::Resource& resource);
void setResourceWaitingForServer(const std::string& resource_name);
void removeResourceState(const std::string& resource_name);
void addResourceStateFromServer(const envoy::service::discovery::v3::Resource& resource);
OptRef<ResourceState> getRequestedResourceState(absl::string_view resource_name);
OptRef<const ResourceState> getRequestedResourceState(absl::string_view resource_name) const;

void populateDiscoveryRequest(envoy::service::discovery::v3::DeltaDiscoveryResponse& request);
bool isInitialRequestForLegacyWildcard();

// A map from resource name to per-resource version. The keys of this map are exactly the resource
// names we are currently interested in. Those in the waitingForServer state currently don't have
// any version for that resource: we need to inform the server if we lose interest in them, but we
// also need to *not* include them in the initial_resource_versions map upon a reconnect.
absl::node_hash_map<std::string, ResourceState> resource_state_;
absl::node_hash_map<std::string, ResourceState> requested_resource_state_;
// A map from resource name to per-resource version. The keys of this map are resource names we
// have received as a part of the wildcard subscription.
absl::node_hash_map<std::string, std::string> wildcard_resource_state_;
// Used for storing resources that we lost interest in, but could
// also be a part of wildcard subscription.
absl::node_hash_map<std::string, std::string> ambiguous_resource_state_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just getting back to review here (sorry for delay). I was a little surprised by the ambiguous_resource_state_ and then I read your explanation and it made sense. Do you think you could document the resource state machine here or somewhere else for posterity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a description of the state machine in DeltaSubscriptionState class docs. Please let me know if they are any good. I also added two pictures. I wasn't sure where to keep them, so currently they are also in source/common/config directory. Pasting them here inline too:

non wildcard resource state machine:
non-wildcard-resource-state-machine

wildcard resource state machine:
wildcard-resource-state-machine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These diagrams look great!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks so much.


// Not all xDS resources supports heartbeats due to there being specific information encoded in
// an empty response, which is indistinguishable from a heartbeat in some cases. For now we just
// disable heartbeats for these resources (currently only VHDS).
const bool supports_heartbeats_;
TtlManager ttl_;
// The keys of resource_versions_. Only tracked separately because std::map does not provide an
// iterator into just its keys.
absl::flat_hash_set<std::string> resource_names_;

const std::string type_url_;
// Is the subscription is for a wildcard request.
const bool wildcard_;
UntypedConfigUpdateCallbacks& watch_map_;
const LocalInfo::LocalInfo& local_info_;
Event::Dispatcher& dispatcher_;
std::chrono::milliseconds init_fetch_timeout_;

bool in_initial_legacy_wildcard_{true};
bool any_request_sent_yet_in_current_stream_{};
bool must_send_discovery_request_{};

Expand Down
11 changes: 5 additions & 6 deletions source/common/config/new_grpc_mux_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@ GrpcMuxWatchPtr NewGrpcMuxImpl::addWatch(const std::string& type_url,
auto entry = subscriptions_.find(type_url);
if (entry == subscriptions_.end()) {
// We don't yet have a subscription for type_url! Make one!
addSubscription(type_url, options.use_namespace_matching_, resources.empty());
addSubscription(type_url, options.use_namespace_matching_);
return addWatch(type_url, resources, callbacks, resource_decoder, options);
}

Expand Down Expand Up @@ -230,11 +230,10 @@ void NewGrpcMuxImpl::removeWatch(const std::string& type_url, Watch* watch) {
entry->second->watch_map_.removeWatch(watch);
}

void NewGrpcMuxImpl::addSubscription(const std::string& type_url, const bool use_namespace_matching,
const bool wildcard) {
subscriptions_.emplace(type_url, std::make_unique<SubscriptionStuff>(type_url, local_info_,
use_namespace_matching,
dispatcher_, wildcard));
void NewGrpcMuxImpl::addSubscription(const std::string& type_url,
const bool use_namespace_matching) {
subscriptions_.emplace(type_url, std::make_unique<SubscriptionStuff>(
type_url, local_info_, use_namespace_matching, dispatcher_));
subscription_ordering_.emplace_back(type_url);
}

Expand Down
8 changes: 3 additions & 5 deletions source/common/config/new_grpc_mux_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -81,10 +81,9 @@ class NewGrpcMuxImpl

struct SubscriptionStuff {
SubscriptionStuff(const std::string& type_url, const LocalInfo::LocalInfo& local_info,
const bool use_namespace_matching, Event::Dispatcher& dispatcher,
const bool wildcard)
const bool use_namespace_matching, Event::Dispatcher& dispatcher)
: watch_map_(use_namespace_matching),
sub_state_(type_url, watch_map_, local_info, dispatcher, wildcard) {}
sub_state_(type_url, watch_map_, local_info, dispatcher) {}

WatchMap watch_map_;
DeltaSubscriptionState sub_state_;
Expand Down Expand Up @@ -138,8 +137,7 @@ class NewGrpcMuxImpl
const SubscriptionOptions& options);

// Adds a subscription for the type_url to the subscriptions map and order list.
void addSubscription(const std::string& type_url, bool use_namespace_matching,
const bool wildcard);
void addSubscription(const std::string& type_url, bool use_namespace_matching);

void trySendDiscoveryRequests();

Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 2 additions & 0 deletions source/common/config/utility.h
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,8 @@
namespace Envoy {
namespace Config {

constexpr absl::string_view Wildcard = "*";

/**
* Constant Api Type Values, used by envoy::config::core::v3::ApiConfigSource.
*/
Expand Down
3 changes: 2 additions & 1 deletion source/common/config/watch_map.cc
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#include "source/common/common/cleanup.h"
#include "source/common/common/utility.h"
#include "source/common/config/decoded_resource_impl.h"
#include "source/common/config/utility.h"
#include "source/common/config/xds_resource.h"

namespace Envoy {
Expand Down Expand Up @@ -49,7 +50,7 @@ void WatchMap::removeDeferredWatches() {
AddedRemoved
WatchMap::updateWatchInterest(Watch* watch,
const absl::flat_hash_set<std::string>& update_to_these_names) {
if (update_to_these_names.empty()) {
if (update_to_these_names.empty() || update_to_these_names.contains(Wildcard)) {
wildcard_watches_.insert(watch);
} else {
wildcard_watches_.erase(watch);
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading