Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: Add UDPA resource names to SotW API, RouteAction, and ClusterStats #13555

Closed
wants to merge 3 commits into from

Conversation

markdroth
Copy link
Contributor

Signed-off-by: Mark D. Roth roth@google.com

Commit Message: api: Add UDPA resource names to SotW API, RouteAction, and ClusterStats
Additional Description: Adds the new UdpaResourceName and UdpaResourceLocator fields to various additional places where it's needed in the API.
Risk Level: Low
Testing: N/A
Docs Changes: Included in PR
Release Notes: N/A

Use of UdpaResourceLocator in the SotW API will require using the client feature to tell the server to wrap the returned resources in the Resource message, as discussed in #13201.

The changes in RouteAction and ClusterStats are necessary to allow us to refer to a cluster that has a new-style name.

CC @htuch

Signed-off-by: Mark D. Roth <roth@google.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #13555 was opened by markdroth.

see: more, trace.

@htuch htuch self-assigned this Oct 14, 2020
@@ -136,12 +138,24 @@ message ClusterStats {
}

// The name of the cluster.
string cluster_name = 1 [(validate.rules).string = {min_len: 1}];
string cluster_name = 1 [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I've been thinking about is whether we are better off just using flattened string UDPA URLs everywhere other than DiscoveryRequest/Response. Within Envoy, I anticipate that we will flatten to string these URLs rather than maintain a structured representation, since this makes it easy to interop with existing opaque string names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two independent places we can make this choice:

  1. In the client implementation.
  2. In the API.

The two choices are largely orthogonal, because regardless of what choice we make for either of them, we can easily convert to the other when we send or receive messages.

The decision for (1) does not have to be universal -- Envoy and gRPC (and other clients in the future) could all make different choices for their internal representations. For gRPC C-core, I am thinking that we will probably use a struct representation, because there are a lot of places where we're going to want to access just one of the fields from the URI (e.g., just the authority), and we don't want to have to re-parse the URI each time we need to do that. But I might change my mind about that as I see how the code winds up looking.

In any case, though, this PR is only about the API. I think we should be consistent about the representation across the entire API. If we're using the UDPA protos in DiscoveryRequest/Response, then I think we should also use them everywhere else in the API as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I buy the consistency argument, but I'd also argue there is a distinction between what we opt for in the transport protocol and what we use in the resources themselves. There's probably other places we might have missed, e.g. what about WeightedCluster or every place we reference SDS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've added the needed field in WeightedCluster. I also added one for VHDS.

It looks like you already took care of SDS in #11810 (see here), at the same time that you did RDS and EDS.

Do we also need something for SRDS? I don't know enough about how that works to know what new fields might needed or where they should go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try grep or search for string cluster (which is just one set of these names). There's actually quite a lot, ranging from places such as cluster_header (do we force our naming scheme on that?), http_uri.proto (basically, anytime we try and do HTTP in Envoy we need to know a cluster name, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a UdpaResourceLocator field everywhere that I see "string cluster", with only one exception. That exception is in config_source.proto, where the cluster name says that it's intended to be used only with REST and only when the specified cluster is fully static (no CDS or EDS). I don't think we'll wind up using the new names with those constraints, so I didn't add the field there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to do this for secret name? After going through this exercise, are you still preferring this approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which "secret name" field are you referring to? I think you already took care of SDS, as I mentioned above, so I'm not sure what else needs to change here.

With regard to the broader question, I can see arguments both for doing this and for just using simple string representation. But as I said earlier, I think we should be consistent in how we do this across the entire API. So I think we have two options:

  1. Use string form everywhere. We would completely remove the UdpaResourceName and UdpaResourceLocator protos, since they would never be used. We can simply store URIs in string form in the existing string fields for resource names, both in {Delta,}Discovery{Request,Response} and in the resource protos themselves.

  2. Go forward with this PR so that we're consistently using UdpaResourceName and UdpaResourceLocator everywhere.

Maybe we should sit down and game through all of the implications of both options. For example, in option 1, we'd basically be giving up proto validation, and we'd be requiring all xDS clients and servers to parse and validate the URI strings themselves. We should also talk through which option will be easier in terms of migration. Are there going to be problems if both old and new names are encoded in the same string field? Or can we seamlessly define a bidirectional lossless conversion between old and new names (e.g., the old name just gets moved into the id field somehow)?

Let's talk this through when you get back.

Signed-off-by: Mark D. Roth <roth@google.com>
…ted.

Signed-off-by: Mark D. Roth <roth@google.com>
@@ -46,6 +48,9 @@ message HttpUri {
// cluster: jwks_cluster
//
string cluster = 2 [(validate.rules).string = {min_len: 1}];

// [#not-implemented-hide:]
udpa.core.v1.ResourceLocator cluster_resource_locator = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be URI rather than URL if we're referring to a concrete entity like a cluster that exists in the data model, vs. describing how to fetch something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's actually a distinction between a resource that already exists in the data model and a case where we will need to fetch something. We're moving more and more to a world of on-demand resource fetching, which means that in the general case, any time we refer to a resource, we don't have any way of knowing whether we've already fetched the resource or whether referring to it will cause us to go fetch it.

I realize that today, Envoy always fetches all clusters up front. However, that's not true of gRPC, and it will likely not be true of Envoy in the future, when we eventually add on-demand cluster fetching. (This is where I make my periodic case for something like #8200. :) ).

To say this another way, I think that the difference between ResourceLocator and ResourceName is that the former is search criteria and the latter is a name for one specific resource. This difference is only slightly relevant today, especially when we know that the query is for a single resource instead of a collection, but if/when we add more flexible matching for context params, I think it will become more important.

@htuch
Copy link
Member

htuch commented Nov 13, 2020

I'm still a bit confused philosophically about naming here. We have various names across the APIs that are used to point at other objects (URL) and also name the object itself (URI). If I'm following your logic correctly, you're saying that a route lookup could resolve to a URL which could potentially match multiple URIs (e.g. if we have flexible context parameter matchers in the URL).

@markdroth
Copy link
Contributor Author

markdroth commented Nov 13, 2020

First, a terminology nit: When referring to a name, I think we should use the term URN, not URI. URNs and URLs are both forms of URIs.

But back to your question, I think my reference to collections was actually misplaced here; I think that's actually orthogonal to the question of flexible matching. I think that even if/when we add more flexible matching rules for context params, it will still need to be clear from a given URL whether it is referring to an individual resource or a collection of resources. (If it refers to a collection resource type or the id ends in a /*, then it points to a collection; otherwise, it points to a single resource.)

I think the issue here is just about whether a URL with a particular set of context params can match a URN with a slightly different set of context params. For example, imagine if we allowed a URL to contain a regexp for a context param value instead of an exact-match string. If it's not requesting a collection, then the URL must still match exactly one URN, but the syntax of the URL and the syntax of the URN may not be exactly the same. That's why I think all cases of referring to a resource (as opposed to providing a resource) should be URLs, not URNs.

@htuch
Copy link
Member

htuch commented Nov 13, 2020

Sticking with the route resolving to cluster example. If we have a URL with flexible context parameter matching, it might actually plausibly resolving to > 1 distinct cluster URN. How do we disambiguate?

@markdroth
Copy link
Contributor Author

I think that's something we'd have to define as part of any proposal to add flexible matching. I don't think we need to fully define that now, but one approach we could take is to define some rules that dicate how "close" a given URN is to matching the URL, and we could say that the closest match wins.

Regardless of how the matching works, though, my main point here is that the reason we added URLs to the design in the first place was to have a different type for referring to resources vs. when naming a resource, so that we'd have flexibility to figure out this kind of thing. The cases you're referring to are exactly the intended purpose of URLs. If you are arguing against using them here, then I think you're actually arguing against having URLs in the design at all. I'm really hoping we don't need to reopen that debate at this late stage. :)

@htuch
Copy link
Member

htuch commented Nov 13, 2020

I don't think I'm arguing against using them in the design, I'm just trying to grok what the concrete implications are, given we already have these referential relationships in Envoy today where there is a 1:1 match. We're moving from a model of very easy to reason about opaque strings with exact match to match/resolution of URL/URN when references cross resources. We're basically removing a proxy-local shortcut for referencing resources and going from 1:1 to 1:N. This makes the data model more complicated for users, but is fine for machines IMHO.

I'm OK with this, but it's a major design change in xDS. We should verify that other @envoyproxy/api-shepherds are good with this, given the impact.

@markdroth
Copy link
Contributor Author

I'm certainly happy to hear input from the other API shepherds, but I have to say that I really don't understand the concern with this particular PR.

I don't believe that this PR is actually introducing any 1:N mapping, because UdpaResourceLocator does not actually imply or allow anything like that; it is purely 1:1. I understand the concern that it would be bad if we accidentally introduced 1:N mapping as part of some future design for flexible matching (and I do agree that we should do not do that!), but I think that's a concern to discuss if/when we have such a proposal, not something related to this PR.

I also don't think this PR is any sort of "major design change in xDS". Note that you already previously added UdpaResourceLocator in a number of places in #11810, so if this is some sort of problem, the problem already exists. All that's happening in this PR is to add it to places that weren't handled in the earlier PR.

I think our positioning here is exactly the same as it was when we agreed on the new naming scheme design: We're using different types for URLs and URNs, because URLs already support a couple of things that URNs don't (glob matches and directives), and because we wanted to allow for the possibility of adding other things in the future (such as flexible matching for context params). Keeping this possibility open doesn't mean that we're commiting to do it; it just means that we have options as we move forward.

Comment on lines +168 to +169
udpa.core.v1.ResourceLocator cluster_resource_locator = 12
[(udpa.annotations.field_migrate).oneof_promotion = "cluster_specifier"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been tracking the back and forth, but I skimmed this PR and let my try to summarize what I think @htuch concerns are and some of my own thoughts:

  1. I think @htuch thinks this ResourceLocator can potentially point to multiple targets, which is very different than what we have today. @htuch is that right? If that is right, is it up to the impl to fail to load if that is the case in a location where that doesn't make sense.
  2. There is some discussion about a flat string vs. a structured impl, right? Is the summary there that we want the structured impl in the API even though we might store a flat string internally?

My thoughts:

  1. I'm concerned it's going to be very painful to keep track of all cases of this. Even though it would also be painful, should we deprecate existing string clusters and replace with only ResourceLocator or a message that wraps a oneof between a flat string and a ResourecLocator?
  2. Are we sure that every change in this PR makes sense? Concretely, does the changing the Node field (this field) actually make sense? This is an opaque string, it's not an xDS fetch. Are there are other cases like this that might need discussion?

@mattklein123 mattklein123 self-assigned this Nov 14, 2020
@htuch
Copy link
Member

htuch commented Nov 14, 2020

I think @htuch thinks this ResourceLocator can potentially point to multiple targets, which is very different than what we have today. @htuch is that right? If that is right, is it up to the impl to fail to load if that is the case in a location where that doesn't make sense.

Yes, this is what it boils down to. I'm alright with either defining the resolutions as most specific or failing if there are multiple matchers, I just want to make sure we're comfortable with this as a conscious decision. To Mark's point on this already being a feature of the new naming scheme, the existing anchor points for URLs are locations where we were explicitly performing a config source resolution and lookup, whereas the points in this PR are "internal" (e.g. a cluster action for a route), they assumed the existence of the resource effectively. Given how things are done with gRPC (where there are no internal references) and a move towards viewing the API as a true DAG with edges being URLs, I think these "internal" references are really a vestigial artifact and we should think of any reference to resources in terms of URL/URN.

In summary, I'm good with URL, wanted to make sure everyone is on the same page.

There is some discussion about a flat string vs. a structured impl, right? Is the summary there that we want the structured impl in the API even though we might store a flat string internally?

I think there is a strong argument for structured in DiscoveryRequest/Response, since it will facilitate efficient representation of things like context parameters. Mark and I chatted about this and a scheme like HPACK dynamic table might eventually make sense when context parameters become a major contribution to config size, so having some structure makes sense. The argument could also be made that structure then makes sense everywhere in the API.

@markdroth
Copy link
Contributor Author

Thanks for the discussion. If I'm understanding correctly, the concern here is about the fact that Envoy deals with clusters as "internal" references and not real resource fetches, so the semantics could be different. And since this is a different code-path, the semantics of that lookup could accidentally provide 1:N semantics instead of the intended 1:1 semantics. Is that right?

I think the way I would model this would be to treat these as real references to resources that just happen to be pre-fetched by Envoy. So the same matching rules (which do require 1:1 match) apply as they would to any other resource; it's just that Envoy is applying those matching rules against its pre-fetched list of resources instead of using the xDS client code (which may or may not already have the resource cached).

To say that another way, the fact that Envoy is pre-fetching the cluster resources can be considered an implementation detail; the matching rules should work exactly as if it was not implemented that way. I think that property will be necessary in the long run anyway, since I think we'll need to support on-demand CDS at some point.

Also, responding to something @mattklein123 asked:

I'm concerned it's going to be very painful to keep track of all cases of this. Even though it would also be painful, should we deprecate existing string clusters and replace with only ResourceLocator or a message that wraps a oneof between a flat string and a ResourecLocator?

I do think that it would help to have a more concrete plan for how we are going to migrate from old-style to new-style names. For example, is it Envoy's intent to move completely to new-style names, deprecating the old-style names, or will it always accept both?

Also, for whatever interval the old- and new-style names co-exist, how exactly does that work? I can think of two possible approaches:

  1. We define a bi-directional conversion between old- and new-style names (e.g., representing the old name as the "id" part of the new-style name, percent-encoding characters like ?, and using an empty authority). Internally, we store everything as a new-style name. When a client talks to a given server, it will decide which form to send based on whether the server supports new-style names (which will have to be configured somewhere, maybe in the ConfigSource). Servers that are handling this migration will expect to serve the same resource for a given name, regardless of whether the name is expressed as an old-style or new-style name.

  2. We say that old- and new-style names basically exist in different namespaces, and there is no conversion between them. This means that whenever we store a resource name internally, we need to store an explicit indication of whether it's an old-style or new-style name, so that we know how to encode it on the wire. (Internally, we can continue to store names as simple strings if we want. For example, we could basically automatically prefix all old-style names with "xds:", and new-style names will start with "udpa:", so we can tell which are which. Note that the "xds:" prefix would not be visible to the API; it would just be an internal storage thing so that the code can tell how to handle it.)

I think option 2 makes more sense, because it does not require separtely configuring whether a given server supports new-style names; we can simply ask for whatever name we are told to ask for, whether it's old- or new-style, and the right thing will happen. (The right thing will not happen if someone configures us to ask for a new-style name from a server that does not understand new-style names, but that's true regardless of how we do this.)

Anyway, a lot of this is not directly related to this PR, but I do think it would be useful to identify a more concrete migration strategy in order to answer some of these questions.

@mattklein123
Copy link
Member

I will defer to @htuch on the details of internal implementation in Envoy, but I do think we are getting to the crux of the issue. My concrete preference is:

  1. Make sure we are not going to cause issues in terms of Envoy implementation (I will defer to @htuch)
  2. I think we should deprecate string clusters and replace them with a new message which has a oneof which contains either a string or resource locator (your (2) above)
  3. Double check that all things in this PR should actually be replaced. Concretely, I don't think Node should change to a resource locator. The cluster string in Node is just a string, it's not a CDS resource.

@markdroth
Copy link
Contributor Author

I think we should deprecate string clusters and replace them with a new message which has a oneof which contains either a string or resource locator (your (2) above)

If we are actually going with (2), then I think that affects all resource types, not just clusters. So does that imply that we want to add a message that contains a oneof for any resource name, and use that everywhere? That would presumably also require changing all of the places where the new field was added in #11810.

Also, I think it may be worth first answering the question about whether we intend to eventually deprecate the old-style names or whether we intend both styles to continue to work indefinitely. If we do intend to eventually deprecate the old-style names, then creating a new message containing a oneof and migrating everything to that seems like it will leave us in a bit of an ugly state, since we'd eventually be left with the new field in a sub-message in a oneof for no real benefit.

Concretely, I don't think Node should change to a resource locator. The cluster string in Node is just a string, it's not a CDS resource.

What is it actually used for? If it's a cluster name and we move to a world where all clusters use new-style names, does it do any good at all for this to just be a string?

I'm happy to make that change; I just want to make sure that there are no implications we're missing here.

@markdroth
Copy link
Contributor Author

Recording the discussion from today's meeting for posterity:

  • Resource names will continue to be string fields instead of structs in the API. For new-style names, the string will start with "xdstp:", in which case we can do whatever additional parsing we want to.
  • The string will be used as the cache key. This means that there is no requirement that an old style name "foo" and a new-style name "xdstp:foo" are actually the same resource; they will each be cached independently. Management servers can choose whether or not to return the same contents for both of those resource names. (If a cache implementation uses the structured form for cache keys, then it can include a boolean in the struct to indicate whether it was an old- or new-style name.)

Given all of that, this PR is no longer necessary, so I'll go ahead and close it.

@htuch, I assume you'll remove the struct fields that have already been added in a separate PR.

Thanks!

@markdroth markdroth closed this Nov 30, 2020
@markdroth markdroth deleted the udpa_resource_name branch November 30, 2020 21:44
htuch added a commit to htuch/envoy that referenced this pull request Dec 2, 2020
As per discussion summarized in
envoyproxy#13555 (comment), we will not use structured
xdstp:// names/locators in the API initially. Instead, we will re-use existing string fields for
names and special case any name with a xdstp: prefix. We leave open the option of introducing
structured representation, in particular for efficiency wins, at a later point.

Risk level: Low (not in use yet)
Testing: CI

Signed-off-by: Harvey Tuch <htuch@google.com>
htuch added a commit that referenced this pull request Dec 3, 2020
As per discussion summarized in
#13555 (comment), we will not use structured
xdstp:// names/locators in the API initially. Instead, we will re-use existing string fields for
names and special case any name with a xdstp: prefix. We leave open the option of introducing
structured representation, in particular for efficiency wins, at a later point.

Risk level: Low (not in use yet)
Testing: CI

Signed-off-by: Harvey Tuch <htuch@google.com>
@stevenzzzz
Copy link
Contributor

/cc stevenzzzz

@htuch
Copy link
Member

htuch commented Dec 9, 2020

After some discussion @stevenzzzz, I'm inclined to add to the proposal above the following:

  • Allow resources in the data model to have empty names (when being sent from server to client).
  • If a client receives a resource with an empty name, we use the resource name from the transport when it was delivered on the client-side to populate the name (at least conceptually). In things like config dump, we will include the full xdstp:// name. Internally, in the proxy, we may use a more efficient representation.

This avoid repeating the xdstp:// string twice for every resource, allows more efficient wire optimization, saves client memory and simplifies some control plane proxy interactions.

@markdroth @mattklein123 WDYT?

@markdroth
Copy link
Contributor Author

From our offline discussion, it sounds like you mean that we would omit the name field that exists inside the data model protos (e.g., Listener.name, RouteConfiguration.name, Cluster.name, and the confusingly-named ClusterLoadAssignment.cluster_name), since the resource name would be sent in the Resource wrapper proto instead.

This sounds fine to me. I actually think we should ultimately try to move away from the name fields inside of the individual data model protos -- they should ideally be empty whenever the Resource wrapper proto is used.

@mattklein123
Copy link
Member

This sounds fine to me. I actually think we should ultimately try to move away from the name fields inside of the individual data model protos -- they should ideally be empty whenever the Resource wrapper proto is used.

I think this is my question also after reading this. If we can set the name to empty and everything works, why do we have the name at all. Is that a historical artifact that we can get rid of?

@htuch
Copy link
Member

htuch commented Dec 16, 2020

I think for config dump reasons etc. there was a push earlier in the API history towards having all resources being "self-describing". This meant that we had name fields. I think it's awkward and would best be replaced by a wrapper envelope like Resource. I think we'd want to be more prescriptive on using alt_stats_name or the moral equivalent.

@mattklein123
Copy link
Member

I think for config dump reasons etc. there was a push earlier in the API history towards having all resources being "self-describing". This meant that we had name fields. I think it's awkward and would best be replaced by a wrapper envelope like Resource. I think we'd want to be more prescriptive on using alt_stats_name or the moral equivalent.

Yeah agreed. Either way we should carefully document all of this so it's clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants