Glob collections behavior on partial responses #99
Comments
cc @htuch @markdroth Generally speaking, the xDS protocol is an eventually consistent protocol, which implies that even when sending responses in chunks, eventually the clients will have the same view as the server. Can we take a step back and try to understand what causes the bottleneck?
Yeah, and that will eventually happen even in this scenario. But if the client has no such "warming"-timeout, it could cause some trouble. Regarding the timeout itself, is it expected that the client always use a "warming"-timeout? That does seem reasonable but it's technically something that could be short-circuited right? Especially now that the Delta/Incremental protocol actually can let the client know that the resource doesn't exist, removing the need for an explicit timeout for the SotW clients. With the Delta protocol there's no need for such a timeout.
It's a little bit of both :) We use the xDS protocol to ship custom types outside of the standard LDS/RDS/CDS/EDS flow. Basically, Similarly, even though it's unlikely that a normal
Regarding the proposed solution, this is something that we could introduce as an extension, where clients specify that they support this pseudo-EOF notification, and the control plane conditionally sends it back.
Would a list collection work better in this case? I.e. if you want to know the exact expected resources.
We don't really care about the specific resources. It's functionally equivalent to how LEDS works today, where the number of hosts in a specific locality doesn't matter; Envoy just needs to know what they are.
I think it wouldn't really work for this, for the same reasons it doesn't work for LEDS. The resources are way too dynamic and would require a lot of round trips to materialize the actual full collection. We can't inline the entries in the collection response since that would trigger the same problem of crossing the size threshold. Ultimately, there are two separate issues I want to discuss:
Maybe we can set up a meeting to discuss this?
I think some kind of delimiter could make sense. @markdroth thoughts? I think we're generally open to meeting and have a broader interest in collaborating with folks doing work in this space with xdstp and pushing the limits of scalability.
(Sorry for the delayed response; I was out of town for a few weeks.)

In the general case, I don't think it's reasonable to expect the protocol to have a notion of "I have sent you the complete set", because the set of resources in a given collection can change very dynamically (e.g., auto-scaling adding or removing endpoints). So what happens if some resources are added or removed before the control plane finishes sending the initial set? This seems like a somewhat arbitrary decision for the control plane to make in the general case -- and if there's enough churn in the set, then the control plane might never tell the client it has the whole set.

Furthermore, it's not clear to me that the client should really care whether it has the whole set. The client already needs to be able to handle resources being added and removed at any time. As @adisuissa mentioned above, xDS is an eventually consistent protocol, and it should not matter to the client whether it gets all of the endpoints at once or whether they are split into two responses.

Note that at least in gRPC, there is always a bit of endpoint imbalance when the client first starts up, because even if it gets all of the endpoint addresses at once, the connection attempts for those addresses will finish at different times, and the client will start sending traffic to the ones it is connected to while waiting for the others. This imbalance smooths itself out fairly quickly, assuming all endpoints are reachable. But a small delay in getting all of the endpoint addresses should not in principle make much difference. (I realize that Envoy works differently than gRPC in that regard: in Envoy, the LB policy picks a host without regard to whether that host currently has a working connection, so this initial imbalance might not happen if all of the endpoints are known. But I will point out that the flip side of that is that Envoy may choose a host that turns out to be unreachable, thus causing the request to fail, whereas gRPC will not do that.)

So I think the main question I have here is, why isn't eventual consistency good enough here? Shouldn't the short-term imbalance be very quickly resolved? Are there other things you can do to alleviate even that short-term problem, such as having the control plane randomize the order of the endpoints it hands out?
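To make the last suggestion concrete, here is a minimal sketch of a control plane shuffling the endpoints before handing them out, so that whichever chunk arrives first is a random subset of the collection. The function name `randomized_order` and the plain list-of-strings representation are assumptions made for illustration; nothing here comes from an actual control plane implementation.

```python
import random
from typing import List, Optional


def randomized_order(endpoints: List[str], seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy of the endpoints; the input list is left untouched.

    Hypothetical helper: a real control plane would shuffle its own resource
    representation, not a list of address strings.
    """
    rng = random.Random(seed)  # a seed is only useful for reproducible tests
    shuffled = list(endpoints)
    rng.shuffle(shuffled)
    return shuffled
```

If the shuffled list is then split across several responses, each response carries a random sample of the endpoints instead of a contiguous slice of the original ordering.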
If a given delta response contains too many resources, the server will break it up into multiple responses. However, this means the client does not know whether it received all the resources for its subscription. This is especially relevant for wildcard subscriptions, for which the client does not know the resources ahead of time and therefore cannot wait for them explicitly. By returning additional metadata in the nonce (there is no field for this in the delta discovery response, though I'm hoping that will change cncf/xds#99), the client can know if the server chunked the response, and react accordingly.
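As an illustration only, the following sketch shows one way chunking metadata could ride along in the nonce. The `<base>:<remaining>` layout and the names `encode_nonce`, `decode_nonce` and `is_last_chunk` are assumptions invented for this example; the text above does not specify the actual nonce format, and the xDS protocol treats the nonce as an opaque string.

```python
from typing import Tuple


def encode_nonce(base: str, remaining_chunks: int) -> str:
    """Append the number of chunks still to come to an opaque base nonce.

    Hypothetical format: "<base>:<remaining_chunks>".
    """
    if remaining_chunks < 0:
        raise ValueError("remaining_chunks must be non-negative")
    return f"{base}:{remaining_chunks}"


def decode_nonce(nonce: str) -> Tuple[str, int]:
    """Split a nonce produced by encode_nonce back into (base, remaining_chunks)."""
    base, sep, remaining = nonce.rpartition(":")
    if not sep or not remaining.isdecimal():
        raise ValueError(f"nonce {nonce!r} carries no chunk metadata")
    return base, int(remaining)


def is_last_chunk(nonce: str) -> bool:
    """True when the server signalled that no further chunks follow."""
    return decode_nonce(nonce)[1] == 0
```

A client that understands this format could hold off acting on a wildcard subscription until `is_last_chunk` returns true, while a client that does not would simply echo the nonce back unchanged.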
I have a question on how clients are supposed to interpret glob collection responses from an xDS control plane. gRPC has a default message limit of 4MB, which can cause clients to reject a response from the control plane if it is too large. In practice, most glob collections will be small enough to fit in a single response. However, at LinkedIn, some clusters teeter over the edge of this limit during high load, which was causing some clients to simply reject the response. This is especially likely during startup, since the clients may request multiple collections at once, which can easily cross this size threshold. Because the limit is not trivial to raise (and there is no guarantee a single value will fit all use cases), our control plane implementation instead splits the response into multiple "chunks", each representing a subset of the collection, such that each response is smaller than 4MB. However, this raises the question of how the client should behave under such circumstances.
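A rough sketch of the kind of splitting described above, assuming resources are available as a name-to-serialized-bytes mapping. The function `chunk_resources`, the per-resource overhead estimate, and the greedy packing strategy are all assumptions for illustration, not the actual control plane implementation.

```python
from typing import Dict, List

MAX_RESPONSE_BYTES = 4 * 1024 * 1024  # gRPC's default message limit mentioned above


def chunk_resources(resources: Dict[str, bytes],
                    max_bytes: int = MAX_RESPONSE_BYTES,
                    overhead_per_resource: int = 64) -> List[Dict[str, bytes]]:
    """Greedily pack resources into chunks whose estimated size does not exceed max_bytes.

    overhead_per_resource is a guessed allowance for per-resource framing;
    a real implementation would measure the serialized response instead.
    """
    chunks: List[Dict[str, bytes]] = []
    current: Dict[str, bytes] = {}
    current_size = 0
    for name, payload in resources.items():
        size = len(name.encode("utf-8")) + len(payload) + overhead_per_resource
        if size > max_bytes:
            raise ValueError(f"resource {name!r} alone exceeds the limit")
        # Start a new chunk when adding this resource would cross the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = {}, 0
        current[name] = payload
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned dictionary would be sent as its own response. Nothing in the chunks themselves tells the receiver how many more are coming, which is exactly the ambiguity raised here.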
The spec does not dictate that the collection be sent as a whole every time (nor should it, for the reason listed above), but it also provides no way to mark the "end" of a collection or a means to provide the collection's size. This means in some extreme cases the client may receive only a very small subset of the collection on the initial response from the control plane. In this scenario, should the client:
There is no room in the protocol today to really communicate the size of the collection, and arguably it's something that would provide little to no purpose other than for this specific edge case. My suggestion would be to mimic the glob collection deletion notification, but in reverse. Here is what it would look like (following the example in TP1):
xdstp://some-authority/envoy.config.listener.v3.Listener/foo/*
[xdstp://some-authority/envoy.config.listener.v3.Listener/foo/bar, xdstp://some-authority/envoy.config.listener.v3.Listener/foo/baz, xdstp://some-authority/envoy.config.listener.v3.Listener/foo/*]
By adding the glob collection's name in the response, the control plane can signal to the client that it has sent everything. This serves to effectively bookend the response from the control plane. The client can subsequently wait for this "end-of-glob-collection" notification to unambiguously determine whether it has received every resource in the collection. The resource named after the collection would have to be null or some special value to prevent it from being interpreted as an actual member of the collection. This proposal could require some changes on clients, but this problem seems important to address as more systems leverage the xDS protocol.
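On the client side, the proposal could be handled roughly as follows. This is a sketch under stated assumptions: responses are modelled as a mapping from resource name to payload, the end-of-collection marker is an entry named after the collection with a `None` payload, and the class `GlobCollectionTracker` is invented for this example. None of this is part of the xDS protocol today.

```python
from typing import Dict, Optional

# Hypothetical response shape: resource name -> payload (None for the marker).
Response = Dict[str, Optional[bytes]]


class GlobCollectionTracker:
    """Accumulates members of one glob collection until the marker arrives."""

    def __init__(self, collection_name: str) -> None:
        self.collection_name = collection_name
        self.members: Dict[str, bytes] = {}
        self.complete = False

    def on_response(self, response: Response) -> None:
        for name, payload in response.items():
            if name == self.collection_name:
                # Proposed "end-of-glob-collection" notification: not a member.
                self.complete = True
            elif payload is not None:
                self.members[name] = payload
```

Usage, following the example above:

```python
prefix = "xdstp://some-authority/envoy.config.listener.v3.Listener/foo/"
tracker = GlobCollectionTracker(prefix + "*")
tracker.on_response({prefix + "bar": b"..."})
tracker.on_response({prefix + "baz": b"...", prefix + "*": None})
# tracker.complete is now True and tracker.members holds bar and baz.
```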