-
Notifications
You must be signed in to change notification settings - Fork 517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#583][Sotw] Ensure resources are properly sent again if envoy unsubscribes then subscribes again to a resource #585
base: main
Are you sure you want to change the base?
Conversation
…e uniform between sotw and delta Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
…voy unsubscribes then subscribes again to a resource Fix potential deadlock in sotw-ads related to improper cleanup of watches in Linear cache when using delta in non-wildcard Fix improper request set on sotw responses in Linear cache Replaced lastResponse in sotw server by staged resources pending ACK Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
…nonce and version within streamState to properly track it with the pending resources Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
response: responder, | ||
}) | ||
// If we had an open watch, close it to make sure we don't end up sending a cache response while we update the state of the request | ||
w.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The race mentioned here is currently present in the code if we receive a new request while we are building a response in the cache (race on KnownResourceNames)
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
@valerian-roche Is this one still needed in upstream or should it remain stale/closed? |
Those PRs are addressing issues which are still present. There has been no feedback on it or #584 which is a pre-requisite of those PRs |
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
// ClientState provides additional data on the client knowledge for the type matching the request | ||
// This allows proper implementation of stateful aspects of the protocol (e.g. returning only some updated resources) | ||
// Though the methods may return mutable parts of the state for performance reasons, | ||
// the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment would benefit from emphasizing that this state is per resource type.
// ClientState provides additional data on the client knowledge for the type matching the request | |
// This allows proper implementation of stateful aspects of the protocol (e.g. returning only some updated resources) | |
// Though the methods may return mutable parts of the state for performance reasons, | |
// the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation | |
// ClientState stores the server view of the client state for a given resource type. | |
// This allows proper implementation of stateful aspects of the protocol (e.g. returning only some updated resources). | |
// Though the methods may return mutable parts of the state for performance reasons, | |
// the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated on parent PR #584, not yet rebased on this branch
// The versions are: | ||
// - delta protocol: version of the specific resource set in the response | ||
// - sotw protocol: version of the global response when the resource was last ACKed | ||
GetKnownResources() map[string]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't GetAckedResources
be more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to this name. I have not pushed this branch yet, but I updated #584
// Non wildcard case, we only reply resources that have effectively changed since the version set in the request | ||
// This is used for instance in EDS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you must special case CDS and LDS to not fall into this case. Is this an existing bug?
From https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#four-variants:
The SotW approach was the original mechanism used by xDS, in which the client must specify all resource names it is interested in with each request, and for LDS and CDS resources, the server must return all resources that the client has subscribed to in each request. This means that if the client is already subscribing to 99 resources and wants to add an additional one, it must send a request with all 100 resource names, rather than just the one new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is a known limitation that linear does not set full resources currently, and simple send all resources even when it should only send the changed ones (e.g. endpoints)
My hope was to finish the merge of the protocols first to not have to do code multiple times (now even more with sotw ads implementation) to fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#540 mentions this issue in the case of Sotw-ads on linear cache for xds-grpc
state = stream.NewStreamState(len(req.ResourceNames) == 0, nil) | ||
} else if nonce != state.LastResponseNonce() { | ||
// The request does not match the last response we sent. | ||
// The protocol is a bit unclear in this case, but currently we discard such request and wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The protocol is very clear on this:
From https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#resource-updates:
The management server should not send a DiscoveryResponse for any DiscoveryRequest that has a stale nonce. A nonce becomes stale following a newer nonce being presented to Envoy in a DiscoveryResponse.
This prevents the management server from sending a response to a request that had not taken the last sent response into consideration yet.
(By the way, comment on line 149 is quite misleading -- I don't think they can be reused across streams, not even across type for ADS)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed we need to rework/rephrase this. In sotw the nonce behavior is critical and improperly implemented in the control-plane currently. In delta we basically fully ignore them currently also, though it's only used for ACK/NACK there.
// The request does not match the last response we sent. | ||
// The protocol is a bit unclear in this case, but currently we discard such request and wait | ||
// for the client to acknowledge the previous response. | ||
// This is unclear how this handles cases where a response would be missed as any subsequent request will be discarded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would a response be missed by a client? Responses have order guarantees. Can this really happen? Does Envoy skip received versions when it sees that it has multiple responses ready to be processed? This seems like an awkward behavior, as it's quite difficult to even implement that with normal grpc clients...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current control-plane implementation tracks the nonce per type, even though the nonce is per stream. It's therefore likely to have misaligned values here. I need to do a full review of the version/nonce situation as I'm pretty sure it's really wrong in ads cases. In non-ads cases the issue does not appear as type watch and stream are 1:1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nonce is actually per type iiuc? That could be documented more clearly... The nonce exists to prevent a race involving individual resource types. I think making the nonce per ADS stream (and cross resource type) would not work well: requests would be considered stale based on other resource type responses, and the client would have to re-request.
There is actually one edge case to at least take into consideration that may challenge some of the comments I made in my previous review, explained here: envoyproxy/envoy#10363.
state, ok := streamStates[typeURL] | ||
if !ok { | ||
// We don't have a current state for this type, create one | ||
state = stream.NewStreamState(len(req.ResourceNames) == 0, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about explicit wildcards? ("*")? Shouldn't wildcard be true in that case, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit wildcards are handled in the subscription tracking
|
||
// Check if the new request ACKs the previous response | ||
// This is defined as nonce and versions are matching, as well as no error present | ||
if state.LastResponseNonce() == nonce && state.LastResponseVersion() == req.VersionInfo && req.ErrorDetail == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to check the nonce here?
- If this is an existing stream/type pair, then nonce not matching skips all the logic via continue at L180.
- If this is a new stream, then both request and response nonces are empty so they always match.
I'm asking because if versions were not just an integer generated per stream and per instance (e.g. if based on a hash of subscribed resources), I think would be OK to consider the client up to date if it has the right version but an empty nonce (first request), as long as we can guarantee that the set of subscribed resources match (this implies embedded subscribed resource names into the version). Anyways, this is not how linear cache works today, so perhaps not so important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the nonce behavior: as mentioned above, we need to fully review it
For the shortcut of subscribed resources, we can check but envoy requires to send a response even if the version does not change for some phases of load. If we have the latest version and what was subscribed then, if envoy in-between the streams unsubscribed and is now resubscribing we would be missing a response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so the list of subscribed revisions would need to be part of the version string.
Envoy requires to send a response even if the version does not change for some phases of load
OK, so I guess my hope that we could avoid sending a response on stream open if it is already up to date is not something we can support, even in theory. Anyways, this is not the focus of this PR so I'll close this comment for now.
} | ||
|
||
// Remove resources no longer subscribed from the stream state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Remove resources no longer subscribed from the stream state | |
// Remove resources no longer subscribed to from the stream state. |
knownResources := state.GetKnownResources() | ||
unsubscribedResources := getUnsubscribedResources(req.ResourceNames, knownResources) | ||
for _, resourceName := range unsubscribedResources { | ||
delete(knownResources, resourceName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I find this pattern of using a getter and then mutating the streamState object directly from the field quite unintuivite.
Perhaps it would make sense to simply expose the fields from stream state directly? Or expose a "setter".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't like this pattern. Setter is a bit messy here, and map attributes create risks of users inserting in nil maps. I'll check how we can improve this
// The protocol clearly states that if we send another response prior to an ACK | ||
// the previous one is to be considered as discarded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you link to the protocol section that states that? I don't find the wording and I'm not sure this concern is valid.
Not having to resend resources that were already sent would make the code much simpler, as you would not need to track aknowledgement, afaik (no concept of "pending" resource).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was a mistake on my side for the comment
With regards to tracking, if envoy NACKed a resource and we consider it ACKed it might not be sent again. I'm not fully sure if:
- we send a response at version N
- we send another response at version N+1, prior to envoy acknowledging version N
- if we considered that envoy already knew all versions at N, and don't resend them in N+1, and client subscriptions changed during this window, I'm afraid we could have cases where resources are not re-added when they should be. I'd need more time to assess it fully though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, could you validate envoy's behavior in the case you describe? gRPC will cache versions sent at version N, and only update those that changed at N+1, so nothing is lost. I think it'd be tricky to even implement something different on the client side, but perhaps Envoy does.
Thanks for working on this, this is great. This would fix #431 which makes go-control-plane pretty much incompatible with grpc clients, as those typically unsubscribe to resources after a channel reaches its idle timeout. |
@atollena the patch here fixes the go client idle issue for me. Hopefully that it can help you too |
This PR is part of the fix for the issue related to resources being unsubscribed from then subscribed to again
This one applies to both simple and linear cache when using the sotw protocol.
Actionable cases would likely only impact sotw-ADS as EDS outside of ADS does not share streams and will therefore usually not have multiple resources this way
This PR is built on top of #584 as it's relying on the client state being provided to the cache
It is also fixing a few bugs found during testing:
It is updating a few other aspects in order to implement the fix:
Finally, reasoning on the fixing through proper tracking of resources rather than basic resend of everything on doubts is explained within #584, and is even more applicable in the context of this issue (EDS through sotw-ads) when a single client can track thousands of endpoints within hundreds of CLAs