-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework Sotw Cache interface #7
Rework Sotw Cache interface #7
Conversation
…interface is breaking the separation of concern between the server side managing the xds protocol itself and the cache which is only in charge of returning responses based on watches PR envoyproxy#577 has been opened proposing to fully remove any client state from the CreateWatch interface This commit is updating the Cache interface based on a few concerns/aspects: - one of the argument on removing it is that the sotw protocol is stateless. It actually is not, and is explicitly visible in this PR: the lastResponse latched value as well as the version hack when the resource list is changed very well show that it is stateful - in cases outside of lds and cds, it is clearly stated that the control-plane is not expected to send back all resources when a single one change. This is highly critical in the case of ads with eds, as there is a likelihood of both having a lot of resources and a high churn. The implementation sending everything each time is technically tolerated, but is pushing on the data-plane work that could be done at low cost on the control-plane - the current support of both delta and sotw, for both linear and simple cache means that every change has to be done 4 times, with four very different implementations, each one supporting only a subset of use cases (e.g. linear does not support cds/lds as it doesn't handle wildcard in this case, and sotw does not work properly as resource list is improperly maintained). I strongly believed that both implementation should converge. The only remaining differences should be in the messages themselves, abstracted in the server part. Having a common interface for sotw and delta is a first step for this. It would also allow in the future to make snapshot far more efficient by supporting a linear version per snapshot In this context, this commit is changing: - creates a SubscriptionState within the cache package, driven by the data needed from the cache perspective to be able to answer a client request - simplifies streamState - makes it compatible with the new SubscriptionState interface - removes the IsFirst notion, as the protocol states it should come from the nonce in the request - removes the duplicate/parallel knownResourceNames and use the delta one instead. It will also properly carry the version in a future PR Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Rename KnownResources to ACKedResources to better reflect the change Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
…ches subscribing to multiple resources Properly return the request in sotw responses to allow proper handling in callbacks Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
…near cache Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
[sotw][linear] Fix missing watch cleanup in linear cache for sotw watches subscribing to multiple resources
Co-authored-by: Antoine Tollenaere <atollena@gmail.com> Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
[sotw][issue-540] Return full state when applicable for watches in linear cache
// The versions are: | ||
// - delta protocol: version of the specific resource set in the response | ||
// - sotw protocol: version of the global response when the resource was last ACKed | ||
ReturnedResources() map[string]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not AckedResources
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now set the resources when sending the reply. Given the nonce behavior and the lack of support for NACK (handling it without further work would just create an infinite loop), there is no real point in distinguishing them in sotw for now as we explicitly ignore staled requests
pkg/server/sotw/v3/watches.go
Outdated
} | ||
} | ||
|
||
if sub.IsWildcard() && len(req.ResourceNames) == 0 && len(sub.SubscribedResources()) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy from this convo: #6 (comment)
Does that properly handle the case of unsubscribing from explicit wildcard? Probably this is never used, but still I think this violates the protocol spec and should be implemented properly. For example two subsequent requests:
Req(resources: ["*"]) -> explicit wildcard subscription -> IsWildcard() = true
...
Req(resources: []) -> IsWilcard = true, req.ResourceName = 0, GetSubscribedResources() == 0,
This will fall into this if statement, even though it should not be considered wildcard and instead should be considered unsubscribing from wildcard.
There seem to be no way to implement the legacy wildcard behavior without storing an additional boolean in what you called SubscriptionState:
For historical reasons, if the client sends a request for a given resource type but has never explicitly subscribed to any resource names
You can't know that without storing whether you've ever seen a resource name on the stream, can you? Without storing this explicitly, there is a high chance that you end up mixing up legacy/implicit and explicit wildcards.
Member
Author
@valerian-roche valerian-roche 7 hours ago
I agree that we do not support the case when the subscription was explicit wildcard and now requests nothing (not wildcard nor resources). The comment below mentions that we do not support this explicit case currently (in sotw or in delta), but I also don't see much value here. I can add this if preferred though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it seems a matter of adhering to the protocol, so I think this should arguably be supported, even if data planes do not use this yet, it is not great to ignore this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update to handle this case properly
pkg/cache/v3/status.go
Outdated
@@ -101,8 +100,8 @@ type DeltaResponseWatch struct { | |||
// Response is the channel to push the delta responses to | |||
Response chan DeltaResponse | |||
|
|||
// VersionMap for the stream | |||
StreamState stream.StreamState | |||
// State for the type subscription |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Godoc format please, e.g.:
// State for the type subscription | |
// Subscription stores the current client subscriptions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
// so we set the initial resource versions if we have any. | ||
// We also set the stream as wildcard based on its legacy meaning (no resource name sent in resource_names_subscribe). | ||
// If the state starts with this legacy mode, adding new resources will not unsubscribe from wildcard. | ||
// We also set the subscription as wildcard based on its legacy meaning (no resource name sent in resource_names_subscribe). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the place where you know whether the stream should use legacy behavior or not. Implementing this properly doesn't seem like a big deal...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done now within the subscription
pkg/server/sotw/v3/ads.go
Outdated
}, | ||
}) | ||
// Create a buffered channel the size of the known resource types. | ||
respChan := make(chan cache.Response, types.UnknownType) | ||
|
||
process := func(resp cache.Response) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just ditch the process
variable here, if I read correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's now only send. Will cleanup
pkg/server/sotw/v3/ads.go
Outdated
@@ -63,7 +52,7 @@ func (s *server) processADS(sw *streamWrapper, reqCh chan *discovery.DiscoveryRe | |||
case <-s.ctx.Done(): | |||
return nil | |||
// We only watch the multiplexed channel since all values will come through from process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(update this comment if you take my suggestion to remove process
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up, as well as process
if w.nonce != "" && req.GetResponseNonce() != w.nonce { | ||
// The request does not match the stream nonce, ignore it as per | ||
// https://www.envoyproxy.io/docs/envoy/v1.28.0/api-docs/xds_protocol#resource-updates | ||
// Ignore this request and wait for the next one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding a link to envoyproxy/envoy#10363 which describes an (unlikely) race that may happen due to this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. To be honest the change mentioned in the PR would require some potentially major changes, especially if we want to keep the strict ordering required by ADS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think just acknowledging the race is OK, I don't think it can realistically trigger in our environment.
} | ||
|
||
w.nonce = out.Nonce | ||
// ToDo(valerian-roche): properly return the resources actually sent to the client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: others use TODO, not ToDo. fwiw I've never seen ToDo written by anyone else that yourself in my whole career, so it almost makes it redundant with your name :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to TODO
// Only process if we have an existing watch otherwise go ahead and create. | ||
if err := processAllExcept(typeURL); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't quite grasp that part. IIUC here you are making sure there are no outstanding responses for other resource type URL? Why is this needed? Reading the comment above processAllExept and here didn't really elucidate. You are also discarding responses of the type currently being processed. Maybe I just need to spend a bit more trying to understand this.
Also, why do we use a variable for processAllExcept, since it's only used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the ordered ADS implementation done in another PR. It handles all the previous one to ensure they are in order and the new one is not answered first. I don't remember the exact reason why the current type is just dropped though. It might be to avoid a potential deadlock in the old model, but not sure
I didn't have time to fully review yet, sorry. Will continue tomorrow. |
a44daa2
to
53fab26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replied to most comments, feel free to do a more thorough review. I will have to split this PR given the scope of changes now, especially with the sub naming alignment touching a lot of pieces
if w.nonce != "" && req.GetResponseNonce() != w.nonce { | ||
// The request does not match the stream nonce, ignore it as per | ||
// https://www.envoyproxy.io/docs/envoy/v1.28.0/api-docs/xds_protocol#resource-updates | ||
// Ignore this request and wait for the next one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. To be honest the change mentioned in the PR would require some potentially major changes, especially if we want to keep the strict ordering required by ADS
pkg/cache/v3/status.go
Outdated
@@ -101,8 +100,8 @@ type DeltaResponseWatch struct { | |||
// Response is the channel to push the delta responses to | |||
Response chan DeltaResponse | |||
|
|||
// VersionMap for the stream | |||
StreamState stream.StreamState | |||
// State for the type subscription |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
// so we set the initial resource versions if we have any. | ||
// We also set the stream as wildcard based on its legacy meaning (no resource name sent in resource_names_subscribe). | ||
// If the state starts with this legacy mode, adding new resources will not unsubscribe from wildcard. | ||
// We also set the subscription as wildcard based on its legacy meaning (no resource name sent in resource_names_subscribe). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done now within the subscription
pkg/server/sotw/v3/ads.go
Outdated
@@ -63,7 +52,7 @@ func (s *server) processADS(sw *streamWrapper, reqCh chan *discovery.DiscoveryRe | |||
case <-s.ctx.Done(): | |||
return nil | |||
// We only watch the multiplexed channel since all values will come through from process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up, as well as process
} | ||
|
||
w.nonce = out.Nonce | ||
// ToDo(valerian-roche): properly return the resources actually sent to the client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to TODO
…vior Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
53fab26
to
5bbacdc
Compare
5bbacdc
to
cee910d
Compare
dbadfa1
to
79e5e55
Compare
835ef87
to
8d3db09
Compare
Datadog fork PR matching envoyproxy#584
It has been raised that the addition of StreamState within the Cache interface is breaking the separation of concern between the server side managing the xds protocol itself and the cache which is only in charge of returning responses based on watches
PR envoyproxy#577 has been opened proposing to fully remove any client state from the CreateWatch interface
This PR is an alternate proposal based on a few concerns/aspects:
In this context, this PR is changing: