-
Notifications
You must be signed in to change notification settings - Fork 517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework Cache interface #584
base: main
Are you sure you want to change the base?
Conversation
20b18fd
to
a36df79
Compare
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Hey @valerian-roche can you resolve those merge conflicts so I can start taking a look at this? |
af7a06d
to
3122baf
Compare
Hey @alecholmez |
f4b5e9d
to
1a049e9
Compare
This PR will need to be rebased on |
1a049e9
to
03ce554
Compare
I rebased this PR on the latest changes (especially the rollback of #615 which was blocking it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of drive-by notes.
I'd also suggest adding more commentary to the commit message to describe the overall change, why the change is needed and the effects of the change on the system (particularly for the rework commit). The reader kinda has to try to stitch that together there from the PR description, and a bit of hand-holing would be helpful :)
pkg/cache/v3/cache.go
Outdated
GetSubscribedResources() map[string]struct{} | ||
|
||
// IsWildcard returns whether the client has a wildcard watch. | ||
// This considers subtilities related to the current migration of wildcard definition within the protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This considers subtilities related to the current migration of wildcard definition within the protocol. | |
// This considers subtleties related to the current migration of wildcard definitions within the protocol. |
It would also be useful to direct the reader to a description of these subtleties so they can understand what issues they might encounter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add links for those
pkg/cache/v3/cache.go
Outdated
@@ -50,7 +70,7 @@ type ConfigWatcher interface { | |||
// | |||
// Cancel is an optional function to release resources in the producer. If | |||
// provided, the consumer may call this function multiple times. | |||
CreateWatch(*Request, stream.StreamState, chan Response) (cancel func()) | |||
CreateWatch(*Request, ClientState, chan Response) (cancel func()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are breaking compatibility here, we should return (cancel func(), err error)
(here, and in other API that creates a watch) so that failures to create a watch can be reported correctly. See discussion in #689.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor thoughts. IMO we should really linger on what we want for this interface here. I like the idea, this is going to be a big part of the state machine between the client/server.
pkg/cache/v3/cache.go
Outdated
// Though the methods may return mutable parts of the state for performance reasons, | ||
// the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation | ||
type ClientState interface { | ||
// GetKnownResources returns the list of resources the clients has ACKed and their associated version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// GetKnownResources returns the list of resources the clients has ACKed and their associated version. | |
// GetKnownResources returns a list of resources that clients have ACK'd and their associated version. |
pkg/cache/v3/cache.go
Outdated
// The versions are: | ||
// - delta protocol: version of the specific resource set in the response | ||
// - sotw protocol: version of the global response when the resource was last ACKed | ||
GetKnownResources() map[string]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm beginning to think that using map[string]string
everywhere is asking for trouble. How do you feel about defining a data structure that holds some of this info? We could guard it and make it thread safe too. If we're controlling this interface it might be nice to provide some guard rails for consuming users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree. This PR attempted to rework the interface with the smallest possible impact elsewhere, but as we are breaking backward compatibility we might want to go all-in and make it better to fit the model we're attempting to achieve
pkg/cache/v3/cache.go
Outdated
// This allows proper implementation of stateful aspects of the protocol (e.g. returning only some updated resources) | ||
// Though the methods may return mutable parts of the state for performance reasons, | ||
// the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation | ||
type ClientState interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we just called this State
? Or maybe something more abstract? To me it this has always been a data structure that bridges the clients and servers and their state management. My original thought behind StreamState was that it was ambiguous enough to be the middle ground between the client/server but contextual enough to know it's on an individual gRPC stream.
If we want to call it ClientState I'm cool with that, just a thought though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on the naming here. This state represents the state of a given stream for a given object type, so StreamState
(though it might be overloaded) might be another possibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to SubscriptionState
, but I'm absolutely open to rename it
pkg/cache/v3/cache.go
Outdated
// GetSubscribedResources returns the list of resources currently subscribed to by the client for the type. | ||
// For delta it keeps track of subscription updates across requests | ||
// For sotw it is a normalized view of the last request resources | ||
GetSubscribedResources() map[string]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I think I remember why this is needed as a map. Wondering if we that's actually necessary in the cache... I'll take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean here. A main aspect this PR fixes is the previous reuse of KnownResources as SubscribedResources. This creates multiple edge cases and even more issues in the context of the new wildcard model.
I am even thinking that we may want to go to a more advanced model distinguishing NamedResources
and WildcardResources
in the context of the new wildcard model where odcds can subscribe to wildcard
and explicit
resources. We currently don't allow Cache writers to take decisions based on this, either in the server API (apart from intercepting requests) or the cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The split between IsWildcard
and GetSubscribedResources
should now allow to remove the old assumption of "no resources" == "wildcard" which is no longer true. Also a request with IsWildcard
set to true and explicit resources in GetSubscribedResources
carries the meaning of "client requested wildcard and explicit resources", and a cache implementation can act on this
adc9151
to
57978c5
Compare
Rebased the PR on the ads sotw changes. Will address further comments and update the PR |
66dce37
to
db72c18
Compare
db72c18
to
5792964
Compare
…interface is breaking the separation of concern between the server side managing the xds protocol itself and the cache which is only in charge of returning responses based on watches PR envoyproxy#577 has been opened proposing to fully remove any client state from the CreateWatch interface This commit is updating the Cache interface based on a few concerns/aspects: - one of the argument on removing it is that the sotw protocol is stateless. It actually is not, and is explicitly visible in this PR: the lastResponse latched value as well as the version hack when the resource list is changed very well show that it is stateful - in cases outside of lds and cds, it is clearly stated that the control-plane is not expected to send back all resources when a single one change. This is highly critical in the case of ads with eds, as there is a likelihood of both having a lot of resources and a high churn. The implementation sending everything each time is technically tolerated, but is pushing on the data-plane work that could be done at low cost on the control-plane - the current support of both delta and sotw, for both linear and simple cache means that every change has to be done 4 times, with four very different implementations, each one supporting only a subset of use cases (e.g. linear does not support cds/lds as it doesn't handle wildcard in this case, and sotw does not work properly as resource list is improperly maintained). I strongly believed that both implementation should converge. The only remaining differences should be in the messages themselves, abstracted in the server part. Having a common interface for sotw and delta is a first step for this. It would also allow in the future to make snapshot far more efficient by supporting a linear version per snapshot In this context, this commit is changing: - creates a SubscriptionState within the cache package, driven by the data needed from the cache perspective to be able to answer a client request - simplifies streamState - makes it compatible with the new SubscriptionState interface - removes the IsFirst notion, as the protocol states it should come from the nonce in the request - removes the duplicate/parallel knownResourceNames and use the delta one instead. It will also properly carry the version in a future PR Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Rename KnownResources to ACKedResources to better reflect the change Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
5792964
to
cefeb71
Compare
I fully rebased the PR and replied to comments. I'd like this or #718 to move forward to be able to push major fixes on sotw and sotw-ads which are critical for xds-grpc |
5943db4
to
33d2fa9
Compare
33d2fa9
to
fff9714
Compare
…vior Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
fff9714
to
9341533
Compare
It has been raised that the addition of StreamState within the Cache interface is breaking the separation of concern between the server side managing the xds protocol itself and the cache which is only in charge of returning responses based on watches
PR #577 has been opened proposing to fully remove any client state from the CreateWatch interface
This PR is an alternate proposal based on a few concerns/aspects:
In this context, this PR is changing: