Create ListParams union + filter logic for watcher streams #1148

Comments
I'm still digesting and thinking about a lot of this proposal, but some initial thoughts:

I notice that one of the main concerns you discuss here is performance impact on the cluster, but as a service developer one of our big concerns around this problem set is split brain. We could potentially fix that by being cleverer about how we manage the stores in a shared stream manager, possibly a slightly clever implementation on the current …

On the performance side I think I'd err on the side of only handling the obvious and easy superset cases. If devs are worried about performance impacts of the watch streams we should provide good documentation on how to prime the shared store stream to minimise watches, e.g. if they have one watch for …
Yeah. The Go way is often interfacing with labels using …
yeah, that is a worse problem. both I think are worth tackling though, and I think they are orthogonal concerns (which is why i also don't think we need to worry about this much in your PR - provided we don't expose all the internals early).
Sorry, yes I didn't mean to disagree that performance is also worth tackling 😅
What problem are you trying to solve?
We are currently trying to solve watcher stream re-use between multiple embedded controllers for #1080. A promising starting point has popped up in #1147, which will create a map that UNIQUELY identifies the pair of (`ListParams`, `Namespace`) with a stream.

This will work, and it will avoid doing multiple watches when the `ListParams` used by the different controllers are EXACTLY equal (e.g. no additional label selectors or field selectors). That is a helpful start, but it leaves a lot of potentially duplicated IO still on the table.

This issue explores some options here.
Describe the solution you'd like
Create set logic for `ListParams` in `kube-core` so that given any number of `ListParams` objects we can get the superset via something like:

`ListParams::union(&self, rhs: &Self) -> Result<Self>`

This should be possible for the simple label-selector logic as it's only equals / not equals / in set / not in set, all of which can be unioned without needing more info.
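As a rough illustration, label-selector union could work over a simplified in-memory representation of the selector terms. This is a minimal sketch; the `Requirement` type and `union` function below are hypothetical and not part of `ListParams`, which stores selectors as strings and would need parsing first:

```rust
use std::collections::BTreeSet;

// Simplified selector term: equality / in-set and inequality / not-in-set only
// (Exists / DoesNotExist forms are omitted for brevity).
#[derive(Clone, Debug)]
enum Requirement {
    In(String, BTreeSet<String>),    // key=value or key in (v1,v2)
    NotIn(String, BTreeSet<String>), // key!=value or key notin (v1,v2)
}

/// Union two selector lists into a superset selector: keep only constraints
/// that every object matching *either* input is guaranteed to satisfy.
fn union(a: &[Requirement], b: &[Requirement]) -> Vec<Requirement> {
    let mut out = Vec::new();
    for ra in a {
        for rb in b {
            match (ra, rb) {
                // `key in (Va ∪ Vb)` holds for objects matched by either side
                (Requirement::In(ka, va), Requirement::In(kb, vb)) if ka == kb => {
                    out.push(Requirement::In(ka.clone(), va.union(vb).cloned().collect()));
                }
                // only values excluded by *both* sides can stay excluded
                (Requirement::NotIn(ka, va), Requirement::NotIn(kb, vb)) if ka == kb => {
                    let both: BTreeSet<_> = va.intersection(vb).cloned().collect();
                    if !both.is_empty() {
                        out.push(Requirement::NotIn(ka.clone(), both));
                    }
                }
                // constraints present on only one side (or of mixed form) must
                // be dropped, which only widens the superset
                _ => {}
            }
        }
    }
    out
}
```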
For field-selector logic, it is not always possible to do this minimally, because users can, say, have one `ListParams` that filters on the field-selector `metadata.namespace=X` and another on `metadata.namespace=Y`, and we cannot simply take the union of those at the moment without dropping the requirement entirely. See kubernetes/kubernetes#72196. For such cases we would not want to watch the superset.

If we can create a satisfactory superset, then we can also create a watcher that is barely general enough for either subscriber, and delegate to client-side filtering (i.e. allow the individual watch subscribers to filter the stream) before actually using data. We could thus add:
`WatchStreamExt::filter_by(lp: &ListParams) -> impl Stream...`
Given the superset would basically be the union of constraints (or with conflicting constraints dropped), we know that all the subset data should be there when filtering by a specific `ListParams` (the subsets used to make the union).

Such a fn would need to loop over label selectors + field selectors, and produce options based on whether the selectors are satisfied for each object. It should be possible to fit this neatly inside a single fn.
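For illustration, the core of such a filter could be a predicate over an object's labels. This is a sketch rather than a proposed API: only plain `key=value` / `key!=value` / bare-key terms are handled, and set-based terms plus the `metadata.name`/`metadata.namespace` field selectors would follow the same shape:

```rust
use std::collections::BTreeMap;

/// Returns true if `labels` satisfies every comma-separated term in `selector`,
/// e.g. "app=foo,tier!=db,needs-this-key".
fn matches_label_selector(labels: &BTreeMap<String, String>, selector: &str) -> bool {
    selector
        .split(',')
        .map(str::trim)
        .filter(|term| !term.is_empty())
        .all(|term| {
            // check "!=" before "=" since "!=" also contains '='
            if let Some((k, v)) = term.split_once("!=") {
                labels.get(k.trim()).map(String::as_str) != Some(v.trim())
            } else if let Some((k, v)) = term.split_once('=') {
                labels.get(k.trim()).map(String::as_str) == Some(v.trim())
            } else {
                // a bare key just requires the label to exist
                labels.contains_key(term)
            }
        })
}
```

A `filter_by` adapter could then be little more than a stream filter that applies such a check (plus the field-selector equivalents) to each object before yielding it.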
I think these two tasks can be achieved in a helpful manner, but we might not want to hard-integrate them into something like #1147 due to the constraints and caveats.
Selector Failure Cases
Cases where finding a minimal superset fails can look like:
`kubectl get deployments -A --field-selector metadata.namespace=kube-system`
`kubectl get deployments -A --field-selector metadata.namespace=othernamespace`
A union of these would need to effectively do an unconstrained watch across all namespaces to get the minimal superset (a potentially vastly more expensive IO operation than what is needed).
(Note it is technically possible to invert such a field selector if we knew the full list of namespaces and populated an extremely long string of namespaces we do not want. This is obviously impractical for automation though.)
For the namespace-specific case, we need to consider these watches as two disjoint, unsharable streams via `Api::namespaced`, analogous to:

`kubectl get deployments -n kube-system`
`kubectl get deployments -n othernamespace`
but there is ultimately a more general failure case here: two arbitrary field selectors constraining the same field to different values.
`kubectl get deployments --field-selector metadata.name=controller-x`
`kubectl get deployments --field-selector metadata.name=controller-y`
This case also needs two independent `Api::<Deployment>::namespaced` watcher streams against the same namespace, unless we want to extend the superset watcher stream to include all objects. Following the same argument as the namespace case, we should consider this a failure to create a superset as well.

Workarounds
If users wanted to avoid the error cases above, they could produce common labels for `controller-x` and `controller-y` (say), and handcraft a superset `ListParams` that uses just that label selector:

`-l=commonLabel=xy`
The subsets in these cases could be written using specific field-selectors:

`--field-selector metadata.name=controller-x`
`--field-selector metadata.name=controller-y`
and these are clearly not analytically derivable from each other; the administrator would have to know this. However, you could take the superset `ListParams` and apply stream filtering client-side all the same, using the subset params.

The subsets could also be made entirely using label selectors (given administrative access to the cluster) to create a better "inverse relation" between superset and subsets:
`-l='commonLabel=xy,specificLabel=x'`
`-l='commonLabel=xy,specificLabel=y'`
whose specific superset would be exactly `'commonLabel=xy,specificLabel in (x,y)'`.

It is possible that all the awkwardness with field-selectors means we should only do the second part of this story (filtering), or that we should only allow label selectors for unioning.
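For reference, the workaround's superset and subset parameters could be expressed with the existing `ListParams` builder methods; this is just a sketch using the placeholder labels from the example above:

```rust
use kube::api::ListParams;

fn selector_params() -> (ListParams, ListParams, ListParams) {
    // superset watch: everything either controller cares about
    let superset = ListParams::default().labels("commonLabel=xy,specificLabel in (x,y)");
    // per-controller subsets, used purely for client-side filtering of the shared stream
    let subset_x = ListParams::default().labels("commonLabel=xy,specificLabel=x");
    let subset_y = ListParams::default().labels("commonLabel=xy,specificLabel=y");
    (superset, subset_x, subset_y)
}
```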
Potential ways forward with this
1. Do not support this behavior
In many cases, administrators can construct smart label selectors to scope IO costs and work around the above issues, and if they can do this, they can probably create two completely disjoint label selectors using labelling.
In these cases, devs are already creating completely disjoint watcher sets, so are not gaining anything by doing client-side filtering. The cost of doing two long watches against different underlying resources should be roughly the same as one watch against the union.
However, some cases are wildly impractical to carve out via disjoint label sets; controllers that need to act on only a few namespaces in a multi-tenant environment are a big one; we cannot always expect controller devs to have the administrative access needed to label everything.
2. Complete as helpers only
We could do this issue as stated, and leave integration outside the scope of shared stream managers. Thus controller devs would control and create the subsets themselves (derived when they can be, manually when they cannot) rather than having our shared manager do it automatically.

This means users can still create a single watch and re-use it across controllers, but at the cost of more complex user code, and with the additional potential cost of a much more frequently triggering reconciler.
Sometimes this can be worked around (early exits in reconciler), but not always (we do not know if we can early exit in the reconciler if a child object caused the reconciliation - that information is deliberately hidden from the reconciler).
If we implement the union/filtering behaviour, but do not integrate it with a shared stream manager, then it would need quite a bit of documentation so users know how to get it right.
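To make the trade-off concrete, here is a rough sketch of what the "helpers only" flow might look like for a user. It assumes the current `ListParams`-based `watcher` signature, uses the placeholder labels from the workaround example, and stands in a plain label check for the proposed `filter_by` adapter:

```rust
use futures::{pin_mut, StreamExt};
use k8s_openapi::api::apps::v1::Deployment;
use kube::{
    api::ListParams,
    runtime::{watcher, WatchStreamExt},
    Api, Client, ResourceExt,
};

async fn shared_watch_example() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let api: Api<Deployment> = Api::all(client);

    // one shared watch against the superset of what both controllers need
    let superset = ListParams::default().labels("commonLabel=xy,specificLabel in (x,y)");
    let stream = watcher(api, superset).applied_objects();

    // each subscriber narrows the shared stream client-side; a plain label
    // check stands in for the proposed filter_by adapter here
    let only_x = stream.filter(|res| {
        let keep = matches!(res, Ok(d)
            if d.labels().get("specificLabel").map(String::as_str) == Some("x"));
        async move { keep }
    });
    pin_mut!(only_x);

    while let Some(dep) = only_x.next().await {
        // controller-x's reconciler would be triggered from here
        println!("controller-x sees: {:?}", dep?.metadata.name);
    }
    Ok(())
}
```

Note that anything the filter drops has still been sent over the watch connection; the saving is in the number of API watches, not in the events received by the client.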
3. Complete then integrate
If we do this issue AND integrate it with the shared stream manager, then we have the chance of providing a smaller IO footprint for controller devs without too much controller complexity.
However, it comes with the cost of a more complex interface to the shared stream manager. It might also be a solution that is so heavy-weight that it is difficult to integrate. But I think it's worth at least exploring a little before we commit to a shared stream manager.
Target crate for feature