
Watcher resync causes memory usage burst #1209

Closed
nightkr opened this issue Apr 28, 2023 · 8 comments · Fixed by #1249
Labels
bug (Something isn't working), runtime (controller runtime related)

Comments

nightkr (Member) commented Apr 28, 2023

Would you like to work on this feature?

None

What problem are you trying to solve?

During a resync, all of the data must be held twice (in both JSON and k8s-openapi struct form), which for larger clusters causes the controller infrastructure to consume roughly twice as much memory as normal during the initial sync, and roughly 3x during later syncs, since the old state is also still kept around.

This is both a problem in itself (it can cause the controller to hit memory quotas during these bursts) and a problem for long-term memory usage (many memory allocators, including glibc's and jemalloc, rarely return memory to the OS once it has been allocated to the process).

Describe the solution you'd like

We can use paginated lists to minimize the overlap window. However, this must be controlled by the user, since it makes network traffic chattier, which may cause degradations on high-latency networks.
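
As a rough illustration of the pagination idea (a sketch only, not proposed API: the paged_list helper and the Pod type are placeholders), listing in pages via kube's ListParams limit/continue_token means only one page of JSON has to be held and parsed at a time, at the cost of extra round-trips to the apiserver:

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Api, ListParams};

// Sketch of client-side pagination: fetch the list in fixed-size pages so only
// one page of JSON is buffered/parsed at a time instead of the whole list.
async fn paged_list(api: &Api<Pod>) -> kube::Result<Vec<Pod>> {
    let mut items = Vec::new();
    let mut continue_token: Option<String> = None;
    loop {
        let mut lp = ListParams::default().limit(500); // client-go's default page size
        if let Some(token) = &continue_token {
            lp = lp.continue_token(token);
        }
        let page = api.list(&lp).await?;
        continue_token = page.metadata.continue_.clone();
        items.extend(page.items);
        // An absent/empty continue token means this was the final page.
        if continue_token.as_deref().unwrap_or("").is_empty() {
            break;
        }
    }
    Ok(items)
}
```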

Describe alternatives you've considered

We can also use the new ?watch=true&sendInitialEvents=true API, which implements our list-then-watch pattern on the server side. I'd imagine that this would be ideal for us in most regards. However, this is still an alpha-level API, and would cause silent failures in older clusters.
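
For reference, a small sketch of the request shape this API uses, per the upstream streaming-lists documentation (this is not a kube call, just the query parameters involved):

```rust
// Illustrative only: the query parameters a server-side "streaming list" watch
// expects. Older clusters ignore the unknown sendInitialEvents parameter and
// serve a plain watch instead, which is the silent failure mentioned above.
fn streaming_list_path(namespace: &str) -> String {
    format!(
        "/api/v1/namespaces/{namespace}/pods\
         ?watch=true\
         &sendInitialEvents=true\
         &resourceVersionMatch=NotOlderThan\
         &allowWatchBookmarks=true"
    )
}
```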

Documentation, Adoption, Migration Strategy

We can implement chunking transparently in watcher, which would let us at least cut down the window of JSON that we need to parse at any given time. An "even better" solution here would be to implement a new watcher API that supports streaming resync. We should then be able to reimplement the current watcher API as a facade on top of that without major regressions compared to the current implementation. However, we would still need to consider whether that new streaming-resync watcher would be meaningfully helpful for Controller/reflector.

Target crate for feature

kube-runtime

nightkr added the bug (Something isn't working) and runtime (controller runtime related) labels Apr 28, 2023
clux (Member) commented Apr 28, 2023

Yeah, I imagine we would want to advocate for a sendInitialEvents-based solution long term, but it was only just released in 1.27. We could implement it before then, but we'd have to version/feature-flag it to avoid clashing with MSRV. Still, it's something that would be good to see.

The alternative of paginating internally in the watcher is also sensible; we could make a fixed number like 50 the default page size (and expose it in watcher::Config for people who need to tweak it), but this also breaks the atomicity of Restarted events. Then again, so does sendInitialEvents, so maybe the Restarted concept needs rethinking anyway?

If we just do sendInitialEvents, then that should allow us to kill flattening as a concept, since it's all single objects being passed around, afaict. Maybe it's worth starting to feature-flag a sendInitialEvents-based watcher now anyway, since this is a dev-level opt-in that may require changes to the watcher::Event/reflector interface.

NB: sendInitialEvents is on the main page now: https://kubernetes.io/docs/reference/using-api/api-concepts/#streaming-lists

nightkr (Member, Author) commented Apr 28, 2023

Even in 1.27 SIE is just an alpha-level feature, which IIRC means that the feature gate is disabled by default.

> The alternative of paginating internally in the watcher is also sensible; we could make a fixed number like 50 the default page size (and expose it in watcher::Config for people who need to tweak it), but this also breaks the atomicity of Restarted events.

It depends. We could change the interface to something like:

enum StreamingWatcherEvent<K> {
    /// A single object was added or updated.
    Added(K),
    /// A single object was removed.
    Removed(K),
    /// A restart (relist) has begun; pages will follow.
    RestartBegin,
    /// One page of the relist.
    RestartPage(Vec<K>),
    /// The relist has finished.
    RestartDone,
}

That would still let us keep a facade that buffers it down to the existing watcher::Event type. I don't think it's worth forcing this API change on all users since it's a decent chunk of extra complexity. It should also be able to cover both SIE and paginated lists.
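
A rough sketch of that facade (all types are declared locally here for illustration; Event mirrors the Applied/Deleted/Restarted shape watcher exposed at the time):

```rust
// Re-declared from the comment above so this sketch is self-contained.
enum StreamingWatcherEvent<K> {
    Added(K),
    Removed(K),
    RestartBegin,
    RestartPage(Vec<K>),
    RestartDone,
}

// Mirrors the existing watcher::Event shape at the time of writing.
enum Event<K> {
    Applied(K),
    Deleted(K),
    Restarted(Vec<K>),
}

/// Buffers streaming restart pages back into a single Restarted event,
/// preserving the existing atomic-relist interface for current consumers.
struct RestartBuffer<K> {
    pages: Vec<K>,
}

impl<K> RestartBuffer<K> {
    fn new() -> Self {
        Self { pages: Vec::new() }
    }

    /// Feed one streaming event in; pages are held back until RestartDone,
    /// at which point a single buffered Restarted event is emitted.
    fn fold(&mut self, ev: StreamingWatcherEvent<K>) -> Option<Event<K>> {
        match ev {
            StreamingWatcherEvent::Added(obj) => Some(Event::Applied(obj)),
            StreamingWatcherEvent::Removed(obj) => Some(Event::Deleted(obj)),
            StreamingWatcherEvent::RestartBegin => {
                self.pages.clear();
                None
            }
            StreamingWatcherEvent::RestartPage(page) => {
                self.pages.extend(page);
                None
            }
            StreamingWatcherEvent::RestartDone => {
                Some(Event::Restarted(std::mem::take(&mut self.pages)))
            }
        }
    }
}
```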

nightkr added a commit to nightkr/kube-rs that referenced this issue Apr 28, 2023
See kube-rs#1209, relatively untested prototype and doesn't include any proposed API changes
clux linked a pull request Apr 29, 2023 that will close this issue
clux linked a pull request Jul 13, 2023 that will close this issue
clux added a commit that referenced this issue Jul 13, 2023
* Paginate initial watch list - see #1209; relatively untested prototype and doesn't include any proposed API changes
* Remove managed_fields stripper - users can do this anyway using for_stream
* Make page size configurable
* initial clippy fixes
* more unrelated clippy fixes
* clippy fixes related to this pr
* align default pagination size with client-go (500)
* caveat the page example + rename to page_size - name matches what it's called upstream + it's shorter
* add a new mock test system for the runtime
* remove unnecessary boxing + slight rename

Signed-off-by: clux <sszynrae@gmail.com>
Co-authored-by: Natalie Klestrup Röijezon <nat@nullable.se>
clux reopened this Jul 14, 2023
clux (Member) commented Jul 14, 2023

The initial watcher pagination implementation is included in 0.84, and it should lessen the impact on upstream apiservers (and allow watchers on more extreme cluster layouts). That said, the increased memory use from having to cache a full list while receiving data (which is what this issue is about) will still happen, to a lesser extent.
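
A minimal usage sketch, assuming kube 0.84+ on the tokio runtime and that the watcher::Config builder method ended up named page_size (per the rename in the merged PR); exact names and defaults may differ:

```rust
use futures::TryStreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{runtime::watcher, Api, Client};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::default_namespaced(client);
    // Page size is assumed to default to 500 (matching client-go); lowering it
    // trades a chattier initial LIST for a smaller peak of buffered JSON.
    let cfg = watcher::Config::default().page_size(50);
    watcher(pods, cfg)
        .try_for_each(|event| async move {
            println!("{event:?}");
            Ok(())
        })
        .await?;
    Ok(())
}
```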

It could still make sense to add a Page((Vec<K>, continue: bool)) (or Page(Vec<K>) + ListComplete) event to watcher::Event to at least allow streaming watchers of large resource lists without having to buffer all the pages.
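
Hypothetically (this is not the actual kube-runtime API, just the shape being discussed), that could look like:

```rust
// Hypothetical only: what the Page / ListComplete variants mentioned above
// might look like if they replaced the buffered Restarted(Vec<K>) variant.
enum Event<K> {
    Applied(K),
    Deleted(K),
    /// One page of an in-progress (re)list, forwarded as soon as it arrives.
    Page(Vec<K>),
    /// The (re)list finished; stores may now prune anything they did not re-see.
    ListComplete,
}
```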

But there are some downsides downstream to doing this (on top of the API change/complexity Nat mentioned):

  • the reflector store will still have to do the buffering to avoid Store blind spots (times where we only have parts of the data) - meaning that for atomic-style stores (like ours) you always have to do the buffering somewhere
  • doing slow work while handling pages could also lead to pages getting invalidated (causing us to have to re-list) - this is probably not a big deal in most cases, though

regardless, avoiding the internal buffering in the future would benefit things not using reflectors.

So, in a future world (say 1-2 years from now, where sendInitialEvents is available everywhere) there is potentially also no need for a list call. Having a ListComplete equivalent could be a good way to signal the SIE bookmark that indicates the end of the initial events, so that reflector stores can do their buffering, and maybe that could be a stepping-stone?

clux (Member) commented Sep 8, 2023

As of 0.86 we have support for both pagination and sendInitialEvents (when the feature is available and opted in to), so closing this.

(There might be smarter improvements around returning data to the user faster rather than buffering in watcher until a full list has completed, but such an upgrade would need to consider reflector usage, which needs to wait for its atomic store property - so leaving something like that to a potential future proposal.)

clux closed this as completed Sep 8, 2023
nightkr (Member, Author) commented Sep 9, 2023

> the reflector store will still have to do the buffering to avoid Store blind spots (times where we only have parts of the data) - meaning that for atomic-style stores (like ours) you always have to do the buffering somewhere

I don't think we strictly need this; we just need, at the end, a list of which objects to keep/delete. We never guarantee that the reflector is atomic.

clux (Member) commented Sep 9, 2023

are you suggesting that we could forward events from the initial-lists, and have the store keep track of elements that it got during the initial-list, so that when the initial-list is complete, it could delete the complement itself?

we do currently say that Restarted is a signal to do an atomic replace, and that watcher is an atomic interface for a state store

nightkr (Member, Author) commented Sep 9, 2023

> are you suggesting that we could forward events from the initial-lists, and have the store keep track of elements that it got during the initial-list, so that when the initial-list is complete, it could delete the complement itself?

Yes.

> we do currently say that Restarted is a signal to do an atomic replace

I think this is a point-of-view thing. watcher currently guarantees it, but reflector doesn't depend strictly on that guarantee.
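
A rough sketch of that approach (names are illustrative): the store applies relist pages as they arrive, tracks which keys it has re-seen, and prunes the complement once the relist completes, so only a set of keys is buffered rather than a second full copy of every object.

```rust
use std::collections::{HashMap, HashSet};

struct Store<K> {
    /// Live objects, keyed by e.g. "namespace/name".
    objects: HashMap<String, K>,
    /// Keys re-seen during an in-progress relist.
    seen_in_relist: HashSet<String>,
}

impl<K> Store<K> {
    fn new() -> Self {
        Self { objects: HashMap::new(), seen_in_relist: HashSet::new() }
    }

    /// A relist has started: forget what was seen in any previous relist.
    fn relist_begin(&mut self) {
        self.seen_in_relist.clear();
    }

    /// Apply one relist page immediately; updates become visible right away.
    fn relist_page(&mut self, page: Vec<(String, K)>) {
        for (key, obj) in page {
            self.seen_in_relist.insert(key.clone());
            self.objects.insert(key, obj);
        }
    }

    /// The relist finished: prune everything that was not re-seen.
    fn relist_complete(&mut self) {
        let seen = &self.seen_in_relist;
        self.objects.retain(|key, _| seen.contains(key));
        self.seen_in_relist.clear();
    }
}
```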

clux (Member) commented Sep 9, 2023

yeah, that makes sense. less double buffering, and better memory profile. 👍

probably needs some care for the store readiness check, and some care to work for both paged and streaming lists, but worth raising a help-wanted issue for (at the very least).
