Support pagination for watcher #913

Closed
xdatcloud opened this issue May 19, 2022 · 3 comments · Fixed by #1249
Labels
discussions (possibly more of a discussion piece than an issue) · runtime (controller runtime related)

Comments

xdatcloud commented May 19, 2022

Would you like to work on this feature?

maybe

What problem are you trying to solve?

Hi all, first, thanks for kube-rs! It helps a lot with my work on Kubernetes in Rust.

I noticed the Reflector from client-go uses a Pager to retrieve large result sets in chunks when listing resources: https://github.com/kubernetes/client-go/blob/0bc005e72ff13ab4ceffd5c4e0ecb1774a7bf7f8/tools/cache/reflector.go#L274-L278

Would it be possible to introduce pagination into kube-rs to reduce the impact on the Kubernetes API server when fetching large result sets?

Thanks for any suggestions!

Describe the solution you'd like

The official Kubernetes documentation describes how to retrieve large result sets in chunks.

The watcher from kube-runtime should set the limit list parameter on the first list request and pass the continue token on subsequent list requests. There are also some cases that need to be handled, e.g. HTTP 410 Gone. A sketch of the underlying list mechanics is below.
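For illustration, here is a minimal paginated list loop showing the mechanics the watcher would need. This is a sketch, not the actual watcher integration; it assumes the `limit` and `continue_token` builders on kube's `ListParams` (builder names may vary between kube versions):

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::{api::{Api, ListParams}, Client};

#[tokio::main]
async fn main() -> Result<(), kube::Error> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::default_namespaced(client);

    // Illustrative page size; the server may return fewer items per page.
    let mut lp = ListParams::default().limit(50);
    loop {
        // A real watcher would also need to handle HTTP 410 Gone here,
        // by restarting the list from the beginning.
        let page = pods.list(&lp).await?;
        for pod in page.items {
            println!("{}", pod.metadata.name.unwrap_or_default());
        }
        match page.metadata.continue_ {
            // A non-empty continue token means more pages remain.
            Some(token) if !token.is_empty() => lp = lp.continue_token(&token),
            _ => break, // empty or absent token: this was the last page
        }
    }
    Ok(())
}
```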

Describe alternatives you've considered

Follow the best practices in the Kubernetes guides.

Documentation, Adoption, Migration Strategy

No response

Target crate for feature

kube-runtime

nightkr (Member) commented May 19, 2022

Hi!

While technically this shouldn't be too hard to implement in the watcher, it would break some assumptions that we make downstream. We basically have two options for implementing this:

  1. Hide this inside the watcher and buffer up the Event::Restarted event
    • I'm not sure this helps a huge amount; all it really does is move the buffering from the API server to the client
    • However, it might help us stream the JSON parsing/deserialization, and load on the client might be preferable to load on the server
  2. Emit separate events for each chunk (sketched after this comment)
    • This would allow streaming output to the user (for clients that care about this), but it is a massively breaking change for existing users and introduces a lot of downstream complexity
    • Depending on how we implement this, reflector would either have to add its own internal buffering, or stop pruning deleted objects on resyncs

We might also be able to do a hybrid approach, where we introduce a new chunked_watcher, and reimplement watcher as a buffering layer on top of that...
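To make option 2 concrete, here is a hypothetical shape for a page-level event type. The variant names below are invented for illustration and are not kube-runtime API:

```rust
// Hypothetical sketch of option 2: during a re-list the watcher emits one
// event per page instead of a single buffered Restarted event.
enum Event<K> {
    Applied(K),
    Deleted(K),
    RestartPage(Vec<K>), // one chunk of the paginated re-list
    RestartDone,         // signals that the last page has been received
}
```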

@nightkr added the runtime (controller runtime related) and discussions (possibly more of a discussion piece than an issue) labels May 19, 2022
clux (Member) commented May 23, 2022

Based on the alternatives, I'm wondering if emitting an Event::RestartChunk might make sense here. The Event API is not super useful outside kube_runtime, except for niche use cases like writing custom reflectors / timed stores.

I feel that emitting pages could allow faster consumption of items from the apiserver as they appear (at least from the watcher's POV). If backpressure works properly for us (which it might, but I'm not sure), it could also limit how quickly the list pagination happens, lessening the load on both us and the apiserver in cases where we exit early.

It is possible that we could make reflector stores do smart internal buffering (sketched after this list), such as:

  1. Event::RestartChunk(Vec<K>) appears from watcher
  2. Store immediately snapshots its current ObjectRefs
  3. Store upserts the elements from the chunk and clears the corresponding entries from the snapshotted ObjectRefs
  4. goto 1 until completed (not sure how we know whether the chunk is the last chunk?)
  5. At the end of the restart, delete the still-uncleared ObjectRefs from the store

This works provided we are able to learn when we have received the last page. It shouldn't use a whole lot more memory, since we can just overwrite in place, keep a small set of keys we have seen, and diff that at the end against what we didn't see (which must have been removed).
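A rough sketch of that mark-and-sweep buffering, with a plain HashMap keyed by String standing in for the real ObjectRef-keyed store (all names here are illustrative, not reflector internals):

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stand-in for a reflector store; real stores key by ObjectRef.
struct Store<K> {
    objects: HashMap<String, K>,
    // Keys snapshotted at the start of a restart (step 2); anything still
    // here when the restart finishes was not re-listed and must be pruned.
    unseen: HashSet<String>,
}

impl<K> Store<K> {
    // Step 2: snapshot the current keys when the restart begins.
    fn begin_restart(&mut self) {
        self.unseen = self.objects.keys().cloned().collect();
    }

    // Step 3: upsert a chunk in place and mark its keys as seen.
    fn apply_chunk(&mut self, chunk: Vec<(String, K)>) {
        for (key, obj) in chunk {
            self.unseen.remove(&key);
            self.objects.insert(key, obj);
        }
    }

    // Step 5: after the last chunk, delete whatever was never re-listed.
    fn finish_restart(&mut self) {
        for key in std::mem::take(&mut self.unseen) {
            self.objects.remove(&key);
        }
    }
}
```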

goenning (Contributor) commented Aug 9, 2022

I'm also interested in this feature, so I could try submitting a PR if we have a consensus on the design.

But I'd start with the watcher only, not the reflector, since I don't have any experience with the latter; that also keeps the scope of the PR much smaller.

How does that sound?

Edit: the chunked response from the API server contains a continue value that is empty when it's the last page.
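In kube terms, that termination check might look like the following. ObjectList and ListMeta's continue_ field are real kube-core types, but the helper itself is hypothetical:

```rust
use kube::core::ObjectList;

/// True when a list response is the final page, i.e. the `continue`
/// token is absent or empty, as noted above.
fn is_last_page<K: Clone>(page: &ObjectList<K>) -> bool {
    page.metadata
        .continue_
        .as_deref()
        .map_or(true, str::is_empty)
}
```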

@clux clux linked a pull request Apr 29, 2023 that will close this issue
@clux clux linked a pull request Jul 13, 2023 that will close this issue
@github-project-automation github-project-automation bot moved this to Defining in Kube Roadmap Sep 13, 2023
@clux clux moved this from Defining to Done in Kube Roadmap Sep 13, 2023