Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce support for persistent metadata watches #1145

Merged
merged 17 commits into from
Feb 23, 2023

Conversation

mateiidavid
Copy link
Contributor

Motivation

The watch (and watch_metadata respectively) functions on the Api type are fallible, and watches are not recovered. Errors may happen for any reason, such as network induced errors, restarts (etcd can only cache so many resource versions), and so on. To get around these failures, we have a watcher() utility in the runtime crate that manages the underlying stream in a persistent way, recovering on failure.

We should support the same type of behaviour, but for metadata-only responses. This change adds support for setting up persistent watches for metadata only, by using a similar utility (watch_metadata, because clippy complained about metadata_watcher).

Solution

To implement this, we simply need a new function that creates a metadata specific stream. We also need to ensure that call list_metadata and watch_metadata in step_trampolined instead of list and watch when driving the watcher.

This ended up being a bit of a longer solution than I had anticipated at first. As far as the solution goes, here is what I kept in mind and what I strove to optimise for:

  • No breaking changes to the API. Ideally, this PR shouldn't introduce a change that forces adopters to change their code, so primarily no changes to the signature of watcher
  • Ensure the solution keeps the code maintainable: probably the hardest part. This whole solution could've been done through a lot of duplication but that's going to be super unhelpful in keeping the code tidy and maintainable in the future, ideally when a change is made, it shouldn't be reflected across 2 similar functions.
  • Ensure the solution is clean and idiomatic

On Discord, we had a back and forth where we thought the best thing here would be to specialize through a trait. If we were to use indirection in step_trampoline, we could call list and watch on the trait, which would allow us to specialize for Api<PartialObjectMeta>. This wasn't a huge success since we can't implement the trait for Api twice (once for Api<K> and once for Api<PartialObjectMeta>).

The next best thing that would allow us to keep everything the same was to try and specialize through closures. The easiest way has been to split step_trampoline up into smaller pieces. We duplicate step. Inside each step function we use a factory to create two closures that will represent parts of the watcher state machine API: one to list, and one to watch.

This allows us to keep most of the logic generic and special-case only a small part of the code. We introduce a new AsyncFn trait that allows the factories to have a nested impl return type impl Fn(..) -> impl Future<..>.

Original commit message

Introduce support for persistent metadata watches

The `watch` (and `watch_metadata` respectively) functions on the Api
type are fallible, and watches are not recovered. Errors may happen for
any reason, such as network induced errors, restarts (etcd can only
cache so many resourve versions), and so on. To get around these
failures, we have a `watcher()` utility in the runtime crate that
manages the underlying stream in a persistent way, recovering on
failure.

This change introduces support for persistent metadata watches, through
a `metadata_watcher` function in the same crate. Watches may be
established on any type of resources, the main difference is that the
returned types no longer correspond to the type of the Api. Instead,
a concrete metadata type is returned.

To support this with no breaking changes and to allow for more maintable
code, a few utility functions and traits are introduced in the `runtime`
crate.

The `watch` (and `watch_metadata` respectively) functions on the Api
type are fallible, and watches are not recovered. Errors may happen for
any reason, such as network induced errors, restarts (etcd can only
cache so many resourve versions), and so on. To get around these
failures, we have a `watcher()` utility in the runtime crate that
manages the underlying stream in a persistent way, recovering on
failure.

This change introduces support for persistent metadata watches, through
a `metadata_watcher` function in the same crate. Watches may be
established on any type of resources, the main difference is that the
returned types no longer correspond to the type of the Api. Instead,
a concrete metadata type is returned.

To support this with no breaking changes and to allow for more maintable
code, a few utility functions and traits are introduced in the `runtime`
crate.

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2023

Codecov Report

Merging #1145 (7b18318) into main (53ad9ee) will decrease coverage by 0.24%.
The diff coverage is 34.78%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1145      +/-   ##
==========================================
- Coverage   72.79%   72.56%   -0.24%     
==========================================
  Files          66       66              
  Lines        5066     5085      +19     
==========================================
+ Hits         3688     3690       +2     
- Misses       1378     1395      +17     
Impacted Files Coverage Δ
kube-runtime/src/controller/mod.rs 36.95% <0.00%> (-1.17%) ⬇️
kube-runtime/src/watcher.rs 41.50% <50.00%> (-2.11%) ⬇️
kube-runtime/src/wait.rs 75.47% <0.00%> (-1.89%) ⬇️

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
@clux clux added the changelog-add changelog added category for prs label Feb 20, 2023
Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this as a non-breaking solution. You've kept the conventions the same, and also actually broken down large pieces of logic in the process. It is a bit more complex at bits, but the trait def and impl at the top helps alleviate that quite a bit. 👍

Left some comments on tests, and nits on msrv, names, but this is great!

kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
examples/dynamic_watcher.rs Outdated Show resolved Hide resolved
examples/dynamic_watcher.rs Outdated Show resolved Hide resolved
@clux clux added this to the 0.79.0 milestone Feb 21, 2023
@nightkr
Copy link
Member

nightkr commented Feb 21, 2023

Feature-wise this definitely makes sense to do. I'm not quite so sold on the StepFn implementation, but that might just be that I need some time to sit down with it and understand it...

@nightkr
Copy link
Member

nightkr commented Feb 21, 2023

It looks like we can get away with relatively minimal changes by introducing a wrapper around Api: https://gist.github.com/nightkr/eca3455160be68eaf616750eb6158519 (patch against the current main, 53ad9ee). The downside is that we introduce an extra box for each Api call, which isn't ideal (but also relatively cheap compared to everything else an Api call implies).

@mateiidavid
Copy link
Contributor Author

@nightkr suggested changes look really simple and clean, I prefer it over my approach. I think we discussed using async_trait on Discord. I was kind of hesitant to introduce any new dependencies to the runtime crate and I tried to do something similar to your suggestion by manually writing out the types (never got anywhere).

Will start changing things around.

@mateiidavid mateiidavid requested review from clux and nightkr and removed request for clux February 21, 2023 20:09
@mateiidavid
Copy link
Contributor Author

mateiidavid commented Feb 21, 2023

Okay, made the changes, I think it's looking good. Left some questions on that gist -- did not want to clutter up the PR -- was curious about how we handle the stream since I haven't worked with async_trait (I feel like I've said this phrase quite often in this PR :D ).

Anyway, ready for a re-review whenever either or both of you get a chance. Thanks a lot for your time!

@nightkr
Copy link
Member

nightkr commented Feb 21, 2023

Looks like some examples still need to be updated.

was curious about how we handle the stream since I haven't worked with async_trait (I feel like I've said this phrase quite often in this PR :D ).

async_trait is basically just syntax sugar for boxing the futures behind the scenes. Ideally it should become a no-op once rustc itself supports async fns in traits.

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Matei David <matei@buoyant.io>
Copy link
Member

@nightkr nightkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super happy that we've gotten this to the point here where the use case is now a drop-in replacement. Great work!

@clux clux merged commit b822ee6 into kube-rs:main Feb 23, 2023
jmintb pushed a commit to jmintb/kube-rs that referenced this pull request Mar 9, 2023
* Introduce support for persistent metadata watches

The `watch` (and `watch_metadata` respectively) functions on the Api
type are fallible, and watches are not recovered. Errors may happen for
any reason, such as network induced errors, restarts (etcd can only
cache so many resourve versions), and so on. To get around these
failures, we have a `watcher()` utility in the runtime crate that
manages the underlying stream in a persistent way, recovering on
failure.

This change introduces support for persistent metadata watches, through
a `metadata_watcher` function in the same crate. Watches may be
established on any type of resources, the main difference is that the
returned types no longer correspond to the type of the Api. Instead,
a concrete metadata type is returned.

To support this with no breaking changes and to allow for more maintable
code, a few utility functions and traits are introduced in the `runtime`
crate.

Signed-off-by: Matei David <matei@buoyant.io>

* Run clippy

Signed-off-by: Matei David <matei@buoyant.io>

* Make closure arg generic

Signed-off-by: Matei David <matei@buoyant.io>

* Fix doc test

Signed-off-by: Matei David <matei@buoyant.io>

* Bump MSRV to 1.63.0

Signed-off-by: Matei David <matei@buoyant.io>

* Rename AsyncFn to StepFn

Signed-off-by: Matei David <matei@buoyant.io>

* Add a compile-time typecheck and a meta example to dynamic watcher

Signed-off-by: Matei David <matei@buoyant.io>

* Rename watch_metadata to metadata_watcher and allow module rep

Signed-off-by: Matei David <matei@buoyant.io>

* Add trait to specialize Api calls instead of relying on closures

Signed-off-by: Matei David <matei@buoyant.io>

* Change meta watcher fn name in example

Signed-off-by: Matei David <matei@buoyant.io>

* Parse evar as 1

Signed-off-by: Matei David <matei@buoyant.io>

* Refactor dynamic_watcher example

Signed-off-by: Matei David <matei@buoyant.io>

---------

Signed-off-by: Matei David <matei@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-add changelog added category for prs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants