Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backoff handling for watcher and Controller #703

Merged
merged 32 commits into from
Dec 21, 2021
Merged

Conversation

clux
Copy link
Member

@clux clux commented Nov 11, 2021

introduces a dependency on backoff 0.4.0 for parts of this.

old notes, no longer relevant not hiding errors

initial experiment with watcher + backoff. injecting a backoff into the main `stream::unfold` of `watcher` and sleeping between steps. It works with `configmapgen_controller` (in say the example where the crd has not been installed); it keeps sleeping until a max time hits (figured we'd set this to something like 10-30minutes) after which the stream ends.

Problems:

  • watcher bubbles up the error as it stands - so normal watcher users stop at the first error anyway
  • Controller ignores error events from watcher so at some point the Controller becomes dysfunctional
  • how to expose the parameter - imo, hide it behind a builder with sensible defaults - e.g. Cache: implement a builder wrapper for reflector #698 .

Solutions:
We could let the timeout grow almost indefinitely (to like an hour or more backoff) so that Controller can always deal with it.
We could also expose a new watcher error type that wraps the original - so that the user can ignore it easily, but not sure, what that interface would look like yet. Maybe this is something a builder can ignore by default in the way it presents the stream - i.e. ignore until the error has been going on for so lonng.

Also, should try to get the magic number baked into the backoff struct (it supports returning None). It didn't work with max_interval, but looks like it should work with https://docs.rs/backoff/0.3.0/backoff/exponential/struct.ExponentialBackoff.html
max_elapsed_time
- yup works.

Anyway, will try more later.

@clux

This comment has been minimized.

@clux clux requested a review from nightkr November 11, 2021 21:22
@clux clux linked an issue Nov 11, 2021 that may be closed by this pull request
@nightkr
Copy link
Member

nightkr commented Nov 12, 2021

Sorry about the slow reviewing, I've been (and still am) home sick. I'll try to have a closer look when I'm back alive...

@clux
Copy link
Member Author

clux commented Nov 12, 2021

Sorry about the slow reviewing, I've been (and still am) home sick. I'll try to have a closer look when I'm back alive...

No rush. Hope you get some rest and get to recover!

examples/configmapgen_controller.rs Outdated Show resolved Hide resolved
kube-runtime/src/observer.rs Outdated Show resolved Hide resolved
kube-runtime/src/observer.rs Outdated Show resolved Hide resolved
kube-runtime/src/observer.rs Outdated Show resolved Hide resolved
kube-runtime/src/observer.rs Outdated Show resolved Hide resolved
kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
kube-runtime/src/observer.rs Outdated Show resolved Hide resolved
kube-runtime/src/watcher.rs Outdated Show resolved Hide resolved
clux added 7 commits November 16, 2021 19:47
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
awkward. will write a comment

Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
clux added 2 commits November 19, 2021 00:16
jesus this stuff is hard.

Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
nightkr and others added 4 commits November 19, 2021 18:34
* Fix clippy warnings

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>

* Reimplement watch backoff as FSM

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>

* Remove useless lifetime bounds

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>

* Silence clippy size warning

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>

* Silence clippy properly this time around

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>

* Split StreamBackoff into a separate utils module

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>

* Backoff tests

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>

* Add stream close test

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
@clux clux marked this pull request as ready for review November 19, 2021 23:53
@nightkr nightkr mentioned this pull request Dec 21, 2021
nightkr and others added 10 commits December 21, 2021 11:33
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
And we must all pay our due respects, or pay the price.

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
…eanup

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
@clux clux added the changelog-add changelog added category for prs label Dec 21, 2021
@clux clux added this to the 0.66.0 milestone Dec 21, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #703 (d02b1ec) into master (155859b) will increase coverage by 0.31%.
The diff coverage is 77.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #703      +/-   ##
==========================================
+ Coverage   70.83%   71.15%   +0.31%     
==========================================
  Files          52       54       +2     
  Lines        3405     3529     +124     
==========================================
+ Hits         2412     2511      +99     
- Misses        993     1018      +25     
Impacted Files Coverage Δ
kube-client/src/discovery/mod.rs 89.18% <ø> (ø)
kube-core/src/dynamic.rs 66.66% <ø> (ø)
kube-core/src/object.rs 80.00% <ø> (ø)
kube-core/src/params.rs 66.35% <ø> (ø)
kube-core/src/subresource.rs 75.72% <ø> (ø)
kube-runtime/src/controller/mod.rs 0.00% <0.00%> (ø)
kube-runtime/src/utils/mod.rs 17.64% <ø> (ø)
kube-runtime/src/watcher.rs 54.92% <0.00%> (-4.17%) ⬇️
kube/src/lib.rs 87.80% <ø> (ø)
kube-client/src/lib.rs 92.80% <25.00%> (-0.06%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 155859b...d02b1ec. Read the comment docs.

Comment on lines +33 to +41
impl<S: TryStream, B: Backoff> StreamBackoff<S, B> {
pub fn new(stream: S, backoff: B) -> Self {
Self {
stream,
backoff,
state: State::Awake,
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought on StreamBackoff: because this is effectively a stream analogue of https://docs.rs/backoff/0.4.0/backoff/fn.retry.html , maybe we should allow bailing at bad errors in a simiar way to how backoff.retry defines its error type: https://docs.rs/backoff/0.4.0/backoff/enum.Error.html ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe. I'd lean towards KISS for now, and letting @ihrwein decide the future plan in the upstreaming PR (ihrwein/backoff#50).

@clux clux changed the title implement backoff for watcher - for #577 Add backoff handling for watcher and Controller - for #577 Dec 21, 2021
Comment on lines 277 to 287
/// Pauses the stream for a while (as defined by `backoff`) after each [`Err`].
pub fn backoff_watch<K, B>(
stream: impl Stream<Item = Result<Event<K>>> + Send,
backoff: B,
) -> impl Stream<Item = Result<Event<K>>> + Send
where
K: Resource + Send,
B: Backoff + Send,
{
StreamBackoff::new(stream, backoff)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears unused now after the Observer cancel. I can remove this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This was an attempt to be somewhat more ergonomic and point people who were already working in the watcher module towards it (rather than focusing on the implementation details of StreamBackoff itself). We can either drop this, or revert StreamBackoff::new to pub(crate) and fixing the examples to use backoff_watch instead.

I'm fine with either.

Copy link
Member Author

@clux clux Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it for now. Want to focus on the future chain pipeline before teaching people too much new stuff.

clux added 4 commits December 21, 2021 22:03
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
Signed-off-by: clux <sszynrae@gmail.com>
CHANGELOG.md Show resolved Hide resolved
@clux clux merged commit 276719c into master Dec 21, 2021
@clux clux deleted the backoff-watcher branch December 21, 2021 16:16
@clux clux added changelog-change changelog change category for prs and removed changelog-add changelog added category for prs labels Dec 22, 2021
@clux clux changed the title Add backoff handling for watcher and Controller - for #577 Add backoff handling for watcher and Controller Jan 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-change changelog change category for prs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add backoff to watcher reconnection
3 participants