-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ingester] Optionally delay circuit breaker activation until after traffic starts #9842
Conversation
89cb8b2
to
61f102b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating these docs! I massaged the wording a little bit.
# (experimental) How long after an initial request an activated circuit | ||
# breaker should wait before becoming effectively active. During this time | ||
# neither failures nor successes will not be counted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# (experimental) How long after an initial request an activated circuit | |
# breaker should wait before becoming effectively active. During this time | |
# neither failures nor successes will not be counted. | |
# (experimental) Duration, in seconds, after an initial request an activated circuit | |
# breaker should wait before becoming effectively active. During this time, | |
# neither failures nor successes are counted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in 24a3b0d
# failures and successes will not be counted. | ||
# (experimental) How long after an initial request an activated circuit | ||
# breaker should wait before becoming effectively active. During this time | ||
# neither failures nor successes will not be counted. | ||
# CLI flag: -ingester.read-circuit-breaker.initial-delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion as above here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in 24a3b0d
fa83158
to
6a54450
Compare
…eueing an activation
92bff77
to
141c3b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of nits
|
||
// cancel any pending activation | ||
if cb.activationTimer != nil { | ||
cb.activationTimer.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess here we rely on the fact that within time.AfterFunc()
we actually set cb.activationTimer
to nil
.
Are we sure it can never happen that we reach this line while time.AfterFunc()
is executing?
I personally prefer usage of time.NewTimer()
because then time.Stop()
's result (true
or false
) clearly explains what happened. Here false
means that the timer expired, and the function passed to time.AfterFun()
was started in a separate go routine, but doesn't say anything about whether the function was completed or not.
Anyway, if we opt for this approach, since it is crucial to execute cb.activationTimer = nil
, I would put a comment on both lines that execute it (251 and 284) to highlight that those lines must not be deleted or moved from those places, because that would break the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are here we are holding stateMtx
, which is also taken at the start of the execution of the AfterFunc()
, so if we are here while that function is executing, it is waiting on the mutex. once it gets the mutex it will see that the state is not circuitBreakerActivating
and exit.
i believe these are the possible situations:
- the timer hasn't yet expired, and we successfully
Stop()
it here -- the return value would betrue
- the timer expired and the launched goroutine completed some time before
deactivate()
was called -- in this case the timer will benil
(because the timer function cleared it) and we won't callStop()
at all - the timer expired and the launched goroutine is currently executing when
deactivate()
is called -- in this casedeactivate()
will block because the timer goroutine is holdingstateMtx
, and then we'll be in the same state as (2) - the timer expired and launched the goroutine while
deactivate()
was executing -- in this case we would callStop()
and it would returnfalse
; the launched goroutine would be waiting on thestateMtx
, and afterdeactivate()
completes, would get it, and exit because the state is no longercircuitBreakerActivating
i don't believe the lines that set the timer to nil
are strictly necessary. if we removed then, then we would just additionally call Stop()
on an expired timer in cases (2) and (3) above.
i opted not to use time.NewTimer()
to avoid launching a separate goroutine to poll the timer channel (and have to cancel when we Stop()
the timer, since otherwise i think it would block forever)
@@ -4,6 +4,7 @@ package ingester | |||
|
|||
import ( | |||
"context" | |||
"errors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is feasible to add a test that would simulate what happens if we:
- at timestamp
ts
: schedule a cb activation with delay3*d
- at timestamp
ts + d
: deactivate the cb - at timestamp
ts + 2d
: schedule another cb activation with delay3*d
- ensure that the cb is not active before timestamp
ts + 5d
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in c67e4e3
95ae3c0
to
c67e4e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you @pr00se
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-9842-to-r316 origin/r316
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x c56a74829dd8135a4b3a93ebf5f5222c4f1ac117
# Push it to GitHub
git push --set-upstream origin backport-9842-to-r316
git switch main
# Remove the local backport branch
git branch -D backport-9842-to-r316 Then, create a pull request where the |
…affic starts (#9842) * Delay push circuit breaker activation until after traffic arrives * make reference-help doc * Docs changes * Fix nil pointer panics * Update CHANGELOG * Make activating trinary to fix bug with requests while deactivated queueing an activation * Use a mutex to synchronize breaker state traansitions * Remove redundant checks in tests * Add test (cherry picked from commit c56a748)
…affic starts (#9842) (#9895) * Delay push circuit breaker activation until after traffic arrives * make reference-help doc * Docs changes * Fix nil pointer panics * Update CHANGELOG * Make activating trinary to fix bug with requests while deactivated queueing an activation * Use a mutex to synchronize breaker state traansitions * Remove redundant checks in tests * Add test (cherry picked from commit c56a748)
What this PR does
Change the implementation of the
--initial-delay
option for ingester circuit breakers as follows:tryAcquirePermit()
rather than the call toactivate()
. This lets us ignore the firstinitial_delay
of traffic, regardless of how long after activation it arrives.activate()
is called during the activation delay, don't schedule another delayed activationdeactivate()
is called during the activation delay, cancel any pending activationWhich issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.