Refactor per stream rate limit #4213

Merged (7 commits) on Aug 25, 2021

@owen-d (Member) commented Aug 24, 2021

This PR comes after a lot of debugging, which uncovered some unexpected behavior in our underlying rate limit library. Notably, it did not handle transitioning from rate.Inf as expected. This PR refactors the per-stream rate limiter to get the desired behavior, and reduces complexity and runtime cost by no longer depending on the per-tenant rate limiter from Cortex.

edit:

Further explanation from the library docs, as a special case:
A zero Burst allows no events, unless limit == Inf.

However, a zero burst value also means the token bucket never fills. This is a problem when the limiter's parameters are updated at runtime: the limiter reconfigures itself properly, but because it held no tokens beforehand, it has no available budget and rejects the first request. This is why we see issues during ingester rollout: only series which were present in WAL replay, and were thus initialized with NewLimiter(rate.Inf, 0), are affected. I've also included a test illustrating this behavior in the underlying library.

@owen-d requested a review from a team as a code owner August 24, 2021 21:49

@cstyan (Contributor) left a comment

Overall LGTM, but from the current comments I still don't entirely understand the bug in the upstream rate limit library.

I see that rate.Inf is a special case of the Limit function: https://cs.opensource.google/go/x/time/+/1f47c861:rate/rate.go;l=22

Are you saying the issue is in the actual advance function? https://cs.opensource.google/go/x/time/+/1f47c861:rate/rate.go;l=361

Extending the comment in AllowN would be useful IMO.

@slim-bean (Collaborator) left a comment

LGTM!

@owen-d merged commit 668622c into grafana:main Aug 25, 2021

@owen-d (Member, Author) commented Aug 30, 2021

ref #1544
