The AverageIntervalRateLimiter causes tiny wait intervals, which can result in a DDoS at worst or poor UX at best. See below where we implemented a 10k requests/second/machine quota at 10:43 and saw requests fall into immediate retry loops and inundate the cluster:
The FixedIntervalRateLimiter causes large wait intervals, which make it difficult to fully utilize a quota. See below where we were stuck consistently utilizing only ~20% of our quota. After restarting the cluster to use a FixedIntervalRateLimiter with a 100ms refill interval, we were able to achieve much better utilization:
As suggested above, this PR introduces support for a refill interval that is <= the TimeUnit of a FixedIntervalRateLimiter. This means that you can define a quota in a straightforward way, like 100MB/sec, while also acknowledging, for example, that you're willing to refill it every 100ms, which suggests that your retries for small/normal requests will often wait only ~100ms.
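To make the effect on wait intervals concrete, here is a minimal sketch of a fixed-interval limiter whose refill interval may be shorter than its time unit. It is an illustration under simplifying assumptions, not the code in this PR: the class `RefillIntervalSketch` and its methods are hypothetical names, each refill is assumed to restore a proportional share of the limit, and time is passed in explicitly rather than read from a clock.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; NOT the HBase FixedIntervalRateLimiter code.
// All names here (RefillIntervalSketch, tryAcquire, ...) are hypothetical.
final class RefillIntervalSketch {
  private final long limit;            // quota per time unit (e.g. bytes/sec)
  private final long timeUnitMs;       // length of the quota's time unit in ms
  private final long refillIntervalMs; // how often a share of the quota returns
  private long available;              // quota currently available
  private long nextRefillMs;           // time of the next scheduled refill

  public RefillIntervalSketch(long limit, TimeUnit unit, long refillIntervalMs, long nowMs) {
    this.timeUnitMs = unit.toMillis(1);
    if (refillIntervalMs <= 0 || refillIntervalMs > timeUnitMs) {
      throw new IllegalArgumentException("refill interval must be in (0, time unit]");
    }
    this.limit = limit;
    this.refillIntervalMs = refillIntervalMs;
    this.available = limit;
    this.nextRefillMs = nowMs + refillIntervalMs;
  }

  // Share of the limit restored at each refill (assumed proportional, at least 1).
  public long perRefillAmount() {
    return Math.max(1, limit * refillIntervalMs / timeUnitMs);
  }

  // Adds one share per elapsed refill interval, capped at the full limit.
  private void refill(long nowMs) {
    if (nowMs < nextRefillMs) {
      return;
    }
    long intervals = 1 + (nowMs - nextRefillMs) / refillIntervalMs;
    available = Math.min(limit, available + intervals * perRefillAmount());
    nextRefillMs += intervals * refillIntervalMs;
  }

  // Returns 0 if the request is granted, otherwise the wait in ms until the
  // refill at which enough quota will have accumulated.
  public long tryAcquire(long amount, long nowMs) {
    refill(nowMs);
    if (amount <= available) {
      available -= amount;
      return 0;
    }
    long deficit = amount - available;
    long refillsNeeded = (deficit + perRefillAmount() - 1) / perRefillAmount();
    return (nextRefillMs - nowMs) + (refillsNeeded - 1) * refillIntervalMs;
  }

  public static void main(String[] args) {
    long quota = 100L * 1024 * 1024; // 100MB per second
    long small = 1024 * 1024;        // a 1MB request
    for (long refillMs : new long[] {1000, 100}) {
      RefillIntervalSketch limiter =
          new RefillIntervalSketch(quota, TimeUnit.SECONDS, refillMs, 0);
      limiter.tryAcquire(quota, 0); // exhaust the quota at t=0
      System.out.println("refill=" + refillMs + "ms -> wait "
          + limiter.tryAcquire(small, 10) + "ms");
    }
  }
}
```

In this sketch, a 1MB request arriving 10ms after a 100MB/sec quota is exhausted waits 990ms when the refill interval equals the time unit, and 90ms with a 100ms refill interval.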
To make use of this feature, set `hbase.quota.rate.limiter.refill.interval.ms` to your desired refill interval and restart your RegionServers. By default the refill interval equals the TimeUnit, so this is a no-op without explicit configuration.
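As a sketch, the 100ms refill interval used in the experiment above might be configured as follows. This assumes the property is set in `hbase-site.xml` like other RegionServer settings; the description here only names the key itself.

```xml
<!-- Assumed location: hbase-site.xml on each RegionServer -->
<property>
  <name>hbase.quota.rate.limiter.refill.interval.ms</name>
  <!-- Refill interval in milliseconds; 100 matches the example above -->
  <value>100</value>
</property>
```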
Here's an initial look at how a 100ms refill interval changed our wait interval percentiles in our QA environment:
See https://issues.apache.org/jira/browse/HBASE-28453
@hgromer @eab148 @bozzkar @bbeaudreault