
Global ratelimiter fix: prevent low-weight hosts from becoming unusable #6196

Closed

Conversation


@Groxx Groxx commented Jul 30, 2024

From a problem suspected all along but rather conclusively shown in production:
Intermittent or mildly spiking requests are assigned a low weight, leading to rejected requests even when well below intended limits.

As a pathological example, given 1,000 RPS:

  • Constant use of ~100/s going to some hosts
  • A cronjob somewhere does like 10 rps every dozen or two seconds, to different hosts

This leads to the constant-receiving hosts holding something like 99.5+% of the cluster's weight,
because the cronjob's host has an inferred rate of <1 RPS by the time it tries to send more requests.
Since its allowance is so low, it rejects something like 9/10ths of the requests, despite there being tons of room left in the quota.

We don't have a lot of these, but we do have a few, and they look quite wrong at a glance and are a bit hard to understand.
That seems worth trying to address before widespread use, even if it makes the whole system a bit more complicated.
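
To put rough numbers on that example (my own simplified arithmetic; the real weights come from a decaying running average, which this ignores):

```go
package main

import "fmt"

func main() {
	const clusterLimit = 1000.0 // configured RPS for the key

	// Weights are roughly proportional to each limiting host's recently-observed usage.
	constantUsage := 100.0 // steady traffic seen by the busy frontends
	cronUsage := 0.1       // the cron host's inferred usage, decayed to ~nothing between bursts

	cronWeight := cronUsage / (constantUsage + cronUsage) // ≈0.1% of the cluster's weight
	cronAllowed := cronWeight * clusterLimit              // ≈1 RPS allowed on the cron host

	fmt.Printf("weight=%.4f%% allowed=%.1f RPS\n", cronWeight*100, cronAllowed)
	// the next ~10-request burst then rejects roughly 9 of 10 requests,
	// despite ~900 of the 1,000 RPS budget going unused
}
```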


Included in this PR are tests of 3 algorithms:

  • The current one
  • The one I want to implement
  • An idea I had along the way but I do not like
    • I'm happy to delete this if we don't think it's worth the amount of code involved

I have not yet implemented it because I figured we'd talk about the results first and decide if more time should be spent
trying to find other approaches.

If we do move forward with it, I see a few possible implementations:

  1. Add "target RPS" dynamicconfig to aggregators, have them return the same weight-data but adjusted to account for the "low-weight boost"
    • I've been avoiding putting this in aggregators to keep it more "pure"... but that isn't particularly important.
    • It'd be a smallish change, but "adjust weight by target, return as weight to limiter, which converts back to target" feels odd.
  2. Add "used RPS" to aggregators' response data, so the limiter can do this calculation locally
    • A simple IDL change would be needed (rough sketch at the end of this comment).
    • Since this is not "used" yet I'm inclined to just leave it incompatible and "break" the existing one. Then we can verify that it
      behaves reasonably and uses the fallback logic like it should.
    • This is probably the most "normal" way to implement this. It'll look fairly reasonable in the end.
  3. Implement the pluggable / not-hardcoded API for all this, so we can have both algorithms side by side and compare / shadow / etc.
    • I think we'll want this eventually. I doubt this is the ideal algorithm, even if I am relatively confident it's better.
    • I don't think it's worth it quite yet. This is still "v0.9" territory and not worth keeping the old one if we have something
      we think is better in ~all scenarios.
    • Clearly the most work of the three options, mostly due to needing to add a plugin system and plugin-lookups all over.
      I do not think it's going to be difficult or risky, just "the most work".

I'm inclined to do 2, for "don't leave the old impl, and test the failure paths" purposes, but honestly I'm up for any of them.
None feel like a bad choice: I already found all the code-paths where things need to go when building this out the first time,
and I've been thinking about what the remaining bits would need since near the beginning.
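
Roughly, the extra response data for option 2 could look like this (a sketch only; the field and type names here are hypothetical, not the actual IDL):

```go
package limiter // sketch only, not the real package

// weightUpdate is a hypothetical shape for option 2's aggregator response:
// returning the observed usage alongside the weight lets each limiting host
// compute its own "low-weight boost" locally against the target RPS from dynamicconfig.
type weightUpdate struct {
	Weight  float64 // this host's share of recently-observed usage for the key (0..1)
	UsedRPS float64 // total RPS recently observed across all hosts for the key
}
```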

Comment on lines +133 to +136
// find out how many would've been accepted and track it (uses previous-round target-RPS because limiters always lag by an update cycle)
track(&results.accepts[0], &results.rejects[0], r[0], results.targetRPS[0])
track(&results.accepts[1], &results.rejects[1], r[1], results.targetRPS[1])
track(&results.accepts[2], &results.rejects[2], r[2], results.targetRPS[2])
Member Author:

I'm considering changing these to use time-mocked real-ratelimiters and spreading requests evenly across each iteration... but I haven't yet tried, and I suspect it'll look a bit more complicated.

It would however give this "free" accept/reject tracking (just wrap them in a counted limiter) and realistic token-bucket-size usage and RPS carryover, as right now it unfairly penalizes e.g. 0.99rps as rejecting all requests because it's never over 1.


Any votes? I'm kinda curious how it'll affect the numbers...
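
For illustration, the "counted limiter" idea plus a real token bucket would look roughly like this (a sketch using golang.org/x/time/rate as a stand-in for clock.Ratelimiter; not code from this PR):

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// countedLimiter wraps a token bucket and tallies accepts/rejects ("free" tracking).
type countedLimiter struct {
	lim              *rate.Limiter
	accepts, rejects int
}

func (c *countedLimiter) allow(at time.Time) bool {
	if c.lim.AllowN(at, 1) {
		c.accepts++
		return true
	}
	c.rejects++
	return false
}

func main() {
	// 0.99 RPS with burst 1: naive integer counting accepts floor(0.99) = 0 per second,
	// but a real bucket carries the fraction over between "seconds".
	c := &countedLimiter{lim: rate.NewLimiter(0.99, 1)}
	base := time.Now()
	for s := 0; s < 10; s++ { // one request per simulated second
		c.allow(base.Add(time.Duration(s) * time.Second))
	}
	fmt.Println(c.accepts, c.rejects) // roughly 5 and 5, instead of 0 and 10
}
```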

Member Author:

Proof of concept, anyway: #6197
The results that change are commented out and left in place.

tl;dr implementing it in a less-bad way will take a bit more work, but:

  • most of the results are the same
  • most things perform slightly better
  • proposed new algorithm does noticeably better, more ideal results

Contributor:

Time-mocked should be a better option since CI timing is very unreliable, and it can take longer for the RPS to settle, I think.

Member Author:

Both use fake time; that PoC is to use a clock.Ratelimiter to more accurately keep track of allows/rejects. It's mostly for fractional values, since my fake-counting is essentially only correct on integers.

Comment on lines +223 to +227
t.Run("v1 nonzero tweak - running average ignoring zeros", func(t *testing.T) {
// an idea to improve on the original: hosts never see their own degraded values, but others do.
// this allows others to act as if the host is idle (when it has been recently), but allows a periodic spike to go through.
//
// this has not been deployed, it was just an idea that IMO did not pan out.
Member Author:

this is the "failed idea I don't intend to build".
if we don't think this is worth keeping here because of that, I'm entirely game to just delete it. I wrote it mostly for myself anyway.

if it gives anyone a better idea, I'd be curious to hear the details and try it, but I think it needs to be smarter than just special-casing zeros, as "imbalanced low usage" is part of the problem, not just zero/non-zero toggling.

Comment on lines +168 to +187
t.Run("zeros", func(t *testing.T) {
// zeros are the worst-case scenario here, greatly favoring the non-zero values.
check(t, data{
targetRPS: [3]float64{0.63, 3.57, 10.7}, // intermittent allows far too few, constant is allowed over 2x what it uses
callRPS: [3]float64{1.25, 2.50, 5.00}, // all at or below per-host limit at all times, not just in aggregate
accepts: [3]float64{0.00, 1.50, 5.00}, // this is the main problem with this algorithm: bad behavior when usage is imbalanced but low.
rejects: [3]float64{1.25, 1.00, 0.00},
cumulativeRPS: targetRPS, // but this is useful and simple: it always tries to enforce the "real" target, never over or under.
}, run(t, roundSchedule(0, 5)))
})
t.Run("low non-zeros", func(t *testing.T) {
// even 1 instead of 0 does quite a lot better, though still not great
check(t, data{
targetRPS: [3]float64{2.21, 4.07, 8.72},
callRPS: [3]float64{2.00, 3.00, 5.00},
accepts: [3]float64{1.25, 2.25, 5.00}, // constant use behaves well, others are not great
rejects: [3]float64{0.75, 0.75, 0.00},
cumulativeRPS: targetRPS,
}, run(t, roundSchedule(1, 5)))
})
Member Author:

this is the problem

Contributor:

I kind of understood the problem from the description, but I cannot understand it from the test description.

Comment on lines +421 to +439
t.Run("zeros", func(t *testing.T) {
check(t, data{
targetRPS: [3]float64{3.38, 5.00, 10.7},
callRPS: [3]float64{1.25, 2.50, 5.00},
accepts: [3]float64{0.75, 2.50, 5.00}, // very close, a main motivating reason for this approach
rejects: [3]float64{0.50, 0.00, 0.00},
cumulativeRPS: targetRPS + 4.1, // when very low, allows moderate bit above
}, run(t, roundSchedule(0, 5)))
})
t.Run("low nonzeros", func(t *testing.T) {

check(t, data{
targetRPS: [3]float64{4.34, 5.00, 8.72},
callRPS: [3]float64{2.00, 3.00, 5.00},
accepts: [3]float64{1.75, 3.00, 5.00}, // almost perfect, a main motivating reason for this approach
rejects: [3]float64{0.25, 0.00, 0.00},
cumulativeRPS: targetRPS + 3.1, // less overage but still moderate
}, run(t, roundSchedule(1, 5)))
})
Member Author:

and this is my proposed solution

Contributor:

Could you elaborate on how a test scenario is a solution?

@Groxx Groxx (Member Author) commented Aug 5, 2024

Fewer rejects and more accepts, without (significantly) exceeding the intended RPS, basically. It's a simulation of ~4 processes per test.

There's not much in here that's worth asserting, so it's just checking "last round summary" values. The logs contain details of how it gets to this state and what the overall behavior is, which is WAY too much to assert against.

If you run any of these and look at the output, you can see the number of calls being performed per "second", how that updates weights in the system, and what RPS that allows for the next second.

Contributor:

What I mean is that it is not a solution.
As far as I understand, you don't change any code logic; you just clarify the edge case scenario. Or am I missing something?

codecov bot commented Jul 30, 2024

Codecov Report: all modified and coverable lines are covered by tests ✅

Project coverage is 72.86%. Comparing base (e60d167) to head (16cc64b).
Report is 37 commits behind head on master. See 8 files with indirect coverage changes.

@davidporter-id-au (Member):

> 100/s going to some hosts

sorry, all the context fell out my brain again. Define 'hosts' here? Are you referring to frontend, history aggregator, or customer worker hosts?

Groxx commented Jul 30, 2024

> 100/s going to some hosts
>
> sorry, all the context fell out my brain again. Define 'hosts' here? Are you referring to frontend, history aggregator, or customer worker hosts?

This would be for frontend hosts / the ones doing the limiting.

We've got a few cases that look like general background activity that's relatively steady both in frequency and in what frontend hosts it contacts, and then a low frequency / random ping from other machines that hits the same limit.

E.g. workflows doing their thing in one service, plus an occasional third party service that sends a small burst. That third party has a completely different routing layout, so when they hit frontends that don't overlap they get small weights that degrade quickly to even tinier weights... until most of their requests get rejected for no apparent reason, because they don't recover enough tokens with their evenly-shared 0.1RPS to do anything useful.

It's "fair" because those hosts really do receive a vanishingly small amount of load comparatively (plus they're idle for a while), but it's unfair given our goal of "you can send X RPS". This is an attempt to find a middle ground that allows those small fair bursts without allowing giant surprise bursts.

Hopefully my run-on late-night sentences make sense in the morning :]

@3vilhamster 3vilhamster (Contributor) left a comment

Since this is an experiment, I think it's fine to try out different scenarios, but I'd like to have the end strategy and a list of problems written down somewhere.

All are bound to a single key across 3 hosts with a simple 15rps target (5 per host when balanced) (3 hosts chosen because there are many easy mistakes with 2).
*/
const targetRPS = 15
max := func(a, b float64) float64 {
	if a > b {
		return a
	}
	return b
}
Contributor:

nit: math.Max

min := func(a, b float64) float64 {
	if a < b {
		return a
	}
	return b
}
Contributor:

nit: math.Min

_, _ = max, min
Contributor:

These functions could be outside of the test body, so this is not required.


type data struct {
// target for the immediate next period of time, as a sample of how target RPSes settle over time
targetRPS [3]float64 // rps that should be allowed by the limiter hosts at the end of the last round
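// (each [3]float64 holds one value per simulated limiting host; the scenario at the top of the test runs 3 hosts)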
Contributor:

Could you explain a bit more about the [3] floats: what each one means and how to read them?

Contributor:

I suggest using a struct with named fields to make it easier to understand scenarios.


Groxx commented Aug 20, 2024

gonna pause on this one - it's a lot of work to maintain and tweak, and I've already gotten what I need out of it: confidence that a planned change is reasonable and behaves reasonably well.

@Groxx Groxx closed this Aug 20, 2024
Groxx added a commit that referenced this pull request Aug 27, 2024
Part 1 of 2 to address some relatively-expected "infrequent usage behaves sub-optimally" issues.

In this one, I'm tackling an issue where there are some high-usage limiting-hosts, and some occasional-and-low-usage hosts, and e.g. less than half of the target RPS is being used at any time.

In ^ this scenario it's possible for the low-usage hosts to be given a weight so low that they block _most_ of the requests they receive, especially if those requests come in small bursts (which is quite common) rather than evenly distributed.  This is partly because their weight is low enough to only allow a couple RPS, and partly because our limiters currently set their burst size == limit, so they also do not allow bursts that would otherwise be within-budget.

This _cannot_ be addressed perfectly with an async system like this (we can't perfectly predict caller behavior), but there are some easy partial improvements worth building:

1. (this PR) Allow low-weight hosts to over-commit and accept more than their fair share of RPS, if there is free RPS in the cluster.
   - For _consistently_ low-RPS hosts, this should be fair and a moderate improvement, and it performs no worse than a non-global algorithm if the limit is not usually fully used.
2. Raise the burst size, so intermittent callers can accumulate a small burst budget.
   - Currently this matches the RPS (min 1), which means e.g. a totally innocuous 2-request-event every 10 seconds will likely reject 50%.  Obviously that's not great.
   - "Fair" values here are fuzzier, and will be figured out in a later PR.

In this PR, I "boost" RPSes that are below the fallback values, in a way that converges towards the target RPS as usage increases.  This allows some over-committing, but we already allow that, and the fallback values are assumed to be "safe enough": they'd be used if the caller just waited a bit longer between calls -> the global limiter was garbage-collected.

If global usage increases, the boost-amount will shrink, and the over-commit will shrink as well.  So while this allows spiky requests through at a somewhat higher rate than desired, heavy usage should still converge near the target RPS rather than e.g. 2x the target.
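
Concretely, the boost behaves roughly like the following (a minimal sketch of the description above; the function and parameter names are made up and this is not the exact code in this PR):

```go
package sketch

import "math"

// boostedRPS: low-weight hosts may grow toward their fallback limit, but only by
// as much RPS as the cluster currently leaves unused.
func boostedRPS(weight, targetRPS, usedRPS, fallbackRPS float64) float64 {
	base := weight * targetRPS
	if base >= fallbackRPS {
		return base // already at or above the "safe" fallback value: no boost
	}
	// unused capacity shrinks as global usage grows, pulling the boosted value
	// back toward the weight-only value and total usage toward the target
	unused := math.Max(0, targetRPS-usedRPS)
	return math.Min(fallbackRPS, base+unused)
}
```

Under low usage this over-commits (bounded by the fallback value); under heavy usage it converges to plain weight × target.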

---

A lot of lines have changed, but this PR boils down to:
- change the IDL to return more data (which was already tracked) as a new type
  - this will break the current system, causing the global limiters to use their fallback
  - ^ this is currently intentional, to serve as a test of that fallback, but should not generally be done.  normally it should handle both types, or shift to a plugin system and just have two stacks in use, switched by type.
- change the `algorithm` portion to return a struct rather than 2 values, to make this additional data easier to pass around
- boost RPS when updating the ratelimiters, in the `collection`
- a lot of minor changes to handle the new type, without changing behavior

---

Visually, for a hopefully more intuitive grasp of the issue, this is how weights/RPS currently get distributed among a number of hosts:
![ratelimiter - orig](https://github.com/user-attachments/assets/0956a507-cd22-4874-843e-7d987d1a32b9)

Low weights get a low ratio of the allowed RPS, so they are also only allowed to _grow_ a little, while high-weight hosts are allowed to grow a lot.  If we assume traffic arrives at frontends roughly randomly, this is fine (if they grow, they will all grow by the same %), but in practice we see a lot of "constant high use" to some hosts, and "occasional low use" to others, e.g. due to different services performing different call patterns and being routed differently.

Mathematically this makes sense... but it's relatively easy for something that only sends a couple requests every 10 seconds or so to eventually be allocated ~1 RPS because its weight is legitimately that low.  But then nearly all of those low-RPS requests are rejected despite there (possibly) being more than enough free-RPS in the cluster.

This PR essentially changes the behavior to look like this:
![ratelimiter - changed](https://github.com/user-attachments/assets/0193bfdf-687c-4b6e-805b-78d411c65a6f)

If there is unused-RPS in the cluster, low-weight hosts are allowed to increase their RPS up to their fallback-behavior's limit.  This is limited by the amount of unused-RPS, so as usage increases it will move towards the un-boosted limit (i.e. weight only), and the overall RPS should converge at the target RPS.

Under high usage, it looks more like this (only a small boost added to low weight hosts):
![ratelimiter - changed high usage](https://github.com/user-attachments/assets/d005541f-4f82-4344-924f-a61159805481)

_How_ the unused RPS is best-distributed is... fuzzy.  Here I've taken the easy and low-data-needed option: just use it all, independently, in each low-weight host.  This is more likely to prevent unfairly-blocked requests, but allows more requests in a multi-host burst (until re-balanced).  
Another option considered (and originally planned) was to distribute it between all *contributing* low-weight hosts, but this needs to either exchange more data or put target-RPS knowledge into both the limiter and the aggregator side of this system.  I think the latter will eventually happen and it's basically fine if/when it does, but it's a larger change than I wanted to do right now, and the former just didn't feel worth the extra logical complexity.  Simpler is better with this kind of system, because it's more understandable and predictable.

The original idea has a simulated implementation here, with a lot of tests to run a simulated traffic pattern and log how weights / individual hosts / etc behave: #6196
but that will likely not be merged - that PR served its purpose of showing whether or not my intuition about the system's behavior as a whole was correct, and the large chunk of code to do that is probably not worth maintaining.
jakobht pushed a commit to jakobht/cadence that referenced this pull request Sep 3, 2024