upstream: desynchronize WRR picks. #3271
Conversation
This PR fixes two issues:
* Avoid having synchronized behavior across the fleet by having a seed applied to each WRR balancer instance.
* Avoid having WRR reset to the same host on each refresh, biasing towards earlier hosts in the schedule.

Risk Level: Medium (there will be observable traffic effects due to changes in host pick schedule).
Testing: New unit tests, modified existing.

Signed-off-by: Harvey Tuch <htuch@google.com>
Thanks for tackling this one!
hosts.size() == 0
    ? 0
    : (scheduler_.find(source) != scheduler_.end() ? scheduler_[source].rr_index_ : seed_) %
          hosts.size();
can you break this down for readability? Also comments :-)
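One way to break the ternary chain into named steps, as requested. This is a hypothetical standalone sketch, not the code that landed: `Scheduler` and the host-source key are simplified stand-ins for the Envoy types in this diff, and `computeOffset` is an illustrative name.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified stand-in for the Scheduler struct in this diff.
struct Scheduler {
  uint64_t rr_index_{};
};

// Hypothetical readable decomposition of the offset expression.
uint64_t computeOffset(const std::vector<int>& hosts,
                       const std::unordered_map<int, Scheduler>& schedulers,
                       int source, uint64_t seed) {
  if (hosts.empty()) {
    return 0;
  }
  // If we have seen this host source before, continue the round-robin where
  // we left off; otherwise start at a seed-derived position so that balancers
  // across a fleet do not all start at host 0.
  const auto it = schedulers.find(source);
  const uint64_t start = it != schedulers.end() ? it->second.rr_index_ : seed;
  return start % hosts.size();
}
```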
  scheduler.edf_.add(host->weight(), host);
}
// Cycle through hosts to achieve the intended offset behavior.
for (uint32_t i = 0; i < offset; ++i) {
Would it be sufficient to add the hosts starting at index N and wrapping, rather than adding in-order and doing N picks?
Sadly not, since with weighted, the initial insertion order doesn't matter, the weights govern where in the schedule you are.
Ok, having thought about this for more than 5 seconds in a meeting :-P Now I'm not convinced this is particularly fair anyway. To be truly fair wouldn't we have to loop through between 0 and [the sum of each host's weight]?
Check my math because I'm terrible at these. If you have A w=100, B w=100, C w=1 you'd start with
[A=.01, B=.01, C=1].
Assuming our rng told us to max out our picks at 3, we'd pop A, assign it a deadline of .01 + 1/100 -> .02, so now the queue is
B=.01, A=.02, C=1. Pop B, assign it whatever deadline, and you still won't pop C as your third pick. C should have a 1 in 201 chance of getting picked and I think it deterministically still has a 0% chance of being picked, no?
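To sanity-check this fairness concern, here is a minimal EDF stand-in (not Envoy's `EdfScheduler`; `Entry` and `edfPicks` are hypothetical names, and the tie-break rule is an assumption). It shows that with weights 100/100/1 the early pick schedule is fully deterministic and C does not appear anywhere near the front, so a small offset of initial picks cannot emulate a fair random starting point in the weighted schedule.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Minimal EDF stand-in: each pick pops the host with the earliest deadline
// and reinserts it with deadline += 1/weight.
struct Entry {
  double deadline;
  int order; // insertion order, used as a deterministic tie-break
  std::string host;
  double weight;
  bool operator>(const Entry& o) const {
    return deadline != o.deadline ? deadline > o.deadline : order > o.order;
  }
};

std::vector<std::string>
edfPicks(const std::vector<std::pair<std::string, double>>& hosts, int n) {
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> q;
  int order = 0;
  // Initial deadline for each host is 1/weight.
  for (const auto& h : hosts) {
    q.push({1.0 / h.second, order++, h.first, h.second});
  }
  std::vector<std::string> picks;
  for (int i = 0; i < n; ++i) {
    Entry e = q.top();
    q.pop();
    picks.push_back(e.host);
    q.push({e.deadline + 1.0 / e.weight, order++, e.host, e.weight});
  }
  return picks;
}
```

Running this with A w=100, B w=100, C w=1 yields a strictly alternating A/B prefix: the schedule is deterministic, and C (which under fair weighted sampling would have a 1-in-201 chance per pick) has zero probability of being among the early picks.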
Yes, I came to the same conclusion and have already removed the offset for fairness. I'm still keeping it there to break synchronization and have left a TODO to make it fair.
@@ -481,6 +488,9 @@ HostConstSharedPtr RoundRobinLoadBalancer::chooseHost(LoadBalancerContext*) {
  // We do not expire any hosts from the host list without rebuilding the scheduler.
  ASSERT(host != nullptr);
  scheduler.edf_.add(host->weight(), host);
  // Track the nubmer of picks so we can pick up where we left of when
nubmer->number
of -> off
@@ -228,13 +228,16 @@ class RoundRobinLoadBalancer : public LoadBalancer, ZoneAwareLoadBalancerBase {
  struct Scheduler {
    // EdfScheduler for weighted RR.
    EdfScheduler<const Host> edf_;
    // Simple clock hand for when we do unweighted.
    // Simple clock hand for when we do unweighted. Tracks the total number of
    // picks when weighted.
does it not functionally track the number of picks either way?
    size_t rr_index_{};
    bool weighted_{};
  };

  // Scheduler for each valid HostsSource.
  std::unordered_map<HostsSource, Scheduler, HostsSourceHash> scheduler_;
  // Seed to allow us to desynchronize WRR balancers across a fleet.
Mind adding a few sentences about this here, or other place of your choice? I think it would be helpful if this were spelled out more for folks who haven't seen a tech talk (or a postmortem) of what happens in prod recently :-)
const uint64_t offset =
    hosts.size() == 0
        ? 0
        : (scheduler_.find(source) != scheduler_.end() ? scheduler_[source].rr_index_ : seed_) %
when not using weights, is there a reason that we don't always use a random offset?
Yeah; if you use the last offset we had for a locality, we have the nice property that if we update twice with the same host set, we continue the RR where we left off. This avoids biasing towards the earlier hosts in the schedule.
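The property described here can be sketched in isolation. This is a hypothetical illustration, not Envoy's implementation: `UnweightedRR` is an invented class showing why carrying `rr_index_` across a rebuild (rather than re-seeding randomly each time) continues the rotation instead of snapping back to the first host.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical unweighted RR hand. On a refresh, the caller constructs a new
// instance from the previous instance's index (or from a seed on first build),
// so the rotation continues where it left off.
class UnweightedRR {
public:
  UnweightedRR(std::vector<std::string> hosts, uint64_t initial_index)
      : hosts_(std::move(hosts)), rr_index_(initial_index) {}

  // Simple clock hand: each pick advances the index.
  const std::string& pick() { return hosts_[rr_index_++ % hosts_.size()]; }

  // Exposed so a rebuild with the same host set can resume the rotation.
  uint64_t index() const { return rr_index_; }

private:
  std::vector<std::string> hosts_;
  uint64_t rr_index_;
};
```

A rebuild with the same host set and the carried-over index yields the next host in sequence, with no bias back towards the earlier hosts.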
@@ -481,6 +488,9 @@ HostConstSharedPtr RoundRobinLoadBalancer::chooseHost(LoadBalancerContext*) {
  // We do not expire any hosts from the host list without rebuilding the scheduler.
  ASSERT(host != nullptr);
  scheduler.edf_.add(host->weight(), host);
  // Track the nubmer of picks so we can pick up where we left of when
s/nubmer/number/
and s/left of/left off/
Signed-off-by: Harvey Tuch <htuch@google.com>
…ht for now. Signed-off-by: Harvey Tuch <htuch@google.com>
LGTM, thanks for jumping on this.
// Cycle through hosts to achieve the intended offset behavior.
// TODO(htuch): Consider how we can avoid biasing towards earlier hosts in the
// schedule across refreshes for the weighted case.
for (uint32_t i = 0; i < seed_ % hosts.size(); ++i) {
A lot of this behavior in the EDF case is not intuitive to me at all. Not a big deal right now but is it worth at some point writing some LB simulations to cover this case so we can better understand the behavior? I wonder for example if we really have to cycle through all hosts to get reasonable randomness since doing that might be costly for large host sets?
I agree, EDF is pretty confusing and I think we might want to move away from it longer term. I've filed #3278 and will merge as is.
This PR fixes two issues:
* Avoid having synchronized behavior across the fleet by having a seed applied to each WRR balancer instance.
* Avoid having unweighted RR reset to the same host on each refresh, biasing towards earlier hosts in the schedule. This TODO still exists for weighted.

Risk Level: Medium (there will be observable traffic effects due to changes in host pick schedule).
Testing: New unit tests, modified existing.

Signed-off-by: Harvey Tuch <htuch@google.com>
Signed-off-by: Rama <rama.rao@salesforce.com>