upstream: desynchronize WRR picks. #3271

Merged
merged 4 commits from random-wrr into envoyproxy:master on May 3, 2018

Conversation

@htuch (Member) commented May 2, 2018

This PR fixes two issues:

  • Avoid having synchronized behavior across the fleet by having a seed
    applied to each WRR balancer instance.
  • Avoid having unweighted RR reset to the same host on each refresh, which biases
    towards earlier hosts in the schedule. This TODO still exists for the weighted case.

Risk Level: Medium (there will be observable traffic effects due to
changes in host pick schedule).
Testing: New unit tests, modified existing.

Signed-off-by: Harvey Tuch <htuch@google.com>
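
As a rough illustration of the first bullet, here is a minimal sketch with hypothetical names (not the actual Envoy implementation) of how a per-instance random seed keeps otherwise identical balancers from starting their rotation at the same host:

#include <cstdint>
#include <iostream>
#include <random>
#include <string>
#include <vector>
// Hypothetical stand-in: each balancer instance draws its own random seed, so two
// instances with identical host sets start their rotation at different offsets
// instead of picking the same hosts in lockstep across the fleet.
struct SeededRoundRobin {
  explicit SeededRoundRobin(std::vector<std::string> hosts)
      : hosts_(std::move(hosts)), seed_(std::random_device{}()), index_(seed_) {}
  const std::string& pick() { return hosts_[index_++ % hosts_.size()]; }
  std::vector<std::string> hosts_;
  uint64_t seed_;
  uint64_t index_;
};
int main() {
  const std::vector<std::string> hosts{"a", "b", "c", "d"};
  SeededRoundRobin lb1(hosts), lb2(hosts);
  // The two instances almost certainly begin their schedules at different hosts.
  std::cout << lb1.pick() << " " << lb2.pick() << "\n";
}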

@htuch (Member Author) commented May 2, 2018

Signed-off-by: Harvey Tuch <htuch@google.com>

@alyssawilk (Contributor) left a comment

Thanks for tackling this one!

hosts.size() == 0
? 0
: (scheduler_.find(source) != scheduler_.end() ? scheduler_[source].rr_index_ : seed_) %
hosts.size();
Contributor:

can you break this down for readability? Also comments :-)
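
For illustration, one possible way to unpack that expression into a named helper, as a standalone sketch with simplified stand-in types (not the merged Envoy change):

#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>
// Sketch only: HostsSource is reduced to a string key and hosts to strings.
struct Scheduler { uint64_t rr_index_{}; };
struct OffsetExample {
  std::unordered_map<std::string, Scheduler> scheduler_;  // one Scheduler per source
  uint64_t seed_{};
  // Same logic as the quoted ternary, unpacked with early returns and comments.
  uint64_t computeOffset(const std::vector<std::string>& hosts, const std::string& source) const {
    if (hosts.empty()) {
      return 0;  // no hosts, nothing to offset into
    }
    // Continue from the previous clock-hand position if we already built a scheduler
    // for this source; otherwise fall back to the per-instance seed.
    const auto it = scheduler_.find(source);
    const uint64_t base = it != scheduler_.end() ? it->second.rr_index_ : seed_;
    return base % hosts.size();
  }
};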

scheduler.edf_.add(host->weight(), host);
}
// Cycle through hosts to achieve the intended offset behavior.
for (uint32_t i = 0; i < offset; ++i) {
Contributor:

Would it be sufficient to add the hosts starting at index N and wrapping, rather than adding in-order and doing N picks?

Member Author:

Sadly not; with weighted hosts the initial insertion order doesn't matter, since the weights govern where in the schedule you are.

@alyssawilk (Contributor) commented May 2, 2018

Ok, having thought about this for more than 5 seconds in a meeting :-P, I'm now not convinced this is particularly fair anyway. To be truly fair, wouldn't we have to loop through between 0 and [each host * each weight]?

Check my math because I'm terrible at these. If you have A w=100, B w=100, C w=1, you'd start with
[A=.01, B=.01, C=1].
Assuming our rng told us to max out our picks at 3, we'd pop A and assign it a weight of (1/100+1)/100 -> .0101, so now the queue is
B=.01, A=.0101, C=1. Pop B, assign it whatever weight, and you'll still pop A as your third pick. C should have a 1 in 201 chance of getting picked, and I think it deterministically still has a 0% chance of being picked, no?
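
For intuition, here is a toy simulation of a textbook EDF-style weighted scheduler (a simplified stand-in, not Envoy's EdfScheduler; ties between equal deadlines break in unspecified order) using the weights from this example. Only A and B appear in the early picks, which is why a small discarded-pick offset can never land on C:

#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>
struct Entry {
  double deadline;
  double weight;
  std::string name;
  bool operator>(const Entry& other) const { return deadline > other.deadline; }
};
int main() {
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue;
  const std::vector<std::pair<std::string, double>> weights = {{"A", 100}, {"B", 100}, {"C", 1}};
  for (const auto& [name, weight] : weights) {
    queue.push({1.0 / weight, weight, name});  // initial deadline = 1 / weight
  }
  for (int pick = 1; pick <= 6; ++pick) {
    Entry e = queue.top();
    queue.pop();
    std::cout << "pick " << pick << ": " << e.name << "\n";  // only A and B show up here
    e.deadline += 1.0 / e.weight;  // reschedule with a later deadline
    queue.push(e);
  }
  // C (deadline 1.0) is not reached until roughly the 200th pick, so discarding a
  // handful of initial picks as an "offset" never makes C the effective first host.
}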

Member Author:

Yes, I came to the same conclusion and have already removed the offset for fairness. I'm still keeping it there to break synchronization and have left a TODO to make it fair.

@@ -481,6 +488,9 @@ HostConstSharedPtr RoundRobinLoadBalancer::chooseHost(LoadBalancerContext*) {
// We do not expire any hosts from the host list without rebuilding the scheduler.
ASSERT(host != nullptr);
scheduler.edf_.add(host->weight(), host);
// Track the nubmer of picks so we can pick up where we left of when
Contributor:

nubmer->number
of -> off

@@ -228,13 +228,16 @@ class RoundRobinLoadBalancer : public LoadBalancer, ZoneAwareLoadBalancerBase {
struct Scheduler {
// EdfScheduler for weighted RR.
EdfScheduler<const Host> edf_;
// Simple clock hand for when we do unweighted.
// Simple clock hand for when we do unweighted. Tracks the total number of
// picks when weighted.
Contributor:

Does it not functionally track the number of picks either way?

size_t rr_index_{};
bool weighted_{};
};

// Scheduler for each valid HostsSource.
std::unordered_map<HostsSource, Scheduler, HostsSourceHash> scheduler_;
// Seed to allow us to desynchronize WRR balancers across a fleet.
Contributor:

Mind adding a few sentences about this here, or in another place of your choice? I think it would be helpful if this were spelled out more for folks who haven't recently seen a tech talk (or a postmortem) about what happens in prod :-)

const uint64_t offset =
hosts.size() == 0
? 0
: (scheduler_.find(source) != scheduler_.end() ? scheduler_[source].rr_index_ : seed_) %
Contributor:

when not using weights, is there a reason that we don't always use a random offset?

Member Author:

Yeah; if you use the last offset we had for a locality, we get the nice property that if we update twice with the same host set, we continue the RR where we left off. This avoids biasing towards the earlier hosts in the schedule.
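
A small sketch of that property under simplified assumptions (plain unweighted RR with hypothetical names, not the Envoy code): carrying the old index into the rebuilt balancer keeps the rotation going instead of restarting at the first host.

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
// Simplified stand-in for the unweighted path: the balancer is rebuilt on each
// host-set refresh but carries the previous clock-hand index forward.
struct RoundRobin {
  RoundRobin(std::vector<std::string> hosts, uint64_t starting_index)
      : hosts_(std::move(hosts)), index_(starting_index) {}
  const std::string& pick() { return hosts_[index_++ % hosts_.size()]; }
  std::vector<std::string> hosts_;
  uint64_t index_;
};
int main() {
  const std::vector<std::string> hosts{"a", "b", "c"};
  RoundRobin first(hosts, 0);
  first.pick();  // "a"
  first.pick();  // "b"
  // Refresh with an identical host set: seeding the new balancer from the old index
  // continues at "c" instead of biasing back towards "a".
  RoundRobin refreshed(hosts, first.index_);
  std::cout << refreshed.pick() << "\n";  // prints "c"
}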

@@ -481,6 +488,9 @@ HostConstSharedPtr RoundRobinLoadBalancer::chooseHost(LoadBalancerContext*) {
// We do not expire any hosts from the host list without rebuilding the scheduler.
ASSERT(host != nullptr);
scheduler.edf_.add(host->weight(), host);
// Track the nubmer of picks so we can pick up where we left of when
Contributor:

s/nubmer/number/ and s/left of/left off/

Signed-off-by: Harvey Tuch <htuch@google.com>
…ht for now.

Signed-off-by: Harvey Tuch <htuch@google.com>
@mattklein123 (Member) left a comment

LGTM, thanks for jumping on this.

// Cycle through hosts to achieve the intended offset behavior.
// TODO(htuch): Consider how we can avoid biasing towards earlier hosts in the
// schedule across refreshes for the weighted case.
for (uint32_t i = 0; i < seed_ % hosts.size(); ++i) {
Member:

A lot of this behavior in the EDF case is not intuitive to me at all. Not a big deal right now but is it worth at some point writing some LB simulations to cover this case so we can better understand the behavior? I wonder for example if we really have to cycle through all hosts to get reasonable randomness since doing that might be costly for large host sets?

Member Author:

I agree, EDF is pretty confusing and I think we might want to move away from it longer term. I've filed #3278 and will merge as is.

@htuch merged commit 872728d into envoyproxy:master on May 3, 2018
@htuch deleted the random-wrr branch on May 3, 2018 14:15
ramaraochavali pushed a commit to ramaraochavali/envoy that referenced this pull request May 3, 2018

Signed-off-by: Rama <rama.rao@salesforce.com>