Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardization of weights for load-balancing #4198

Closed
pierresouchay opened this issue Jun 6, 2018 · 4 comments
Closed

Standardization of weights for load-balancing #4198

pierresouchay opened this issue Jun 6, 2018 · 4 comments
Labels
type/enhancement Proposed improvement or new feature
Milestone

Comments

@pierresouchay
Copy link
Contributor

Feature Description

When using Consul to discover backends for reverse proxies, there is now way to guess the weight of a given backends.

Since backends are running on different nodes, I would be nice to standardize in Consul how much calls are redirected to a given instance, thus all proxies and libraries might be using the same rules. Historically, some proxies / libraries are using tags, some other start using Meta in instances, but all of this is a mess.

Some proxies are using DNS SRV (eg: HaProxy), some other HTTP APIs, it would be nice to have the same data for both.

I propose the following:

  1. Add metadata in Services with specific semantics regarding Load-Balancing
  2. Return the data in HTTP and SRV DNS interfaces
  3. Allow using the warning status in DNS response to modify the weight dynamically (thus it would allow services to heavily loaded to receive less traffic, for instance when they receive too many requests and return warning state)

Implementation proposal

When defining a service, use the following optional service Meta:

{
  "Meta": {
    "_dns_passing_prio": "1",          # default value = 1
    "_dns_passing_weight": "10",   # default value = 1
    "_dns_warning_prio": "1".         # default value = 1
    "_dns_warning_weight": "1"     # default value = 0
  }
}

When DNS return SRV requests, for each SRV record, apply the corresponding _dns_<state>_weight and _dns_<state>_prio to fill DNS SRV records

This could be very easily implemented and would simplify a lot interoperability of Load balancing systems.

Optionally, we might also use similar meta data at node level (defined in the agent) in order to multiply those weights based on node characteristics (thus services would not have to know on what kind of large/tiny instance they are running) - but it could be a improvement for another PR.

What do you think?

I am ready to implement this if Hashicorp considers including it.

@bedis
Copy link

bedis commented Jun 8, 2018

Hi,

As HAProxy's DNS stack maintainer, I'd be glad to see such feature in Consul.
Furthermore, HAProxy is already compatible with DNS SRV record's weight.

Note that HAProxy does not follow-up SRV record "prio" for now. We have some use cases in mind for this field, but are not yet sure how to implement them.

Baptiste

@pearkes
Copy link
Contributor

pearkes commented Jun 8, 2018

Hey! We definitely like this idea, agree with all your points on the benefits. A couple notes.

  • Give it a dedicated configuration key instead of using meta. We currently use meta with some consul- prefixed items for various internal features, but users don't interact with those during service registration. For a dedicated key we recommend something like this:
{
  "Priority": {
    "passing": 1,
    "warning": 1
  },
  "Weight": {
     "passing": 10,
     "warning": 1
   }
}
  • Start with just modifying services for this behavior and possibly plan for the node characteristics feature in the future.

Let us know how we can help!

@iksaif
Copy link

iksaif commented Jun 9, 2018

cc: @wdauchy

@pearkes pearkes added the type/enhancement Proposed improvement or new feature label Jul 26, 2018
pierresouchay added a commit to pierresouchay/consul that referenced this issue Jul 31, 2018
Adding this datastructure will allow us to resolve the
issues hashicorp#1088 and hashicorp#4198

This new structure defaults to values:
```
   { Passing: 1, Warning: 0 }
```

Which means, use weight of 0 for a Service in Warning State
while use Weight 1 for a Healthy Service.
Thus it remains compatible with previous Consul versions.
@pierresouchay
Copy link
Contributor Author

Let us know how we can help!

@pearkes Sure, I made a first implementation in #4468, and a review would help.

I kept the exact plan you described in your last comment - but I did not implement priority since its semantics are clear for no-one :-)

My first implementation only update weights in SRV records, but does not apply the weights in DNS A/AAAA queries. But this can be implemented later. The patch is big enough for now and I'll implement A/AAAA DNS in another review.

@banks Nice to see that our discussion allowed me to plug nicely with new Connect features as well since Weights are already implemented in it!

See #4468 (comment) for an example of input and DNS SRV output

banks pushed a commit that referenced this issue Sep 7, 2018
* Implementation of Weights Data structures

Adding this datastructure will allow us to resolve the
issues #1088 and #4198

This new structure defaults to values:
```
   { Passing: 1, Warning: 0 }
```

Which means, use weight of 0 for a Service in Warning State
while use Weight 1 for a Healthy Service.
Thus it remains compatible with previous Consul versions.

* Implemented weights for DNS SRV Records

* DNS properly support agents with weight support while server does not (backwards compatibility)

* Use Warning value of Weights of 1 by default

When using DNS interface with only_passing = false, all nodes
with non-Critical healthcheck used to have a weight value of 1.
While having weight.Warning = 0 as default value, this is probably
a bad idea as it breaks ascending compatibility.

Thus, we put a default value of 1 to be consistent with existing behaviour.

* Added documentation for new weight field in service description

* Better documentation about weights as suggested by @banks

* Return weight = 1 for unknown Check states as suggested by @banks

* Fixed typo (of -> or) in error message as requested by @mkeeler

* Fixed unstable unit test TestRetryJoin

* Fixed unstable tests

* Fixed wrong Fatalf format in `testrpc/wait.go`

* Added notes regarding DNS SRV lookup limitations regarding number of instances

* Documentation fixes and clarification regarding SRV records with weights as requested by @banks

* Rephrase docs
@banks banks closed this as completed Sep 7, 2018
@banks banks added this to the 1.2.3 milestone Sep 7, 2018
pierresouchay added a commit to criteo/consul-templaterb that referenced this issue Sep 7, 2018
References:
* hashicorp/consul#4468
* hashicorp/consul#4198

Change-Id: I808fc90f3dd87b43bcf6325edb9cab8bbadffe60
ShimmerGlass pushed a commit to ShimmerGlass/consul-client that referenced this issue Sep 21, 2018
Consul recentlty added support for service weights. (see
hashicorp/consul#4198).
This change adds the new field to the CatalogService class.
ShimmerGlass pushed a commit to ShimmerGlass/consul-client that referenced this issue Sep 21, 2018
Consul recentlty added support for service weights. (see
hashicorp/consul#4198).
This change adds the new field to the Service and CatalogService
classes.
rickfast pushed a commit to rickfast/consul-client that referenced this issue Sep 21, 2018
* Add Weights property to Service and CatalogService

Consul recentlty added support for service weights. (see
hashicorp/consul#4198).
This change adds the new field to the Service and CatalogService
classes.

* Use latest (1.2.3) consul version for travis
ShimmerGlass pushed a commit to ShimmerGlass/finagle-consul that referenced this issue Sep 25, 2018
With the latest consul version services can have weights (see
hashicorp/consul#4198).
This change takes them into account when retreiving the endpoint
list in the client.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement Proposed improvement or new feature
Projects
None yet
Development

No branches or pull requests

5 participants