DNS round-robin based on SRV weight/priority #1088

Open
ask0n opened this issue Jul 7, 2015 · 40 comments
Labels
theme/dns: Using Consul as a DNS provider, DNS related issues
theme/service-metadata: Anything related to management/tracking of service metadata
type/enhancement: Proposed improvement or new feature

Comments

@ask0n

ask0n commented Jul 7, 2015

Since Consul already supports SRV records, would it be possible to implement the following round-robin scheme?

  1. A check script changes the SRV priority/weight for a node via the Consul client.
  2. The Consul agent uses the SRV weight when building round-robin responses for .service. requests. For example, when one node has a load average above 10, we set its priority to 1 while the other node keeps the default priority 0; round-robin then returns the A record for the priority-0 node 75% of the time and for the priority-1 node 25% of the time. Once the load average drops below 1, we set the node's priority back to 0 and round-robin returns each of the two nodes 50% of the time.
@ryanuber
Member

ryanuber commented Jul 7, 2015

This is an interesting idea, and is definitely possible. If you want to start a design document to sketch out the idea and how it would work more formally, that could be helpful.

@ryanuber ryanuber added the type/enhancement Proposed improvement or new feature label Jul 7, 2015
@ask0n
Author

ask0n commented Jul 8, 2015

I think we can use RFC 2782 as a starting point for the implementation:

Priority

The priority of this target host. A client MUST attempt to contact the target host with the lowest-numbered priority it can reach; target hosts with the same priority SHOULD be tried in an order defined by the weight field. The range is 0-65535. This is a 16 bit unsigned integer in network byte order.

Weight

A server selection mechanism. The weight field specifies a relative weight for entries with the same priority. Larger weights SHOULD be given a proportionately higher probability of being selected. The range of this number is 0-65535. This is a 16 bit unsigned integer in network byte order. Domain administrators SHOULD use Weight 0 when there isn't any server selection to do, to make the RR easier to read for humans (less noisy). In the presence of records containing weights greater than 0, records with weight 0 should have a very small chance of being selected.

In the absence of a protocol whose specification calls for the use of other weighting information, a client arranges the SRV RRs of the same Priority in the order in which target hosts, specified by the SRV RRs, will be contacted. The following algorithm SHOULD be used to order the SRV RRs of the same priority:

To select a target to be contacted next, arrange all SRV RRs (that have not been ordered yet) in any order, except that all those with weight 0 are placed at the beginning of the list.

Compute the sum of the weights of those RRs, and with each RR associate the running sum in the selected order. Then choose a uniform random number between 0 and the sum computed (inclusive), and select the RR whose running sum value is the first in the selected order which is greater than or equal to the random number selected. The target host specified in the selected SRV RR is the next one to be contacted by the client. Remove this SRV RR from the set of the unordered SRV RRs and apply the described algorithm to the unordered SRV RRs to select the next target host. Continue the ordering process until there are no unordered SRV RRs. This process is repeated for each Priority.

The default priority value is 0. I think it would be useful to have a separate config option for it, something like "srv_priority" in the "dns_config" section. A nice side effect of srv_priority would be the ability to use it as a kind of grouping. Say we need a group of servers to test a new feature: we set srv_priority=1 for them. Now, when a server with srv_priority set queries Consul for records of a .service. group, and that group has servers with the same srv_priority, Consul will include only the servers with the matching srv_priority in the DNS reply. If no servers share that srv_priority, or srv_priority isn't set, Consul will include all available servers in the DNS reply for the .service. request.

For the Weight value it would be nice to have an additional check definition, because this check wouldn't indicate whether the node is alive or dead. It would run the same way as a script check, but its return value would be used as the weight for the SRV record, so it could take more values than just 0, 1, and ">1".

Now that we have all the values in one place, we can use the algorithm described in the RFC to compute the ordering. We need the ability to make servers both more and less preferable by weight than servers with the default value. Say the default is 100 (the RFC suggests 0, but then we couldn't go below the default); if we want to reduce a server's weight relative to servers with the default value, its weight becomes roughly the percentage chance of it being contacted.
If we set the weight to 0, the server should never be contacted until its weight changes (via an update, or unless it is the only server in the .service. group). It is alive, but perhaps too busy for new connections.
If we want a more preferable server, we use values greater than 100 and, say, less than 1000. If a server's value is 1000, it always wins and receives all traffic for the .service. group. If more than one server has weight 1000, we round-robin only among those servers and ignore the servers with lower weights.

@eloycoto

Hi,

I tested DNS SRV today with some phone integrations. Weight and priority support is needed in the VoIP world.

I'm using Consul to discover some voice servers, and it's awesome, but in the VoIP world we need to always send calls to the same "endpoint". I mean, I'm using this for a voice conference: the first caller joins the conference on server A, and the second caller should land on the same server. So we're using DNS SRV for HA purposes.

Our manually managed DNS SRV records look like this:

_sip._udp.sip.x.acalustra.com.  299 IN  SRV 10 1 5060 192.168.50.11
_sip._udp.sip.x.acalustra.com.  299 IN  SRV 20 1 5060 192.168.50.10

Could we add a priority here based on when the node joined the service? Would that work for you? Do you have any other approach?

I can spend time on this; it's a priority for me to get this project done, so if you point me in the right direction I can start coding ;-)

Regards
Eloy

@armon
Member

armon commented Jul 22, 2015

@eloycoto Unfortunately, I think that to do this the right way we need to support arbitrary K/V attributes on nodes and services. That lets us much more cleanly support something like "dns_weight=2" and have it parsed and respected. Anything built on the existing API would be a huge hack, like a "dns-weight-2" tag, which I'd oppose.

So unfortunately, I think to do it right requires a lot more rethinking of things outside the scope of this one feature. We want to get there, but it will take a little more time for us to firm up the foundation.

@eloycoto

Hi @armon,

Makes sense, it's a big change. Ping me if you need help; I can spend time on this, plus QA time.

Regards

@Esya

Esya commented Oct 7, 2015

@eloycoto Out of curiosity, how did you do the _sip._udp part with Consul? Did you call your service _udp and give it the tag _sip?

@epcim

epcim commented Jan 29, 2016

+1
Not just VoIP: imagine you have a DB cluster where you want one node to handle writes. When that node fails, the priority mechanism would elect the one designated for writes; if that one is also dead, another would be selected.

Or is this use case already in scope of the existing configuration? For example: keep the "master" tag on only one host, with an algorithm to choose the proper one, and migrate the tag instantly when the current "master" fails?

@derek-virtustream

+1. I would like to use SRV based RR for XMPP.

@kunalvjti

+1 for this. Is there any timeline on when this is going to come out?

@eloycoto

@epcim Regarding HA environments, nowadays I'm doing it like this: https://www.youtube.com/watch?v=t3O5b2sweYs

Regards

@gfrankliu

Consul's DNS SRV support is a great idea. How are the weight and priority determined? When a new node registers a service, how can it set the weight/priority? Can those be modified later on the Consul server?

@slackpad
Contributor

@gfrankliu weights are currently not supported. Right now Consul randomizes the results of DNS queries for load balancing, and removes nodes with failing health checks, but does not allow you to set the weights and priorities.

@gfrankliu

@slackpad that's too bad; it makes DNS SRV less useful. Is the support on the road map? I guess the workaround is to not use DNS but use HTTP, and create tags to store the "weight" etc. information.

@slackpad
Contributor

Yes, I'm not sure of the current timeframe, but we'd like to add this. Will keep this issue updated!

@paradoxbound

+1
@epcim I built a solution that uses Consul as part of a DB failover setup for PostgreSQL. PostgreSQL is managed by repmgr, which will fail over if the master becomes unresponsive, elect a new master, and make any additional slaves follow the new master. Once this process is complete it triggers a call to Consul, which updates the apps. If the old master comes back it doesn't trigger a Consul update, so you can avoid split brain. The approach is valid for other replication managers such as MHA Manager for MariaDB or MySQL.

@Ranger-X

+1

@thehydroimpulse

@slackpad any update on this? I'm guessing there hasn't been any progress in this area.

@d-balakin

+1

6 similar comments
@yannispanousis

+1

@hsw

hsw commented Feb 16, 2017

+1

@shaunofneuron

+1

@makeittotop

+1

@fengyehong

+1

@pierresouchay
Contributor

+1

@slackpad slackpad added the theme/service-metadata Anything related to management/tracking of service metadata label May 2, 2017
@setaou

setaou commented May 4, 2017

+1

@JakeDEvans

+2

@majormoses
Contributor

Please, everyone, stop posting +1 comments; they email everyone on the thread and don't add anything useful. To show that you need this as well, please click the thumbs-up (or thumbs-down) reaction on someone's comment. That avoids emailing everyone and still lets the maintainers track how many people support (or oppose) an issue.

@magiconair
Contributor

@majormoses I've come to the conclusion that these kinds of requests don't work. People will use whatever they want to register their interest, and personally I don't care, since it's the engagement that counts. I find it more important to see that there is still demand. Also, one difference is that we do get notified on tickets when someone adds a +1, which might bump the ticket on our radar. Having said that, 👍 works just as well :)

@thetuxkeeper

+1 - We would like to use it with our DB cluster (1 master, 3 slaves):

  • tag master on default master with high prio
  • tag master on slaves with low prio
  • tag slave on slaves with equal high prio for all of them
  • tag slave on master with low prio

@sabbene

sabbene commented Feb 11, 2018

+1

pierresouchay added a commit to pierresouchay/consul that referenced this issue Jul 31, 2018
Adding this data structure will allow us to resolve
issues hashicorp#1088 and hashicorp#4198

This new structure defaults to values:
```
   { Passing: 1, Warning: 0 }
```

This means a weight of 0 is used for a service in the Warning state,
while a weight of 1 is used for a healthy service.
Thus it remains compatible with previous Consul versions.
banks pushed a commit that referenced this issue Sep 7, 2018
* Implementation of Weights Data structures

Adding this data structure will allow us to resolve
issues #1088 and #4198

This new structure defaults to values:
```
   { Passing: 1, Warning: 0 }
```

This means a weight of 0 is used for a service in the Warning state,
while a weight of 1 is used for a healthy service.
Thus it remains compatible with previous Consul versions.

* Implemented weights for DNS SRV Records

* DNS properly supports agents with weight support even when the server does not (backwards compatibility)

* Use a Warning weight value of 1 by default

When using the DNS interface with only_passing = false, all nodes
with a non-Critical health check used to have a weight value of 1.
Having weight.Warning = 0 as the default value is probably
a bad idea, as it breaks backwards compatibility.

Thus, we use a default value of 1 to be consistent with the existing behaviour.

* Added documentation for new weight field in service description

* Better documentation about weights as suggested by @banks

* Return weight = 1 for unknown Check states as suggested by @banks

* Fixed typo (of -> or) in error message as requested by @mkeeler

* Fixed unstable unit test TestRetryJoin

* Fixed unstable tests

* Fixed wrong Fatalf format in `testrpc/wait.go`

* Added notes regarding DNS SRV lookup limitations regarding number of instances

* Documentation fixes and clarification regarding SRV records with weights as requested by @banks

* Rephrase docs
@maxadamo

maxadamo commented Mar 4, 2019

I was expecting weight to be a dynamic parameter.
For instance, inside the Consul check script I check CPU usage: I get 98, compute 100 - 98, and assign a weight of 2.
Can I do this, or am I watching too many sci-fi movies? 😺

p.s.: I understand that the script can produce a WARNING when, for instance, CPU usage is too high, and that I can give the warning state a lower weight.

@pierresouchay
Contributor

@maxadamo This is the intended behavior: you can write a script that computes the passing or warning state based on metrics. SRV records expose this, and so does the HTTP catalog. However, it is true that DNS A queries don't. But if your LB respects SRV weights, it already works.

@banks
Member

banks commented Mar 4, 2019

The caveat I'd add is that updating the weights requires Raft commits on the servers, so you should be careful how often that can happen; otherwise the load could overwhelm the Consul servers once things get busy and update their weights frequently.

For example, if every application instance has a script that checks the CPU every 5 seconds and updates the weights, you might be fine with 50 instances (10 writes/second), maybe even 500 (100 writes/second), but you very quickly run into Consul server scaling issues you may never have seen before, when you only made changes minutes or hours apart.

In general, Consul does its best to leave the server state unchanged as long as possible - that's why we only sync the output of a script check periodically (every few minutes) rather than every time the output changes, for example.

So dynamic is OK, but watch your update frequency to give your servers a chance!

@maxadamo

maxadamo commented Mar 5, 2019

Thank you both, and thanks for the comprehensive explanation.
Now I can safely say that my initial idea, a number that keeps bouncing and flipping between 0 and 100, could not work, even if it were possible to send a weight number from the check script.
In my case I raise a warning when a particular "warning" condition is met; I know it won't happen often, and I know that I should pay attention when it does.

The use case is quite interesting. JRuby in the Puppet server can start eating all the resources, and an agent that picks a server with a CPU peak will be slow to run. But since Puppet supports SRV, I can point the agents to the server that has less than 90% CPU usage.
If it's at 99% I can even set it to critical, until the server calms down 😌
If all the servers are at 90%, they'll all be equally bad and will get the same low weight.
I like it!

@pierresouchay
Contributor

@maxadamo On our side, what we do is set the weights according to the performance of the machine, e.g.:

  • tiny machine, weights: {Passing: 20, Warning: 3}
  • large machine, weights: {Passing: 40, Warning: 5}

Then, scripts at the node level check CPU usage; when CPU has been above 95% for more than 5 minutes, they set the Warning state. The big advantage is that load can auto-regulate (an instance that is too heavily loaded receives less traffic and can recover much more quickly), and disruption may be lower (fewer requests go to a saturated node, so the service returns more correct answers). Not using critical (which means weight := 0) ensures that even under heavy load on all instances of a service, every instance keeps trying to serve it; at worst, all instances are in the Warning state (think of cases where the whole DC is saturated by requests).
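
For reference, a service definition along these lines might look like the JSON fragment below. The `weights` block is the field added by the change referenced earlier in this thread; the service name, check script path, and interval are hypothetical:

```json
{
  "service": {
    "name": "web",
    "port": 8080,
    "weights": {
      "passing": 40,
      "warning": 5
    },
    "check": {
      "args": ["/usr/local/bin/check_cpu.sh"],
      "interval": "30s"
    }
  }
}
```

When the node-level CPU check exits with a warning status, the advertised SRV weight drops from 40 to 5 without removing the instance from rotation.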

@maxadamo

maxadamo commented Mar 5, 2019

@pierresouchay I have a doubt about 0. I have seen the following example:
https://en.wikipedia.org/wiki/SRV_record#Provisioning_for_high_service_availability
where the backup is set to 0. We know there isn't much control on Wikipedia and anyone can write on it, but they wrote:
If all three servers with priority 10 are unavailable, the record with the next lowest priority value will be chosen, which is backupbox.example.com.
That means zero is not critical.

p.s.: I can't set it to critical; I wrote something wrong. If both servers go critical, the service will be unavailable.

@pierresouchay
Contributor

Priority is not weight; Consul lets you set the weight, not the priority.

@maxadamo

maxadamo commented Mar 5, 2019

@pierresouchay I copy-pasted without checking. You're right.
But are we sure that zero does not act as a backup? Otherwise it doesn't make much sense to me to create a record that will never take the load. Why would we need a record that doesn't work and does exactly nothing? 🤔

@maxadamo

maxadamo commented Mar 5, 2019

From RFC 2782:
In the presence of records containing weights greater than 0, records with weight 0 should have a very small chance of being selected.
So it's not critical: it's a hot backup.

@pierresouchay
Contributor

It really depends on the implementation.

Weights might be used via DNS SRV, but also by systems interacting directly with the Consul HTTP API (which is what we do).

@jsosulska jsosulska added the theme/dns Using Consul as a DNS provider, DNS related issues label May 4, 2020