Implementation of Weights Data structures #4468

Merged — 17 commits merged on Sep 7, 2018

Conversation

pierresouchay
Contributor

Adding this data structure will allow us to resolve issues #1088 and #4198.

This new structure defaults to the values:

   { Passing: 1, Warning: 0 }

which means a weight of 0 is used for a service in warning state, while a weight of 1 is used for a healthy service. It thus remains compatible with previous Consul versions.
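For reference, a minimal Go sketch of the shape of this data structure (the field names just mirror the JSON above; the exact definition lives in Consul's source, so treat this as illustrative):

```go
// Weights as described in this PR: per-instance SRV weights selected
// according to the aggregated health status of the service.
type Weights struct {
	Passing int
	Warning int
}

// Default applied when a service is registered without weights, matching
// the values shown above: healthy instances get 1, warning instances get 0.
func defaultWeights() Weights {
	return Weights{Passing: 1, Warning: 0}
}
```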

@pierresouchay pierresouchay force-pushed the weights_for_services branch 3 times, most recently from 5c69eff to d1e17cd Compare July 31, 2018 16:24
@pierresouchay
Contributor Author

@pearkes Ok, this first implementation is already working for SRV records.

I think this implementation is good enough.

In another PR later, I will add support for applying weights to A/AAAA DNS queries.

Testing

Register nodes

We register 1 old node without the new Weights attribute, which is equivalent to registering with "Weights": {"Passing": 1, "Warning": 0}.

We register 1 large node with a higher weight, 2 working nodes, 1 warning node, 1 critical node, and 1 maintenance node.

# Old way, regular node, without weights
curl -vfs --request PUT -d '{"Node": "noweight-host-redis-1-0", "Address": "192.168.12.1", "Service": {"Service": "redis", "Tags": ["http", "tag_host-redis-1-1.test.acme.com", "start_20184431"], "Port": 8000, "Checks":[{"id": "canard", "ServiceID": "redis", "Name":"script", "Interval": "10m", "Args":["/tmp/check_script.sh"],"Notes":"hello", "Status": "passing"}]}}' http://localhost:8500/v1/catalog/register


# Large node, weight 600
curl -vfs --request PUT -d '{"Node": "large-host-redis-1-0", "Address": "192.168.10.1", "Service": {"Service": "redis", "Tags": ["http", "tag_host-redis-1-1.test.acme.com", "start_20184431"], "Port": 8000, "Weights":{"Passing": 600, "Warning":30}}, "Checks":[{"id": "canard", "ServiceID": "redis", "Name":"script", "Interval": "10m", "Args":["/tmp/check_script.sh"],"Notes":"hello", "Status": "passing"}]}}' http://localhost:8500/v1/catalog/register

# 2 working nodes
curl -vfs --request PUT -d '{"Node": "working-host-redis-1-1", "Address": "192.168.1.1", "Service": {"Service": "redis", "Tags": ["http", "tag_host-redis-1-1.test.acme.com", "start_20184431"], "Port": 8000, "Weights":{"Passing": 100, "Warning":7}}, "Checks":[{"id": "canard", "ServiceID": "redis", "Name":"script", "Interval": "10m", "Args":["/tmp/check_script.sh"],"Notes":"hello", "Status": "passing"}]}}' http://localhost:8500/v1/catalog/register
curl -vfs --request PUT -d '{"Node": "working-host-redis-1-2", "Address": "192.168.1.2", "Service": {"Service": "redis", "Tags": ["http", "tag_host-redis-1-2.test.acme.com", "start_20184431"], "Port": 8000, "Weights":{"Passing": 100, "Warning":7}}, "Checks":[{"id": "canard", "ServiceID": "redis", "Name":"script", "Interval": "10m", "Args":["/tmp/check_script.sh"],"Notes":"hello", "Status": "passing"}]}}' http://localhost:8500/v1/catalog/register

# 1 warning node
curl -vfs --request PUT -d '{"Node": "warning-host-redis-1-3", "Address": "192.168.1.3", "Service": {"Service": "redis", "Tags": ["http", "tag_host-redis-1-3.test.acme.com", "start_20184431"], "Port": 8000, "Weights":{"Passing": 100, "Warning":7}}, "Checks":[{"id": "canard", "ServiceID": "redis", "Name":"script", "Interval": "10m", "Args":["/tmp/check_script.sh"],"Notes":"hello", "Status": "warning"}]}}' http://localhost:8500/v1/catalog/register

# 1 critical node
curl -vfs --request PUT -d '{"Node": "critical-host-redis-1-4", "Address": "192.168.1.4", "Service": {"Service": "redis", "Tags": ["http", "tag_host-redis-1-4.test.acme.com", "start_20184431"], "Port": 8000, "Weights":{"Passing": 100, "Warning":7}}, "Checks":[{"id": "canard", "ServiceID": "redis", "Name":"script", "Interval": "10m", "Args":["/tmp/check_script.sh"],"Notes":"hello", "Status": "critical"}]}}' http://localhost:8500/v1/catalog/register

# 1 maintenance node
curl -vfs --request PUT -d '{"Node": "maint-host-redis-1-5", "Address": "192.168.1.5", "Service": {"Service": "redis", "Tags": ["http", "tag_host-redis-1-4.test.acme.com", "start_20184431"], "Port": 8000, "Weights":{"Passing": 100, "Warning":7}}, "Checks":[{"id": "canard", "ServiceID": "redis", "Name":"_maint", "Interval": "10m", "Args":["/tmp/check_script.sh"],"Notes":"hello", "Status": "maintenance"}]}}' http://localhost:8500/v1/catalog/register

Result

dig SRV redis.service.consul -p 8600 @localhost +short
1 100 8000 working-host-redis-1-1.node.dc1.consul.
1 1 8000 noweight-host-redis-1-0.node.dc1.consul.
1 600 8000 large-host-redis-1-0.node.dc1.consul.
1 0 8000 maint-host-redis-1-5.node.dc1.consul.
(In the SRV answers above, the fields are priority, weight, port, and target.)

  • The old node (passing) has a weight of 1, as expected
  • The warning node has a weight of 7, as expected
  • The other nodes have the weight indicated in their "Passing" value, i.e. 100
  • The critical node is not shown (same as the existing behaviour)
  • Note that the service in maintenance is visible, BUT it has a weight of 0

@banks
Member

banks commented Aug 2, 2018

This looks awesome Pierre. Just 1 question from your high level description before I give the code a proper look:

Note that the service in maintenance is visible, BUT it has a weight of 0

This is not intuitive behaviour for me. The rest makes total sense, but why would we treat maintenance checks as a special case here? It feels like more magic than we like. I'd expect maintenance mode to behave like it does now - treated the same as critical.

The main reason I think this is an issue, other than it being new and a bit magical, is that most DNS implementations ignore the weights, so this change would cause people using maint checks to suddenly break if their clients ignore the weights and still send traffic there.

What do you think? Did I misunderstand the current behaviour somehow?


edit:

I'm actually puzzled how that last example is working. Our built-in maintenance checks use Critical:

Status: api.HealthCritical,

We do have api.HealthMaint = "maintenance" defined, but it's never actually used anywhere as far as I can see 😄.

And I don't see anything explicit in your diff changing the behaviour for the maintenance status. Is it relying on us somehow being permissive about statuses and treating anything we don't understand as some unknown value? I'd need to debug more to figure that out.

@pierresouchay
Contributor Author

@banks Basically, health check statuses are not enforced, since they are represented as strings. Thus, any value other than "passing" or "warning" is given a weight of 0.

Critical instances are already filtered out when requesting health, so they are not visible since they are not returned by the RPC call anyway.

In case of an unknown status, we apply the following rule (see the sketch below):

  • "passing" => apply the Passing weight
  • "warning" => apply the Warning weight
  • anything else => 0
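A minimal Go sketch of that rule (hypothetical helper name, reusing the Weights struct sketched in the PR description; the real logic lives in the DNS/RPC code):

```go
// weightForStatus illustrates the rule above: pick the SRV weight to
// advertise for an instance based on its aggregated check status.
func weightForStatus(status string, w Weights) int {
	switch status {
	case "passing":
		return w.Passing
	case "warning":
		return w.Warning
	default:
		// critical, maintenance, or any unrecognized status => weight 0
		return 0
	}
}
```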

I had to implement a rule for unknown statuses since the API lets you specify whatever you want, so this example was to illustrate that behavior. BTW, should I also compare statuses against "PASSING" and "Passing", since case sensitivity in most Consul APIs is sometimes a bit confusing, especially in JSON payloads?

Possible improvements

Another remark, on which I would like your input:

Upgrading an existing system might be tricky: when you start adding weights, the default values apply (meaning 1 for "passing", 0 for "warning").

The problem is that on a production system, if you have 100 nodes running a service, it is "hard" if not impossible to update all Consul definitions at once. So, if you choose a weight of 100 for your service, the node you update gets weight = 100 while all the others keep weight = 1... => trouble.

So an improvement would be to be able to provide, in the Consul server configuration, a default value for Weight.Passing and Weight.Warning which would be applied by default.

Thus, when you upgrade your Consul cluster, you set this value to 100, and adding weights to a service is less likely to break your production. If you agree with such a proposal, I can implement it in another PR.

When using the DNS interface with only_passing = false, all nodes
with a non-critical healthcheck used to have a weight value of 1.
Having weight.Warning = 0 as the default value is therefore probably
a bad idea, as it breaks backward compatibility.

Thus, we use a default value of 1 to be consistent with the existing behaviour.
Member

@banks banks left a comment


Some doc nitpicks.

I'll try to dig through this PR in more depth this week and get it moved forwards.

Thanks as always for the work and patience!

DNS SRV responses. If this field is not specified, its default value is:
`"weights": {"passing": 1, "warning": 1}`.
When a service is `critical`, it is excluded from DNS responses, `warning` services
will be in DNS responses as well unless the `only_passing = true` is set in the
Member


nit: this reads a little confusingly and might be interpreted backwards, instead we could simplify it into multiple sentences:

When a service is critical, it is excluded from DNS responses. Services with warning checks are included in responses by default, but excluded if the optional param only_passing = true is present in the agent DNS configuration or ?passing is used via the API.

Contributor Author


DONE

When DNS SRV requests are issued, response will include the weight in the SRV
response accordingly to the state of service. It allows to put more weights
on services running on larger nodes and optionnally to put less pressure on
services being in `warning` state.
Member


When DNS SRV requests are made, the response will include the weights specified given the state of the service. This allows some instances to be given higher weight if they have more capacity, and optionally allows reducing load on services with checks in warning status by giving passing instances a higher weight.

Contributor Author


@banks DONE as well in next patch

@pierresouchay
Contributor Author

@mkeeler @banks if you have remarks regarding my comment #4468 (comment), do not hesitate.

Kind Regards

@banks
Member

banks commented Aug 24, 2018

So an improvement would be to be able to provide in the Consul Server configuration a default value for Weight.Passing and Weight.Warning which would be applied by default.

Yep I agree with this. I think this is an issue with the PR/feature as it stands. We could land it with good docs to point that out though.

I'd like to hold off adding centralised config and all the machinery needed to support that for now though for a couple of reasons:

  • It needs some thought and quite a bit of lifting to get the API right so it's not a snowflake we can't use for anything else. We'd need server config for it and then at very least some (blocking?) read mechanism that can detect when that config is changed, which probably implies that it needs to be not just config but actually stored in state store so we can block on it. Otherwise change and HUP on the server won't actually take effect.
  • We have some tentative plans to add something pretty close to this in a much more general way that will allow a lot more service-level config management centrally. It would be perfect for what you describe.

I will talk through with the team on our call later, but I suspect the answer is either to land this as it is and document the gotcha well, or to hold this for a while (will be at least a few months) so we can build it as part of our bigger change.

@pierresouchay are you using this in your deployment already? Are you keen to see it land soon for reasons other than just to get it done 😄?

@pierresouchay
Contributor Author

@banks Thank you, here are my plans

Migration to weights

For the default values, I plan the following:

Define a WeightDefaults (Agent Setting)

The value is applied to all services that have no weight set. It allows setting a default on a given host, for instance: 300 on a large instance, 50 on a tiny one.

Define a ClusterDefaultWeight (Server Setting / DNS Client setting)

When a server receives an RPC request without a weight, use these values to fill in Weights => ensures that old clients CAN have default settings other than {Passing: 1, Warning: 1}.

When an agent answers a DNS query, if Weights is not set on the service, use these values => ensures that existing services will have similar values.

=> This makes it possible to work with upgraded DNS agents while servers are not upgraded OR services have not yet been migrated (a sketch follows below)
=> This also ensures that old agents within the cluster keep working flawlessly
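A hypothetical Go sketch of that server-side fallback (this is only a proposal, not part of this PR; the function name and the example default are illustrative):

```go
// fillDefaultWeights sketches the proposal above: when a service was
// registered by an old client without any Weights, the server fills in a
// cluster-wide default (e.g. {Passing: 100, Warning: 1} taken from server
// configuration) before answering RPC or DNS queries.
func fillDefaultWeights(svc *Weights, clusterDefault Weights) Weights {
	if svc == nil {
		return clusterDefault
	}
	return *svc
}
```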

About our status

I am heavily pushing for weight standardisation internally, because I plan to use this for:

  • F5 load-balancers
  • HaProxy
  • Our Client Side Load Balancing libraries
  • Some of our libraries that are performing DNS Discovery using SRV records
  • Some very basic clients that are using plain DNS (name only, no port) => so that weights still get applied
  • I am also planning to test Connect very soon, so I want to have something consistent across all LB technologies
  • I am also working with HaProxy people you might know in order to include this natively in HaProxy => but it means our current mechanics (tags / service meta) cannot work if we want HaProxy to adopt them without patching the world every time

This is crucial for us since we are moving from pure bare-metal systems to virtualised ones, and thus there are very heterogeneous hosts for a given kind of service.

We currently live without it by using tags/metadata, but it is not sustainable to add this to each of our layers (it is complicated, for instance, for systems using ONLY DNS); that is why I think this is a very important step.

Of course, once HashiCorp agrees on this, I plan to implement:

  • weights being applied to A/AAAA queries
  • all the default mechanisms described above

@banks
Member

banks commented Aug 24, 2018

Ah thanks. I forgot that all these queries are served by a server already so no need to "push" config out to agents to make this work.

In that case I agree that having a way to set default weights in agent config that get applied by servers in response to RPCs where the stored instances have no weights defined makes sense and mitigates the issue you described with rolling out new weights.

There will still be potential gotchas with changing those defaults, or the relative size of weights in general. If you start with weights where 10 is normal, for example, and then want a way to have nodes take less than 10% of the load and decide to make 100 the standard instead, you have the same issue: all the current nodes that defined a weight relative to 10 will need to update in sync, or their traffic will be skewed for some time while it all rolls out. I think that's fundamentally unavoidable with weights defined in this way though, so it's not a blocker, but we probably do need to take care to spell that out in the documentation for service weights. (I think this maybe warrants a few paragraphs in a section somewhere rather than just notes on the service definition fields for weights.)

Anything else => 0

Regarding the default behaviour for non-standard check status strings. Even though those are kinda outside our expected values and don't mean much, I'm not sure it's OK to change load balancing behaviour out of the box for those.

For example if someone has some bad script that is setting "OK" instead of "passing" as the status, right now I think (please correct!) DNS will still list their instances. I think your change here will cause those instances suddenly to be listed as weight = 0 and so some DNS clients will just not send any traffic to any OK instances if there are others with warnings!

I realise it's super edge-casey, but it seems safer to me to return a weight of 1 for anything we don't understand, so the current behaviour is preserved - we don't penalise any checks that for some reason are in a state we don't understand. What do you think?

@pierresouchay
Contributor Author

@banks I am OK, it will be fixed in the next patch.

@pierresouchay
Contributor Author

I realise it's super edge-casey, but it seems safer to me to return a weight of 1 for anything we don't understand, so the current behaviour is preserved - we don't penalise any checks that for some reason are in a state we don't understand. What do you think?

@banks DONE, I also handled all possible pre-defined states in my last patch (maint/critical) => even if in theory they should not occur.
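For reference, a Go sketch of the rule as revised after this exchange (hypothetical helper; the exact behaviour is whatever landed in the final patch):

```go
// weightForStatusRevised illustrates the revised rule: statuses we do not
// understand keep the Passing weight so existing behaviour is preserved,
// while explicit critical and maintenance statuses drop to 0.
func weightForStatusRevised(status string, w Weights) int {
	switch status {
	case "warning":
		return w.Warning
	case "critical", "maintenance":
		return 0
	default:
		// "passing" and any unrecognized status => Passing weight
		return w.Passing
	}
}
```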

Member

@mkeeler mkeeler left a comment


Overall the changes look good to me.

One question we came up with when talking about this out of band is what happens when you have more SRV responses than can fit in a single message? Even with TCP DNS it's still possible to run into this scenario. When we can't return all the data, should we attempt to make a choice of which records to return? Maybe return those with the highest weight first until we run out of space in the message.

For making load balancing decisions not having all the information would be problematic regardless as any record not returned would essentially be hidden and not get any traffic.

I am open to discussing what the best solution for this is. One way or another we need to be explicit about how we operate in the case of truncation.

return fmt.Errorf("Passing must be greater than 0")
}
if weights.Warning < 0 {
return fmt.Errorf("Warning must be greater of equal to 0")
Member


typo: "of" -> "or"

Contributor Author


@mkeeler DONE

@pierresouchay
Contributor Author

@mkeeler quickly from my phone: the size of DNS SRV responses is the reason for my patches a few weeks ago regarding compression as well as performance (binary search for truncation).

For now, with compression, EDNS and TCP, you can easily send around 600-700 nodes in a single response, which is good enough for many usages. It could go even further, for instance by removing TXT metadata when the size is too big. Note that in my patches I implemented the truncation bit even for TCP, which would allow for clever clients.

During my initial patches, I thought about having a predictable order when sending SRV responses; this would allow implementing paging, for instance, but I first wanted my initial patches to get merged quickly. That would be an option though, for instance:
_page_start.myservice.service.consul to retrieve the first page sorted by NodeName,
_page_node_node604.myservice.consul to retrieve the page with nodes greater than node604 to get the next results (node604 being the last result sent). The truncation bit or metadata would help to know whether more results are available.

For most shops it should not be an issue anyway (the biggest LB-aware services we have are around 500 nodes; we have some services with more than 6k nodes, but none we need to perform load balancing on, and in that case we use HTTP queries).

I am quite confident nobody uses DNS SRV on very large services right now, since I discovered the issue (it was crashing the DNS server and sending no results prior to my patch), so I think we have some time to talk with vendors such as HaProxy to discuss possible advanced strategies. At least this patch will allow all vendors to use weights with HTTP discovery requests and speak the same language, instead of relying on tags (Fabio or some of our implementations) or service meta (like we do a bit as well). Everybody using DNS or HTTP will use the same data, allowing more compatibility between vendors.

Kind Regards

@mkeeler
Member

mkeeler commented Aug 27, 2018

@pierresouchay All that sounds reasonable. With TCP DNS and compression it should be much, much more difficult to get into a scenario where you can't get all the SRV RRs for your query. Honestly the "problem" is probably more theoretical than practical.

I think it's probably sufficient to just add a sentence in the docs stating that when the results are truncated, no attempt is made to select which records get returned; the set returned should be considered effectively random.

@pierresouchay
Contributor Author

@mkeeler DONE, added documentation in website/source/docs/agent/dns.html.md.

Will allow master branch to be merged properly
@pierresouchay
Contributor Author

@banks @mkeeler Are you waiting for something else for this review? Is it Ok?
Best Regards

@banks
Member

banks commented Sep 5, 2018

@pierresouchay Matt's on vacation until tomorrow and I'm waiting to check in with him - it looks like between you two you got this into good shape while I was away. Hopefully he'll be able to catch up with things over the next couple of days here.

Thanks for your patience!

@pierresouchay
Contributor Author

@banks Ok, no problem, I'll wait :-)

Regarding weights, there is also this one: #4576 => it will make it easy to leverage this feature without having to modify existing healthchecks (for instance, it makes it easy to treat a healthcheck that is too slow like a 'Too Many Requests' response, and thus apply the new Warning weights). I think this is quite interesting for all apps whose healthchecks people cannot modify.

Member

@mkeeler mkeeler left a comment


Looks good. Thanks for the doc updates.

Member

@banks banks left a comment


Couple of super nitpicky comments about the docs but I think this is good to go.

website/source/docs/agent/services.html.md (outdated review comments, resolved)
website/source/docs/agent/dns.html.md (outdated review comments, resolved)
@pierresouchay
Contributor Author

@banks DONE

Member

@banks banks left a comment


Awesome. Great work on this Pierre. Will merge a little later today.

@banks banks merged commit eddcf22 into hashicorp:master Sep 7, 2018
@pierresouchay
Contributor Author

@mkeeler @banks Thank you so much for your reviews.

You might also have a look at this simpler PR #4576, which makes it easy to enable the Warning state on services that do not properly implement HTTP Too Many Requests (and will allow the Warning state to be used more widely).

pierresouchay added a commit to criteo/consul-templaterb that referenced this pull request Sep 7, 2018
References:
* hashicorp/consul#4468
* hashicorp/consul#4198

Change-Id: I808fc90f3dd87b43bcf6325edb9cab8bbadffe60
pierresouchay added a commit to pierresouchay/consul that referenced this pull request May 10, 2019
It implements setting the Warning state when an HTTP 200 response or a
successful TCP connection is too slow, as described
in hashicorp#4522

Combined with hashicorp#4468 it will
make it very easy to reduce calls/sec on slow nodes in a
completely automatic way.
@jkirschner-hashicorp
Contributor

@pierresouchay : you mentioned the possibility of "applying weights to A/AAAA DNS queries" a few times in this thread. Do you remember what behavior you intended to achieve with that? (I ask because A/AAAA records don't include weight, so I'm curious what it would mean to apply weights to them.)

I realize it's been a long time since this PR, so I don't expect that you'll remember! I also know (per your GH profile) that your current position no longer involves working with Consul. But I came across your comment recently while touching the Consul DNS code and wanted to ask you about it in case there's a remaining enhancement for us to consider making here.

Whether or not you're in a position to respond, I also want to thank you for your many contributions to Consul over the years :) cheers!

@pierresouchay
Contributor Author

Hello @jkirschner-hashicorp

Weights have been implemented in SRV responses, which do support weights. My idea at the time was to return A/AAAA results statistically according to the weights (and to the current passing/warning status).

So, if you have weights:
Node1: passing 10, warning 1
Node2: passing 20, warning 2
Node3: passing 10, warning 1

And given that Node2 is in warning state, you would return Node2 2 times out of every 22 calls, and Node1 & Node3 10 times each out of every 22 calls.

That was the idea. In my previous company we never used it (we either used SRV records with DNS or HTTP APIs in the end), but I found this basic solution very interesting (it has many implications on the caching side, but that is an issue that already exists with the current implementation).
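A minimal Go sketch of that idea (never implemented; purely illustrative, using math/rand):

```go
// pickWeighted returns a single instance for an A/AAAA answer, chosen with
// probability proportional to its currently advertised weight. With the
// weights above (10, 2, 10 once Node2 is in warning state), Node2 would be
// returned roughly 2 times out of every 22 queries.
func pickWeighted(nodes []string, weights []int, rng *rand.Rand) string {
	total := 0
	for _, w := range weights {
		total += w
	}
	if total == 0 {
		// all weights are zero: fall back to a uniform pick
		return nodes[rng.Intn(len(nodes))]
	}
	n := rng.Intn(total)
	for i, w := range weights {
		if n < w {
			return nodes[i]
		}
		n -= w
	}
	return nodes[len(nodes)-1] // not reached; keeps the compiler happy
}
```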
