External servers in 1.0.0 no longer honours discovery #1742
Comments
Hi @mr-miles, we're working on updating our docs based on the large surface area of changes we made for 1.0. For 1.0, we no longer use go-discover; we use another library called go-netaddrs for discovery. You can read about the changes here, which are still being staged for publishing on our docs page: https://github.com/hashicorp/consul/blob/328d081bc9c339cc48f95c8d4d0cce1fd3b4b81a/website/content/docs/k8s/deployment-configurations/servers-outside-kubernetes.mdx#join-external-servers-to-consul-on-kubernetes
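For context, a minimal, hypothetical values.yaml sketch of the new go-netaddrs-style discovery configuration; the exec= string mirrors the environment variable that appears later in this thread, and the tag key/value are placeholders:

```yaml
# With go-netaddrs, discovery is expressed as an executable invocation
# rather than a bare go-discover provider string.
externalServers:
  enabled: true
  hosts:
    - "exec=discover -q addrs provider=aws tag_key=ConsulCluster tag_value=my-cluster"
```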
Got it, thanks ... I am too keen! Have been awaiting this release, so thanks again.
Sorry - I tried it and it nearly works, but it looks like the discover binary isn't included in the container image (or I need to enable it somehow). My connect-injector pod has the discovery environment variable set, but I see errors in the pod logs and the pods fail to reach readiness.
Thanks! We'll take a look!
Hi @mr-miles! Thanks for finding this and helping us out! I put up a PR to add go-discover, and it should end up in an upcoming release. Please let me know if you test this beforehand, as you can use the control-plane image.
Thanks for the quick turnaround. I've tested it out and discovery is now finding the server addresses as expected! So victory. HOWEVER, something is still amiss. My connect-injector pod and terminating-gateway-init pod are both failing like this:

2022-11-21T19:09:40.769Z [INFO] consul-server-connection-manager: trying to connect to a Consul server

Consul servers are 1.14.0, the addresses it discovered are correct, and they have ports.grpc_tls set to 8502. I've also verified connectivity to 8502 with telnet. Connect injector has these environment variables:

CONSUL_ADDRESSES: exec=discover -q addrs provider=aws tag_key=ConsulCluster tag_value=xxx

Do you know if there are any other fixes pending that might be causing this? And/or do you know of a simple setting that would help me get beyond the TRANSIENT_FAILURE error message to something more detailed?
This seems to be an issue.
Thanks - is it expecting the grpc port to be using TLS? I've seen a few changes in that area in Consul 1.14.0: do both grpc and grpc_tls need to be open?
@mr-miles Just to double check, could you perhaps send over your server config? Also, are your external servers also on 1.14.0?
Hi, thanks! The server config is below. The external servers are all on 1.14.0 (I initially upgraded the clients to 1.14 while the servers were still on 1.13.3 and nothing worked; upgrading the servers sorted that out).

"advertise_addr": "10.142.xxx",
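Only the advertise_addr line of that config survived above. Based on the earlier comment that ports.grpc_tls is set to 8502, the relevant fragment presumably looks something like this (a sketch, not the full config):

```json
{
  "advertise_addr": "10.142.xxx",
  "ports": {
    "grpc_tls": 8502
  }
}
```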
@mr-miles ah yes, you should only need one or the other (there is no requirement to have both). Some features (like cluster peering using mesh gateways) require using the grpc TLS port, but it would help if you are able to test with only the non-TLS grpc port. In the meantime, I'm also looking into a way to enable trace logs for server-connection-manager, as that would help confirm whether it's a TLS issue.
@mr-miles can you try setting global.logLevel=trace in your cluster and post the logs from the connect-injector pod again? I think that would help us see if there are any grpc TLS issues with the connection.
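For reference, a sketch of that override as a values.yaml fragment (the key name is taken from the comment above; everything else is unchanged):

```yaml
# Turn on trace-level logging so consul-server-connection-manager
# emits its connection details.
global:
  logLevel: trace
```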
Well that was easier than expected ... so now I see things like this in the logs:

2022-11-21T23:21:31.687Z [TRACE] consul-server-connection-manager: clientConnWrapper.NewSubConn: addrs=[{
  "Addr": "10.142.xxx:8502",
  "ServerName": "",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}]
@mr-miles thank you! The way we set the ServerName is through an environment variable. I see that we should probably also update the doc for the tlsServerName value, since it only references HTTPS connections but it is used for more than that.
Ok - no, I don't have it set. Somehow it worked previously without it - should it be set to server...? Also, if it's not set and it's just requesting via IP, can it not validate using the IP SANs?
It should be server.DATACENTER.DOMAIN. I'd have to look into why it doesn't work with IP SANs; I'm not sure of the details of how the grpc TLS cert is generated. But I see that when setting up clusters with external servers, we definitely are using that value. It may not have been required before because the 1.0.0 release had a large refactor that removed the need for client agents, so it's possible there was a change that requires setting the tlsServerName.
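For illustration, a hypothetical values.yaml fragment with that setting, assuming a datacenter of dc1 and the default consul domain (substitute your own):

```yaml
# ServerName presented when verifying the external servers' TLS
# certificates; dc1/consul stand in for DATACENTER/DOMAIN.
externalServers:
  enabled: true
  tlsServerName: server.dc1.consul
```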
Yay! Setting it to that has made everything green again - thanks for your help! I did have a quick look around, and the TLS server name is set to server.DATACENTER.DOMAIN by default, but only if cloud.enabled is true, so maybe that should be a bit more permissive if the setting is needed to make things work. Anyhow, thanks again!
Sounds good, I'll get some additional opinions on whether it makes sense to have a default TLS server name with external servers. It's possible a default wouldn't be desirable, because it's hard to know what configuration external servers might have. But for sure we will at least update the values.yml documentation on that value!
I had a look through the code, as I agree with you that having a default name doesn't sit quite right. There is no way to configure things so it will accept any valid certificate - you either specify an explicit hostname, or set it to not verify anything. An alternative fix could be, in consul-server-connection-manager: if no server name is specified but TLS verification is still enabled, use the hostname that was used to initiate the connection. In the case where that hostname is an IP address, the current logic would then correctly look at the IP SANs.
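A hypothetical Go sketch of that fallback (not the actual consul-server-connection-manager code; the function and parameter names are invented for illustration). Go's certificate verifier checks IP SANs when ServerName is an IP literal, so dialing by IP would validate against IP-SAN-only certificates:

```go
package connmgr

import (
	"crypto/tls"
	"net"
)

// tlsConfigFor illustrates the proposed fallback: when no server name
// is configured and verification is still on, derive ServerName from
// the host we are dialing. If that host is an IP literal, the standard
// library verifies it against the certificate's IP SANs.
func tlsConfigFor(addr, configuredServerName string, base *tls.Config) (*tls.Config, error) {
	cfg := base.Clone()
	if configuredServerName != "" || cfg.InsecureSkipVerify {
		// An explicit name wins; with verification disabled there is
		// nothing to derive.
		cfg.ServerName = configuredServerName
		return cfg, nil
	}
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return nil, err
	}
	cfg.ServerName = host // hostname, or IP literal for IP-SAN checks
	return cfg, nil
}
```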
@mr-miles could you file the feature request in https://github.com/hashicorp/consul-server-connection-manager? Thanks! I do think it's better to track it there as a feature request, and it's a good area for UX improvement where the hostname is an IP address.
Thanks @mr-miles, will go ahead and close here and track on that repo moving forward.
Community Note
Overview of the Issue
I upgraded from the 0.48 chart to 1.0.0. I am running Consul servers outside k8s and have externalServers.hosts set to discover the addresses via AWS lookups. The new agentless setup treated the provider string as a DNS name to look up, and errored that the name did not exist.
Is this still a supported setup? Should I expect the provider/discovery templates to work or is there a different way to set this up now?
Thanks for the 1.0.0 release - appreciate all the hard work that has gone into it!
Reproduction Steps
Add to the Helm chart:
externalServers.hosts = ["provider=aws tag_key=XX_KEY tag_value=XX_VALUE"]
where XX_KEY and XX_VALUE are the EC2 instance tags that have been used. A values.yaml version of this is sketched below.
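For completeness, the same repro expressed as a hypothetical values.yaml fragment (externalServers.enabled is assumed; XX_KEY and XX_VALUE remain placeholders):

```yaml
# go-discover provider string that 1.0.0 now treats as a literal DNS
# name instead of expanding via cloud discovery.
externalServers:
  enabled: true
  hosts:
    - "provider=aws tag_key=XX_KEY tag_value=XX_VALUE"
```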
Expected behavior
Expect the consul-dataplane component to respect the same discovery rules as the consul client.
Environment details
AWS EKS