Watch on KV fails when using https (cannot validate certificate for 127.0.0.1) #4718
Comments
Hey @igal-s, this is not a problem with Consul itself but rather with the TLS certificate your agent is using. Most probably the certificate's subjectAltName field is missing or incomplete and does not contain 127.0.0.1. After creating the CSR and before issuing the certificate, make sure that the CSR correctly includes 127.0.0.1 in the subjectAltName. See example:
If all hostnames and IPs of the server are correctly listed in the CSR, proceed to generate the certificate. As soon as you issue the certificate, inspect it and make sure that the IP has also been added to the Subject Alternative Name.
I assume that the validation was fixed in 1.2.3 by #4540. Before you upgrade your nodes to 1.2.3 or above, you will need to re-issue all your certificates. Perhaps someone should update https://www.consul.io/docs/upgrade-specific.html as well.
@vaLski thanks for the reply, but the fact of the matter is that in v1.1.0, when you set CONSUL_TLS_SERVER_NAME, watch uses its value for the SAN validation, and when you don't set it, it defaults to using 127.0.0.1 for that validation (and fails to validate my certificate, which is fine).
While setting up Consul watches, make sure that we establish the connection to the correct CONSUL_TLS_SERVER_NAME if the agent is configured to use SSL (CONSUL_HTTP_SSL=true). This is an initial attempt to fix hashicorp#4718. Extended debugging capabilities of the agent.
Hey @igal-s and thanks for the update. It appears that you are right and that my understanding of the problem is far more limited than I believed. I had the chance to dig into the code for a while today and I believe I found what might be the reason for that. Do you have a chance to test the following diff and let me know if it solves your case? https://github.com/hashicorp/consul/compare/master...vaLski:vaLski-4718?expand=1
Unfortunately, that does not solve the issue, here is the logline that gets printed:
The second line which you've added (at agent/agent.go:2117) in the setupTLSClientConfig function does not get printed. @vaLski I also went digging in the code and added some ugly debug output on top of your changes (igal-s@d20a892). I ran my version and these are the actual outputs:
What we need to resolve this issue is to get the TLSConfig.Address which is defined in the defaultConfig function in the api/api.go file.
I was wondering if your latest commit did the trick?
It did :-)
@igal-s Great! Would you consider sending all those changes as a pull request? One comment on the
I've opened the following pull request #4727 which resolves this issue.
Overview of the Issue
In version 1.2.3, consul watch fails when HTTPS is enabled: it tries to access https://127.0.0.1:8500, ignoring both the CONSUL_TLS_SERVER_NAME and CONSUL_HTTP_ADDR environment variables.
When CONSUL_HTTP_SSL_VERIFY=false, the watch works as expected.
In version 1.1.0, watch works as expected when CONSUL_HTTP_SSL_VERIFY=true and CONSUL_TLS_SERVER_NAME and CONSUL_HTTP_ADDR are set correctly.
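As a rough illustration of the expected behaviour, the sketch below derives a client address and TLS settings from the three environment variables named above. It is an assumption-laden mimic written against the standard library, not Consul's actual implementation: `clientTLSFromEnv` is a hypothetical function, and the `127.0.0.1:8500` fallback is assumed from the address seen in the failing request.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"os"
	"strconv"
)

// clientTLSFromEnv (hypothetical) resolves the address and TLS settings a
// watch client would be expected to use. getenv is injected so the lookup
// can be exercised without touching the real environment.
func clientTLSFromEnv(getenv func(string) string) (string, *tls.Config) {
	addr := getenv("CONSUL_HTTP_ADDR")
	if addr == "" {
		addr = "127.0.0.1:8500" // assumed default, taken from the failing request
	}

	// ServerName is the name the server certificate's SANs are checked against.
	cfg := &tls.Config{ServerName: getenv("CONSUL_TLS_SERVER_NAME")}

	// Disable verification only when the variable parses as an explicit false.
	if v := getenv("CONSUL_HTTP_SSL_VERIFY"); v != "" {
		if verify, err := strconv.ParseBool(v); err == nil && !verify {
			cfg.InsecureSkipVerify = true
		}
	}
	return addr, cfg
}

func main() {
	addr, cfg := clientTLSFromEnv(os.Getenv)
	fmt.Printf("addr=%s serverName=%q skipVerify=%t\n", addr, cfg.ServerName, cfg.InsecureSkipVerify)
}
```

When `ServerName` is left empty, Go's `net/http` client verifies the certificate against the host part of the request URL, which is consistent with a watch that ignores both variables ending up validating against 127.0.0.1.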
Reproduction Steps
Steps to reproduce this issue, eg:
On a client node, run the shell commands below once with VER=1.1.0 and once with VER=1.2.3:
Consul info for both Client and Server
Client info
Client config file
Server info
Operating system and Environment details
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
Log Fragments
syslog when "VER=1.2.3"
Sep 27 10:08:40 REDACTED consul[3635]: agent: Join LAN completed. Synced with 3 initial agents
Sep 27 10:08:40 REDACTED consul[3635]: agent: Synced node info
Sep 27 10:08:43 REDACTED consul[3635]: consul.watch: Watch (type: key) errored: Get https://127.0.0.1:8500/v1/kv/test/mykey: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs, retry in 20s
syslog when "VER=1.1.0":
Sep 27 10:07:22 REDACTED consul[2255]: agent: Join LAN completed. Synced with 3 initial agents
Sep 27 10:07:23 REDACTED consul[2255]: agent: Synced node info
Sep 27 10:07:26 REDACTED consul_watch[2404]: consul watch has been triggered