Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watch on KV fails when using https (cannot validate certificate for 127.0.0.1) #4718

Closed
igal-s opened this issue Sep 27, 2018 · 8 comments
Closed
Labels
theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication type/enhancement Proposed improvement or new feature

Comments

@igal-s
Copy link
Contributor

igal-s commented Sep 27, 2018

Overview of the Issue

In version 1.2.3, Consul watch fails when https is enabled, it tries to access https://127.0.0.1:8500 ignoring both CONSUL_TLS_SERVER_NAME and CONSUL_HTTP_ADDR environment variables.
When CONSUL_HTTP_SSL_VERIFY=false the watch works as expected.

In version 1.1.0 watch works as expected when CONSUL_HTTP_SSL_VERIFY=true CONSUL_TLS_SERVER_NAME and CONSUL_HTTP_ADDR are set correctly.

Reproduction Steps

Steps to reproduce this issue, eg:

On a client node run the shell commands below once with VER=1.1.0 and once with VER=1.2.3:

VER=1.X.X
service consul stop
wget https://releases.hashicorp.com/consul/${VER}/consul_${VER}_linux_amd64.zip
unzip consul_${VER}_linux_amd64.zip
mkdir /opt/consul/${VER}
mv consul /opt/consul/${VER}/

sudo \
CONSUL_HTTP_SSL=true \
CONSUL_HTTP_SSL_VERIFY=true \
CONSUL_CACERT=/etc/ssl/consul-ca.crt \
CONSUL_CLIENT_CERT=/etc/ssl/certs/consul.crt \
CONSUL_CLIENT_KEY=/etc/consul/ssl/private/consul.key \
CONSUL_HTTP_ADDR=localhost.infra.__REDACTED__:8500 \
CONSUL_TLS_SERVER_NAME=localhost.infra.__REDACTED__:8500 \
-u consul -g consul \
/opt/consul/${VER}/consul agent \
-config-dir=/opt/consul/tmp_consul_config \
-advertise __REDACTED__

Consul info for both Client and Server

Client info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = 48d287ef
        version = 1.2.3
consul:
        known_servers = 3
        server = false
runtime:
        arch = amd64
        cpu_count = 16
        goroutines = 49
        max_procs = 16
        os = linux
        version = go1.10.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 535
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 5025
        members = 79
        query_queue = 0
        query_time = 1
Clinet config file
{
  "watches": [
    {
      "args": [
        "logger",
        "-i",
        "-t",
        "consul_watch",
        "consul watch has been triggered"
      ],
      "handler_type": "script",
      "key": "test/mykey",
      "type": "key"
    }
  ],
  "retry_join": [
    "provider=\"azure\" tag_name=\"__REDACTED__\" tag_value=\"__REDACTED__\" tenant_id=\"__REDACTED__\" client_id=\"__REDACTED__\" subscription_id=\"__REDACTED__\" secret_access_key=\"__REDACTED__\""
  ],
  "encrypt_verify_incoming": false,
  "encrypt_verify_outgoing": false,
  "encrypt": "__REDACTED__",
  "verify_server_hostname": true,
  "verify_incoming_rpc": true,
  "verify_incoming_https": false,
  "verify_incoming": false,
  "verify_outgoing": true,
  "ca_file": "/etc/ssl/consul-ca.crt",
  "datacenter": "__REDACTED__",
  "enable_script_checks": true,
  "recursors": [
    "__REDACTED__"
  ],
  "domain": "infra.__REDACTED__",
  "data_dir": "/var/lib/consul",
  "ports": {
    "server": 8300,
    "serf_wan": 8302,
    "serf_lan": 8301,
    "https": 8500,
    "http": -1,
    "dns": 8600
  },
  "enable_syslog": true,
  "log_level": "INFO",
  "acl_datacenter": "__REDACTED__",
  "acl_default_policy": "deny",
  "acl_down_policy": "extend-cache",
  "acl_agent_token": "__REDACTED__",
  "acl_token": "__REDACTED__",
  "disable_remote_exec": false,
  "key_file": "/etc/consul/ssl/private/consul.key",
  "cert_file": "/etc/ssl/certs/consul.crt"
}
Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = 48d287ef
        version = 1.2.3
consul:
        bootstrap = false
        known_datacenters = 7
        leader = false
        leader_addr = __REDACTED__:8300
        server = true
raft:
        applied_index = 12632340
        commit_index = 12632340
        fsm_pending = 0
        last_contact = 4.294596ms
        last_log_index = 12632340
        last_log_term = 64
        last_snapshot_index = 12628136
        last_snapshot_term = 64
        latest_configuration = [{Suffrage:Voter ID:__REDACTED__ Address:__REDACTED__:8300} {Suffrage:Voter ID:__REDACTED__ Address:__REDACTED__:8300} {Suffrage:Voter ID:__REDACTED__ Address:__REDACTED__:8300}]
        latest_configuration_index = 12615252
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 64
runtime:
        arch = amd64
        cpu_count = 16
        goroutines = 390
        max_procs = 16
        os = linux
        version = go1.10.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 535
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 5025
        members = 79
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 668
        members = 21
        query_queue = 0
        query_time = 1

Operating system and Environment details

Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

Log Fragments

syslog when "VER=1.2.3"

Sep 27 10:08:40 REDACTED consul[3635]: agent: Join LAN completed. Synced with 3 initial agents
Sep 27 10:08:40 REDACTED consul[3635]: agent: Synced node info
Sep 27 10:08:43 REDACTED consul[3635]: consul.watch: Watch (type: key) errored: Get https://127.0.0.1:8500/v1/kv/test/mykey: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs, retry in 20s

syslog when "VER=1.1.0":

Sep 27 10:07:22 REDACTED consul[2255]: agent: Join LAN completed. Synced with 3 initial agents
Sep 27 10:07:23 REDACTED consul[2255]: agent: Synced node info
Sep 27 10:07:26 REDACTED consul_watch[2404]: consul watch has been triggered

@vaLski
Copy link
Contributor

vaLski commented Sep 27, 2018

Hey @igal-s this is not problem with the consul itself but rather problem with the TLS certificate your agent is using. Most probably the TLS certificate is missing correct subject alt name field or this filed is incommplete and does not contain 127.0.0.1.

After creating the CSR, prior issuing the certificate make sure that the CSR correctly include 127.0.0.1 in the subjectAltName. See example:

openssl req -text -noout -verify -in example.csr | grep -i Subject:
  Subject: C=US, ST=NA, L=US, O=Company, OU=Internet, CN=example.com/emailAddress=mail@example.com/subjectAltName=IP:localhost,IP:127.0.0.1,IP:1.2.3.4

If all hosts and ips of the server are correctly listed inside the CSR, proceed to generate the certificate,

As soon as you issue the certificate inspect it and make sure that the IP is also correctly added to the Subject Alternative name.

openssl x509 -in example.crt -text -noout |  grep -A1 ' X509v3 Subject Alternative Name'
  X509v3 Subject Alternative Name: 
  IP Address:127.0.0.1, IP Address:1.2.3.4

I assume that the validation was fixed in 1.2.3 by #4540. Prior you upgrade your nodes to 1.2.3 or above you will need to re-issue all your certificates.

Perhaps someone should update https://www.consul.io/docs/upgrade-specific.html as well.

@igal-s
Copy link
Contributor Author

igal-s commented Sep 27, 2018

@vaLski thanks for the reply, but the fact of the matter is that in v1.1.0 when you set the CONSUL_TLS_SERVER_NAME, watch uses its value for the SAN validation and when you don't set it, it defaults to using 127.0.0.1 for that validation (and fails to validate my certificate, which is fine).
The solution you suggested would work, but unfortunately, issuing certificates for IP addresses is not an option for my environment.
I did some further testing with other versions, and the behavior is the same for 1.2.3-1.2.1 which means that #4540 has nothing to do with this issue. In v1.2.0 watch was broken (as indicated here #4359 #4358) so from my understanding v1.1.0 is the last version in which watch did not ignore CONSUL_TLS_SERVER_NAME.
My guess is that it has to do with this 8e0e239.
@mkeeler, is there currently a way to configure the agent to use a specific server name in order for the TLS validation to pass?

vaLski added a commit to vaLski/consul that referenced this issue Sep 28, 2018
While setup consul watches make sure that we are establishing connection to the correct CONSUL_TLS_SERVER_NAME if agent is configured to use SSL (CONSUL_HTTP_SSL=true)

This is initial attempt to fix hashicorp#4718

Extended debugging capabilities of the agent.
@vaLski
Copy link
Contributor

vaLski commented Sep 28, 2018

Hey @igal-s and thanks for the update. It appears that you are right and that my understanding of the problem is far more limited than I believed.

I had the chance to dig into the code for a while today and I believe I found what might be the reason for that.

Do you have a chance to test the following diff and let me now if it is solving your case?

https://github.com/hashicorp/consul/compare/master...vaLski:vaLski-4718?expand=1

igal-s added a commit to igal-s/consul that referenced this issue Sep 29, 2018
@igal-s
Copy link
Contributor Author

igal-s commented Sep 29, 2018

Unfortunately, that does not solve the issue, here is the logline that gets printed:

    2018/09/29 13:50:36 [DEBUG] agent: reloadWatches will be using https://127.0.0.1:8500

The second line which you've added (at agent/agent.go:2117) in the setupTLSClientConfig function does not get printed.

@vaLski I've also went digging in the code and added some ugly debug outputs on top of your changes igal-s@d20a892, I ran my version and these are the actual outputs:

[DEBUG-tmp] runtime: APIConfig: unixAddr = , httpAddr = , httpsAddr = 127.0.0.1:8500
[DEBUG-tmp] runtime: APIConfig: cfg.TLSConfig:  {Address:127.0.0.1:8500 CAFile:/etc/ssl/consul-ca.crt CAPath: CertFile:/etc/ssl/certs/consul.crt KeyFile:/etc/consul/ssl/private/consul.key InsecureSkipVerify:false}
[DEBUG-tmp] runtime: APIConfig: unixAddr = , httpAddr = , httpsAddr = 127.0.0.1:8500
[DEBUG-tmp] runtime: APIConfig: cfg.TLSConfig:  {Address:127.0.0.1:8500 CAFile:/etc/ssl/consul-ca.crt CAPath: CertFile:/etc/ssl/certs/consul.crt KeyFile:/etc/consul/ssl/private/consul.key InsecureSkipVerify:false}

...

---> [DEBUG-tmp] api: defaultConfig: config:  &{Address:localhost.infra.__REDACTED__:8500 Scheme:https Datacenter: Transport:0xc0000bc5a0 HttpClient:<nil> HttpAuth:<nil> WaitTime:0s Token: TLSConfig:{Address:localhost.infra.__REDACTED__:8500 CAFile:/etc/ssl/consul-ca.crt CAPath: CertFile:/etc/ssl/certs/consul.crt KeyFile:/etc/consul/ssl/private/consul.key InsecureSkipVerify:false}}

...

    2018/09/29 15:44:26 [DEBUG-tmp] agent: a.config: __REDACTED__

    2018/09/29 15:44:26 [DEBUG-tmp] agent: config: &{Address:127.0.0.1:8500 Scheme:https Datacenter:__REDACTED__ Transport:<nil> HttpClient:<nil> HttpAuth:<nil> WaitTime:0s Token: TLSConfig:{Address:127.0.0.1:8500 CAFile:/etc/ssl/consul-ca.crt CAPath: CertFile:/etc/ssl/certs/consul.crt KeyFile:/etc/consul/ssl/private/consul.key InsecureSkipVerify:false}}

...

    2018/09/29 15:44:26 [DEBUG-tmp] agent: config.TLSConfig.Address = 127.0.0.1:8500
    2018/09/29 15:44:26 [DEBUG-tmp] agent: reloadWatches will be using https://127.0.0.1:8500

What we need to resolve this issue is to get the TLSConfig.Address which is defined in the defaultConfig function in the api/api.go file.

@vaLski
Copy link
Contributor

vaLski commented Sep 30, 2018

I was wondering if your latest commit did the trick?

@igal-s
Copy link
Contributor Author

igal-s commented Sep 30, 2018

It did :-)

@vaLski
Copy link
Contributor

vaLski commented Oct 1, 2018

@igal-s Great! Would you consider sending all those changes as a pull request?

One comment on the DEBUG-tmp. Why not leaving those just as the native DEBUG lines and add them to your PR. In this way people that need to instrument that part of the code in future can benefit from those extra debugging messages.

@igal-s
Copy link
Contributor Author

igal-s commented Oct 2, 2018

I've opened the following pull request #4727 which resolves this issue.
It would be great if someone would be able to review it.

igal-s added a commit to igal-s/consul that referenced this issue Oct 6, 2018
@pearkes pearkes added type/enhancement Proposed improvement or new feature theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication labels Oct 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication type/enhancement Proposed improvement or new feature
Projects
None yet
Development

No branches or pull requests

3 participants