Watch doesn't work, when Consul use only SSL #4076

Closed
richie-tt opened this issue May 2, 2018 · 8 comments

@richie-tt

Description of the Issue (and unexpected/desired result)

After switching Consul to HTTPS, the watch service is no longer able to connect to Consul. It looks like the watch still expects a plain-HTTP endpoint; the "\x15\x03\x01..." bytes in the error below are a TLS alert record, i.e. the HTTPS listener rejecting the plain-HTTP request.

When I remove the SSL configuration, everything works fine. In the Consul configuration I disabled HTTP and use only HTTPS.

May  2 10:47:21 sql8b-stg-1 consulID: 2018/05/02 10:47:21 [ERR] consul.watch: Watch (type: checks) errored: Get http://0.0.0.0:8500/v1/health/checks/shared1: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02", retry in 20s
May  2 10:47:22 sql8b-stg-1 consulID: 2018/05/02 10:47:22 [WARN] agent: Check "perconaservice" is now warning
May  2 10:47:22 sql8b-stg-1 consulID: 2018/05/02 10:47:22 [INFO] agent: Synced check "perconaservice"

Reproduction steps

Environment Variables

export CONSUL_HTTP_ADDR="https://127.0.0.1:8500"
export CONSUL_HTTP_SSL="true"
export CONSUL_HTTP_SSL_VERIFY="false"
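
A minimal sketch (not from the original report) of the same health/checks query the watch performs, using the official Go API package github.com/hashicorp/consul/api; DefaultConfig() reads the CONSUL_HTTP_* variables above, and "shared1" is the service from the watch config below:

package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// DefaultConfig reads CONSUL_HTTP_ADDR, CONSUL_HTTP_SSL and
	// CONSUL_HTTP_SSL_VERIFY from the environment, so with the variables
	// above the client speaks HTTPS and skips certificate verification.
	cfg := api.DefaultConfig()

	client, err := api.NewClient(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// The same endpoint the failing watch polls: /v1/health/checks/shared1
	checks, _, err := client.Health().Checks("shared1", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("got %d health checks for shared1\n", len(checks))
}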

Watch config

  "watches":[
    {
        "type": "checks",
        "service": "shared1",
        "handler_type": "http",
        "http_handler_config": {
            "path":"https://salt01.tech.inet:4080/hook/percona",
            "method": "POST",
            "header": {"cluster": ["shared1"]},
            "timeout": "10s",
            "tls_skip_verify": true
        }
    }
  ]

Client config

{
  "advertise_addr": "10.202.24.8",
  "bind_addr": "10.202.24.8",
  "client_addr": "0.0.0.0",
  "log_level": "INFO",
  "datacenter": "eu-west-1",
  "node_name": "10.202.24.8",
  "retry_join": ["consul.tech.inet"],
  "server": false,
  "ui": false,
  "ports": {
    "http": -1,
    "https": 8500
  },
  "key_file": "/etc/pki/tls/private/consul.key",
  "cert_file": "/etc/pki/tls/certs/consul.crt",
  "domain": "consul",
  "enable_script_checks": true
}

consul version for both Client and Server

Client: Consul v1.0.7 Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Server: Consul v1.0.7 Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

consul info for both Client and Server

Client:

consul info
agent:
        check_monitors = 2
        check_ttls = 0
        checks = 4
        services = 1
build:
        prerelease = 
        revision = fb848fc4
        version = 1.0.7
consul:
        known_servers = 3
        server = false
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 44
        max_procs = 2
        os = linux
        version = go1.10
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 150
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1583
        members = 6
        query_queue = 0
        query_time = 1

Server:

agent:
        check_monitors = 0
        check_ttls = 1
        checks = 1
        services = 1
build:
        prerelease = 
        revision = fb848fc4
        version = 1.0.7
consul:
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = 10.200.22.20:8300
        server = true
raft:
        applied_index = 1529547
        commit_index = 1529547
        fsm_pending = 0
        last_contact = 48.192473ms
        last_log_index = 1529547
        last_log_term = 1701
        last_snapshot_index = 1521691
        last_snapshot_term = 1701
        latest_configuration = [{Suffrage:Voter ID:baf253c9-7a07-e8bf-2a16-4fdac5b8d779 Address:10.200.12.34:8300} {Suffrage:Voter ID:6458a929-bdf5-4bc1-e4b0-a966618c6749 Address:10.200.22.20:8300} {Suffrage:Voter ID:4725f550-8fe0-b75c-9a11-19417a14800f Address:10.200.32.22:8300}]
        latest_configuration_index = 1395152
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 1701
runtime:
        arch = amd64
        cpu_count = 1
        goroutines = 81
        max_procs = 1
        os = linux
        version = go1.10
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 150
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 4
        member_time = 1583
        members = 10
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 202
        members = 3
        query_queue = 0
        query_time = 1

Operating system and Environment details

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.11.6.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

Log Fragments or Link to gist

May  2 11:08:58 sql8b-stg-1 consulID: 2018/05/02 11:08:58 [ERR] consul.watch: Watch (type: checks) errored: Get http://0.0.0.0:8500/v1/health/checks/shared1: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02", retry in 3m0s
May  2 11:11:58 sql8b-stg-1 consulID: 2018/05/02 11:11:58 [ERR] consul.watch: Watch (type: checks) errored: Get http://0.0.0.0:8500/v1/health/checks/shared1: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02", retry in 3m0s
May  2 11:14:58 sql8b-stg-1 consulID: 2018/05/02 11:14:58 [ERR] consul.watch: Watch (type: checks) errored: Get http://0.0.0.0:8500/v1/health/checks/shared1: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02", retry in 3m0s

TIP: Use -log-level=TRACE on the client and server to capture the maximum log detail.

richie-tt changed the title from "Watch doesn't work, when cosnul use only SSL" to "Watch doesn't work, when Consul use only SSL" on May 2, 2018
@mkeeler
Member

mkeeler commented May 31, 2018

@1Ricardo Are the environment variables you have set defined for the sql8b-stg-1 agent, or for the servers?

I have identified a potential place in the code where we should force the watch to use https instead of http. However, if the env vars are set when the watch is started, it should take them into account when determining whether or not to use encryption.
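
For reference, a rough paraphrase (not the actual Consul source) of the environment handling described above: the Go API client's DefaultConfig() only upgrades to HTTPS if CONSUL_HTTP_SSL is visible to the process that creates the watch's client, which is why it matters where the variables are set:

package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// Paraphrased decision logic: CONSUL_HTTP_SSL=true switches the scheme
	// to https; CONSUL_HTTP_SSL_VERIFY=false disables certificate checks.
	scheme := "http"
	insecureSkipVerify := false

	if v := os.Getenv("CONSUL_HTTP_SSL"); v != "" {
		if enabled, err := strconv.ParseBool(v); err == nil && enabled {
			scheme = "https"
		}
	}
	if v := os.Getenv("CONSUL_HTTP_SSL_VERIFY"); v != "" {
		if verify, err := strconv.ParseBool(v); err == nil && !verify {
			insecureSkipVerify = true
		}
	}

	fmt.Printf("scheme=%s insecureSkipVerify=%v\n", scheme, insecureSkipVerify)
}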

mkeeler added a commit that referenced this issue Jun 1, 2018
Fix #4076 - Agent configured Watches now work with HTTPS only agents
@richie-tt
Author

Hi @mkeeler

Thank you for your involvement in this case. Yes, the variables were declared on both nodes (server and client).

@mkeeler
Member

mkeeler commented Jun 1, 2018

@1Ricardo I managed to reproduce at least one instance of this, and I think the fix should cover your particular case as well. If for some reason it doesn't, please reopen this issue.

@educriado

I also had this issue and it's corrected now with the new version (1.2.0). However, now I get this error when I trigger the watch:

Jun 26 14:11:18 linux-server consul[9108]:     2018/06/26 14:11:18 [ERR] agent: Failed to run watch: Failed to connect to agent: address https://0.0.0.0:8500: too many colons in address
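
The wording of that error looks like Go's host:port parsing rather than anything TLS-specific: once a scheme is prepended, the "host" half of the string still contains a colon. A minimal sketch reproducing the exact message (assuming the watch hands the full https://0.0.0.0:8500 string to code that expects a bare host:port):

package main

import (
	"fmt"
	"net"
)

func main() {
	// "https://0.0.0.0:8500" is not a bare host:port, so the host half
	// ("https://0.0.0.0") still contains a colon and parsing fails.
	_, _, err := net.SplitHostPort("https://0.0.0.0:8500")
	fmt.Println(err) // address https://0.0.0.0:8500: too many colons in address
}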

It looks like the extra colon comes from the scheme being included in the address. The configuration of the agent is the following:

# Cluster
bootstrap_expect = 3
datacenter = "sth"
domain = "__REDACTED__"
retry_join = ["__REDACTED__", "__REDACTED__", "__REDACTED__"]

# Agent
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
data_dir = "/var/consul"
enable_syslog = true
addresses {
  https = "0.0.0.0"
}
ports {
  dns = 8600
  http = -1
  https = 8500
  serf_lan = 8301
  serf_wan = 8302
  server = 8300
}
server = true
ui = true

# Gossip encryption
# The key should be specified in a temporary file that is removed after first load
encrypt_verify_incoming = true
encrypt_verify_outgoing = true

# TLS
ca_file = "/usr/local/share/ca-certificates/cert.crt"
cert_file = "/etc/consul.d/agent.crt"
key_file = "/etc/consul.d/agent.key"
verify_incoming_rpc = true
verify_incoming_https = true
verify_outgoing = true
verify_server_hostname = true

# ACL
acl_datacenter = "sth"
acl_default_policy = "deny"
acl_down_policy = "extend-cache"
acl_master_token = "__REDACTED__"
acl_agent_token = "__REDACTED__"

# Agent specific
node_name = "linux-server"
advertise_addr = "__REDACTED__"
enable_script_checks = true

# Watches
watches {
    type = "checks"
    state = "warning"
    handler_type = "script"
    args = ["python3", "/etc/consul.d/print-stdin.py"]
}

OS info:

Distributor ID: Ubuntu
Description:    Ubuntu 18.04 LTS
Release:        18.04
Codename:       bionic

Consul version:

Consul v1.2.0
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

@richie-tt
Author

@mkeeler I have the same issue; now the watch cannot be triggered:

Jul  9 11:44:42 sql198a-prd-1 consulID: 2018/07/09 11:44:42 [ERR] agent: Failed to run watch: Failed to connect to agent: address https://0.0.0.0:8500: too many colons in address

@richie-tt
Author

I am not able to re-open this issue because it was not closed by me; you cannot re-open your own issues if a repo collaborator closed them.

@educriado

Thanks for opening a new issue @1Ricardo

@patcable

@mkeeler Funny enough, I'm burned by the opposite now: I use HTTP for local client communication since it's on the same host, but HTTPS for client => server communication. Now that HTTPS is forced whenever it is enabled, my watches aren't working. Should this never have worked in the first place?

Happy to open another ticket, too, but it is related to this code change.
