Consul will accept an empty client_addr and not report a failure to create client services #5371

Closed
karl-tpio opened this issue Feb 22, 2019 · 4 comments
Labels
help-wanted (We encourage community PRs for these issues!) · type/bug (Feature does not function as expected)

Comments

@karl-tpio

Overview of the Issue

No warning or error is emitted in the logs when the client_addr directive resolves to an empty value. Consul continues to start up but does not create a listening socket, so client services (DNS/HTTP) are unavailable.

Reproduction Steps

Currently, I have a 3-node cluster in a VPC, set up in almost bog-standard adherence to the standard install guide.

Each of the three nodes is identical, as they were all spun from the same AMI.

Here is how they are configured:

root@ip-172-25-26-150:/etc/consul.d# cat *.hcl
##
# Consul's defaults are pretty sane, a few things we must change:
#
# See: https://www.consul.io/docs/agent/options.html
##

retry_join = [
  "provider=aws tag_key=consul.cluster tag_value=testing",
]

datacenter = "us-west-1"

encrypt = "<lol.nope>"

performance {
  raft_multiplier = 1
}

bind_addr = "{{ GetPrivateIP }}"

client_addr = "{{ GetInterfaceIP \"dummy0\" }}"

# TODO: this still needs to be tested, but i believe that systemd resolve conf requires we listen on 53
ports = {
  dns = 53
}

server = true

bootstrap_expect = 3

ui = true

I am expecting to see consul listening on port 53, on the IP address of the dummy0 interface.

E.G.:

  1. Consul is up, and talking with peers:
root@ip-172-25-26-150:/etc/consul.d# service consul status
● consul.service - "HashiCorp Consul - A service mesh solution"
   Loaded: loaded (/etc/systemd/system/consul.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-02-22 02:04:35 UTC; 1min 38s ago
     Docs: https://www.consul.io/
 Main PID: 3294 (consul)
    Tasks: 11 (limit: 2320)
   CGroup: /system.slice/consul.service
           └─3294 /usr/local/bin/consul agent -config-dir=/etc/consul.d/

Feb 22 02:04:35 ip-172-25-26-150 consul[3294]: 2019/02/22 02:04:35 [DEBUG] raft-net: 172.25.26.150:8300 accepted connection fro
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]: 2019/02/22 02:04:35 [DEBUG] raft-net: 172.25.26.150:8300 accepted connection fro
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:35 [INFO] discover-aws: Instance i-0c78ade88c720b1bc has pr
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:35 [INFO] discover-aws: Instance i-0aaddba15a37bfebe has pr
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:35 [INFO] discover-aws: Instance i-05f99f260e6f8bb54 has pr
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:35 [INFO] agent: Discovered LAN servers: 172.25.26.150 172.
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:35 [INFO] agent: (LAN) joining: [172.25.26.150 172.25.21.12
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:35 [INFO] agent: (LAN) joined: 3 Err: <nil>
Feb 22 02:04:35 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:35 [INFO] agent: Join LAN completed. Synced with 3 initial
Feb 22 02:04:36 ip-172-25-26-150 consul[3294]:     2019/02/22 02:04:36 [INFO] agent: Synced node info
lines 1-19/19 (END)
  2. But it is not listening on port 53, which is what I expected it to do:
root@ip-172-25-26-150:/etc/consul.d# netstat -tulnp | grep 53
udp        0      0 172.25.26.150:68        0.0.0.0:*                           538/systemd-network
udp6       0      0 fe80::44:7fff:fe67::546 :::*                                538/systemd-network
  3. This happens only when client_addr is configured using the template:
root@ip-172-25-26-150:/etc/consul.d# cat network.hcl | grep client_a
client_addr = "{{ GetInterfaceIP \"dummy0\" }}"
  4. After replacing the template with the literal IP address:
root@ip-172-25-26-150:/etc/consul.d# service consul restart
root@ip-172-25-26-150:/etc/consul.d# cat network.hcl | grep client_a
#client_addr = "{{ GetInterfaceIP \"dummy0\" }}"
client_addr = "169.254.1.1"
  5. Things are as I expect:
root@ip-172-25-26-150:/etc/consul.d# netstat -tulnp | grep 53
tcp        0      0 169.254.1.1:53          0.0.0.0:*               LISTEN      3372/consul
udp        0      0 169.254.1.1:53          0.0.0.0:*                           3372/consul
udp        0      0 172.25.26.150:68        0.0.0.0:*                           538/systemd-network
udp6       0      0 fe80::44:7fff:fe67::546 :::*                                538/systemd-network
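
A side note on the `grep 53` checks above: the pattern also matches unrelated rows such as PID `538` and port `546`, which makes the output harder to read. The following is a minimal sketch, assuming the column layout of `netstat -tulnp` shown in this report, that keeps only rows whose local port is exactly the one requested. The helper name `listeners_on_port` is hypothetical and is not part of Consul or netstat.

```python
from typing import List, Tuple


def listeners_on_port(netstat_output: str, port: int) -> List[Tuple[str, str, str]]:
    """Return (proto, local_address, pid/program) for rows bound to `port`."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only consider socket rows (tcp, tcp6, udp, udp6); skip headers and noise.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        local = fields[3]
        # The port is whatever follows the last colon, which also works for IPv6.
        if local.rsplit(":", 1)[-1] == str(port):
            rows.append((fields[0], local, fields[-1]))
    return rows


# Output copied from step 5 of this report (explicit IP address in client_addr).
AFTER_FIX = """\
tcp        0      0 169.254.1.1:53          0.0.0.0:*               LISTEN      3372/consul
udp        0      0 169.254.1.1:53          0.0.0.0:*                           3372/consul
udp        0      0 172.25.26.150:68        0.0.0.0:*                           538/systemd-network
udp6       0      0 fe80::44:7fff:fe67::546 :::*                                538/systemd-network
"""

for proto, local, program in listeners_on_port(AFTER_FIX, 53):
    print(proto, local, program)
# prints:
# tcp 169.254.1.1:53 3372/consul
# udp 169.254.1.1:53 3372/consul
```

Run against the output from step 2 (the templated client_addr), the same helper returns an empty list, which is the symptom described in this issue.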

I have given consul.service the right to bind to ports below 1024. As you can see, there is no issue when the IP address is given explicitly.
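
For anyone who wants to confirm the listener without reading netstat output, here is a small sketch that simply attempts a TCP connection. It uses only the Python standard library. The function name `is_tcp_listening` is made up for illustration, and it only covers the TCP side (Consul's DNS interface also listens on UDP, which this does not probe).

```python
import socket


def is_tcp_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the values from this report, the call would be `is_tcp_listening("169.254.1.1", 53)`, run on the node itself since 169.254.1.1 sits on the local dummy0 interface.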

I believe the core of the issue is here:
hashicorp/go-sockaddr#31

Basically, I get no warning about client_addr being empty, whereas I would get an error if bind_addr were empty.

I do not know if this is expected or intended behavior. It does not seem intuitive to me that Consul would start up without any warning indicating that it can't provide services for clients as intended. It also does not seem intuitive that, if client_addr is set but empty, there is no fallback to bind_addr.
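
To make the behavior proposed above concrete, here is a minimal sketch of the logic, written in Python purely for illustration. It is not Consul's code and does not reflect how the eventual fix was implemented. It assumes the templates have already been expanded to plain strings, that `None` means client_addr was not set at all, and that falling back to bind_addr is the desired policy (that part is only the suggestion made in this report). The name `resolve_client_addrs` is hypothetical.

```python
import logging
from typing import List, Optional

log = logging.getLogger("client-addr-sketch")

# Consul documents 127.0.0.1 as the default when client_addr is not set.
DEFAULT_CLIENT_ADDR = "127.0.0.1"


def resolve_client_addrs(client_addr: Optional[str], bind_addr: str) -> List[str]:
    """Pick the addresses the client services (DNS/HTTP) should listen on.

    client_addr: the expanded value, or None if the option was never set.
    bind_addr:   the expanded bind address, used as the proposed fallback.
    """
    if client_addr is None:
        return [DEFAULT_CLIENT_ADDR]

    # client_addr may hold a space-separated list of addresses.
    addrs = client_addr.split()
    if addrs:
        return addrs

    # Set but empty, e.g. a template that matched no interface: say so loudly
    # instead of silently starting without any client listeners.
    log.warning(
        "client_addr is set but resolved to an empty value; "
        "falling back to bind_addr %s",
        bind_addr,
    )
    return [bind_addr]
```

With the values from this report, `resolve_client_addrs("", "172.25.26.150")` logs a warning and returns `["172.25.26.150"]`, while `resolve_client_addrs("169.254.1.1", "172.25.26.150")` returns `["169.254.1.1"]`. A stricter variant could raise an error instead of falling back, mirroring the error for an empty bind_addr mentioned above.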

Operating system and Environment details

I am running bog-standard Ubuntu on Intel on AWS:

root@ip-172-25-26-150:/etc/consul.d# cat /proc/cpuinfo | grep model
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
root@ip-172-25-26-150:/etc/consul.d# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"

Log Fragments

If you need more info, I will be happy to provide it. For now, there is nothing incredibly interesting in the logs (the three nodes just come up and find one another...)

@pearkes
Contributor

pearkes commented Mar 18, 2019

Thanks so much for filing this issue in such detail. From what I can see, you likely did find the root cause in hashicorp/go-sockaddr#31. Really appreciate that.

@pearkes added the type/bug and help-wanted labels Mar 18, 2019
@kquinsland

@pearkes no problem. My Go skills are not up to snuff, so I can't remediate the root of the problem myself. I can rely on my crack triage skills, though, and do as much as I can to make it as easy as possible for y'all to solve.

If there's anything else I can do, let me know.

@deblasis
Contributor

deblasis commented Nov 1, 2021

Hi there,

Huge fan and shameless wannabe HashiCorp...(ian? 🤔)
I wanted to dust off this issue, if you don't mind, especially considering the great triage work done by @kquinsland, which I would like to see put to good use.

I have created PR #11461, which simply adds the warning, and I think I am going to have a look at go-sockaddr as well.

@blake
Member

blake commented Jan 21, 2022

This behavior has been fixed as of Consul 1.11.0.

@blake closed this as completed Jan 21, 2022