Consul build on OS X doesn't use System Configuration framework for DNS resolution #3267

Closed
mbravorus opened this issue Jul 12, 2017 · 5 comments · Fixed by #21326
Labels
type/enhancement Proposed improvement or new feature

Comments

@mbravorus

consul version for both Client and Server

Client: 0.8.5
Server: 0.8.5

consul info for both Client and Server

Client:

$ consul info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = 2c77151+
	version = 0.8.5
consul:
	known_servers = 0
	server = false
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 33
	max_procs = 4
	os = darwin
	version = go1.8.3
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Server info not relevant

Operating system and Environment details

In my particular case it is macOS Sierra 10.12.5.

Description of the Issue (and unexpected/desired result)

The issue stems from the desire to use the native Consul DNS interface for service discovery. The Consul cluster runs in the cloud (AWS), using private address space. A client developer machine connects to the private space via VPN. The VPN server pushes out custom DNS server entries and a custom search domain (consul). The VPN client correctly incorporates these settings into the native System Configuration resolver framework. Proof:

$ scutil --dns
DNS configuration
 
resolver #1
  search domain[0] : consul
  search domain[1] : REDACTED.com
  nameserver[0] : 8.8.8.8
  nameserver[1] : 8.8.4.4
  flags    : Request A records
  reach    : Reachable
 
resolver #2
  domain   : consul
  nameserver[0] : 10.85.5.248
  nameserver[1] : 10.85.6.138
  nameserver[2] : 10.85.7.189
  flags    : Supplemental, Request A records
  reach    : Reachable
  order    : 101000
 
resolver #3
  domain   : local
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : Not Reachable
  order    : 300000
 
resolver #4
  domain   : 254.169.in-addr.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : Not Reachable
  order    : 300200
 
resolver #5
  domain   : 8.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : Not Reachable
  order    : 300400
 
resolver #6
  domain   : 9.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : Not Reachable
  order    : 300600
 
resolver #7
  domain   : a.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : Not Reachable
  order    : 300800
 
resolver #8
  domain   : b.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : Not Reachable
  order    : 301000
 
DNS configuration (for scoped queries)
 
resolver #1
  search domain[0] : REDACTED.com
  nameserver[0] : 8.8.8.8
  nameserver[1] : 8.8.4.4
  if_index : 4 (en4)
  flags    : Scoped, Request A records
  reach    : Reachable
 
resolver #2
  search domain[0] : REDACTED.com
  nameserver[0] : 8.8.8.8
  nameserver[1] : 8.8.4.4
  if_index : 5 (en0)
  flags    : Scoped, Request A records
  reach    : Reachable
 
resolver #3
  search domain[0] : consul
  nameserver[0] : 10.85.5.248
  nameserver[1] : 10.85.6.138
  nameserver[2] : 10.85.7.189
  if_index : 12 (utun1)
  flags    : Scoped, Request A records
  reach    : Reachable

Native applications are able to successfully query these resolvers for .consul names:

$ ping consul.service.consul
PING consul.service.consul (10.85.5.248): 56 data bytes
64 bytes from 10.85.5.248: icmp_seq=0 ttl=63 time=150.995 ms
64 bytes from 10.85.5.248: icmp_seq=1 ttl=63 time=149.958 ms
64 bytes from 10.85.5.248: icmp_seq=2 ttl=63 time=150.317 ms
64 bytes from 10.85.5.248: icmp_seq=3 ttl=63 time=150.036 ms
^C

and

$ dscacheutil -q host -a name consul.service.consul
name: consul.service.consul
ip_address: 10.85.5.248
ip_address: 10.85.7.189
ip_address: 10.85.6.138

However, consul itself is unable to resolve such names (because it relies only on /etc/resolv.conf and not on the native macOS framework):

$ consul join consul.service.consul
Error joining address 'consul.service.consul': Unexpected response code: 500 (1 error(s) occurred:

* Failed to resolve consul.service.consul: lookup consul.service.consul on 8.8.8.8:53: no such host)
Failed to join any nodes.
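
To make the mismatch concrete, here is a minimal Go sketch of per-domain resolver selection based on the `scutil --dns` output shown earlier. It is an illustration only, not Consul code and not Apple's actual algorithm: the names `resolver`, `parseScutil`, and `nameserversFor` are hypothetical, the sample text is abbreviated from the output in this report, and longest-suffix matching is an assumed simplification of how supplemental resolvers are chosen.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// resolver keeps the fields of one "resolver #N" stanza that matter here.
type resolver struct {
	domain      string   // "domain" line; empty for the default resolver
	nameservers []string // "nameserver[i]" lines, in order
}

// parseScutil reads text shaped like `scutil --dns` output and returns the
// resolvers of the first section; it stops at the scoped-queries section.
func parseScutil(out string) []resolver {
	var rs []resolver
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.Contains(line, "for scoped queries") {
			break
		}
		if strings.HasPrefix(line, "resolver #") {
			rs = append(rs, resolver{})
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok || len(rs) == 0 {
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		cur := &rs[len(rs)-1]
		if key == "domain" {
			cur.domain = val
		} else if strings.HasPrefix(key, "nameserver[") {
			cur.nameservers = append(cur.nameservers, val)
		}
	}
	return rs
}

// nameserversFor returns the nameservers of the resolver whose domain is the
// longest suffix of host, falling back to the first resolver with no domain.
// (Assumed simplification of the macOS behaviour.)
func nameserversFor(rs []resolver, host string) []string {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	var best *resolver
	for i := range rs {
		r := &rs[i]
		if len(r.nameservers) == 0 {
			continue // e.g. mdns stanzas list no nameservers
		}
		if r.domain == "" {
			if best == nil {
				best = r
			}
			continue
		}
		matches := host == r.domain || strings.HasSuffix(host, "."+r.domain)
		if matches && (best == nil || len(r.domain) > len(best.domain)) {
			best = r
		}
	}
	if best == nil {
		return nil
	}
	return best.nameservers
}

// sample is abbreviated from the scutil output in this report.
const sample = `DNS configuration

resolver #1
  search domain[0] : consul
  nameserver[0] : 8.8.8.8
  nameserver[1] : 8.8.4.4

resolver #2
  domain   : consul
  nameserver[0] : 10.85.5.248
  nameserver[1] : 10.85.6.138
  nameserver[2] : 10.85.7.189

resolver #3
  domain   : local
  options  : mdns

DNS configuration (for scoped queries)

resolver #1
  nameserver[0] : 8.8.8.8
`

func main() {
	rs := parseScutil(sample)
	fmt.Println(nameserversFor(rs, "consul.service.consul"))
	fmt.Println(nameserversFor(rs, "example.com"))
}
```

With this sample, the .consul name maps to the three 10.85.x.x servers, while any other name falls back to 8.8.8.8 and 8.8.4.4. The consul binary sends the .consul query to 8.8.8.8, which is why the join above fails.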

Reproduction steps

Create custom nameservers using the networksetup utility (networksetup -setdnsservers), verify them with 'scutil --dns', verify successful resolution with dscacheutil (see example above), then verify unsuccessful resolution via the consul binary.

@mbravorus
Author

I am fairly positive this also relates to hashicorp/terraform#5925.

If I read that PR discussion correctly, terraform, as well as all the other tools in the hashicorp stack, is critically flawed when used on OS X machines, especially in light of the Consul DNS interface.

@slackpad
Contributor

Hi @mbravorus thanks for opening an issue - we compile Consul with cgo disabled, so it's using the generic Go resolver instead of the OSX specific framework. We'd really like to avoid using cgo since Consul otherwise is pure Go, which vastly simplifies supporting all the platforms that we do with Consul. As a workaround, do you have some way of configuring the OSX DNS to delegate just .consul lookups to the Consul agent vs. having the Consul agent be the primary DNS resolver that recurses?

@mbravorus
Author

Hi @slackpad, thanks for the response. I'm not entirely sure I understand why you'd want to delegate .consul lookups and how it would be a workaround for this case. The problem lies not in delegation or recursor addressing, but in the inability of "hashistack" tools themselves to use the native resolver on OS X. Therefore, no matter how I configure it, consul and terraform and others will remain blissfully unaware.

What can be done (and often is) is to duplicate the native DNS configuration in the older /etc/resolv.conf system, which your binaries understand. However, it is kludgy, and the file often gets overwritten by automatic tools, especially in a VPN-heavy context, which is exactly the context in which the whole problem usually arises.
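
As a rough sketch of that duplication, the following Go snippet renders resolv.conf-style text from a list of search domains and nameservers. The function name `renderResolvConf` is hypothetical, and the values are the ones from the scutil output above. The snippet only builds the text; it deliberately does not write to /etc/resolv.conf.

```go
package main

import (
	"fmt"
	"strings"
)

// renderResolvConf builds resolv.conf text: one "search" line (if any search
// domains are given) followed by one "nameserver" line per server.
func renderResolvConf(search, nameservers []string) string {
	var b strings.Builder
	if len(search) > 0 {
		fmt.Fprintf(&b, "search %s\n", strings.Join(search, " "))
	}
	for _, ns := range nameservers {
		fmt.Fprintf(&b, "nameserver %s\n", ns)
	}
	return b.String()
}

func main() {
	conf := renderResolvConf(
		[]string{"consul", "REDACTED.com"},
		[]string{"10.85.5.248", "10.85.6.138", "10.85.7.189"},
	)
	fmt.Print(conf)
}
```

Part of the kludge is that resolv.conf has no per-domain routing: every query, not just .consul ones, goes to the listed nameservers, so they must also be able to answer or recurse for public names.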

Perhaps there is a way to offer a less/un-supported but "by the manufacturer" build with cgo for OS X users? That would be a functional workaround for those who are affected, without them having to reconstruct the whole build process with an added unsupported element and no guidance. You, at the very least, know what you are doing :)

@pearkes pearkes added type/enhancement Proposed improvement or new feature dns labels Jul 24, 2018
@pearkes
Contributor

pearkes commented Jul 24, 2018

> Perhaps there is a way to offer a less/un-supported but "by the manufacturer" build with cgo for OS X users? That would be a functional workaround for those who are affected, without them having to reconstruct the whole build process with an added unsupported element and no guidance. You, at the very least, know what you are doing :)

Unfortunately I don't think we plan to support this in the near-term. We'd welcome a guide from the community on how to do this, but can't support it at the moment. Thanks for suggesting it, however, and feel free to comment if I've missed something here.

@pearkes pearkes closed this as completed Jul 24, 2018
@flyinprogrammer

flyinprogrammer commented Sep 25, 2018

Your friends who do nomad seem to have found a CD process that allows them to do this: https://github.com/hashicorp/nomad/blob/088f51a330a93186a74515b8d699f24e59611adf/GNUmakefile#L56

Also, this issue relates to this issue in core: golang/go#12524
