Agent will not start on machines without a private ip since it won't bind to any ip available. #725

Closed
cetex opened this issue Feb 21, 2015 · 55 comments

Comments

@cetex

cetex commented Feb 21, 2015

Consul agent won't start on our machines (that by default only have a public ip assigned, they are firewalled) since it won't bind to non-private ip's by default.

A commandline option to override the behaviour of "only bind to private ip's by default" would help a lot.
This option should change the current filters in consul for everything ip-related to allow any assigned ip to be used automatically.

@cetex cetex changed the title Agent will not start since it won't bind to any ip available. Agent will not start on machines without a private ip since it won't bind to any ip available. Feb 21, 2015
@cetex
Author

cetex commented Feb 21, 2015

After investigating some more i realize it's not about bind-addresses, but advertise-addresses which almost makes this a bit silly. Consul runs and listens on any ip-address by default, but it refuses to announce any address that isn't rfc1918 by default? :)

I believe a proper solution is to add an "allow announce networks" option. By default it can keep the current behavior, where it only advertises rfc1918 addresses.

If this option is set it should override the default privateBlocks variable in util.go with the configured range, so the default rfc1918 addresses should not be set if this option is configured.
The option should be able to be specified multiple times, we may for example have multiple address-ranges we want to allow. It should also be IPv6 capable.

It would also be very nice if the -bind and the -client options could be specified multiple times, currently it only seems to bind to the last option specified on the commandline.

Some thoughts i have about service design:
A service shouldn't treat ip addresses differently if they are from one range or another as that's up to network design and network security. It may be something that can be configured, but no assumptions about security should be made by the service by default. Private IP's may be seen as more secure since they aren't routable on the internet, but in reality they aren't any more secure.

For example: We may have rfc1918 addresses configured on some interfaces in our servers but in our case those are untrusted networks since they're connecting to something outside of our datacenter.
We only assign public IP's to our machines. Through rules in iptables on each host we configure the communication that we allow and drop the rest. Through rules on our routers we make sure that we don't let anything weird into our network.

@armon
Member

armon commented Feb 23, 2015

@cetex Consul does not bind or advertise to a public IP by default for security reasons. You can always provide the -bind or -client CLI flags to override its behavior and force it to bind to a public address. As you've noted, it's impossible for us to understand the context of the network and security completely, but avoiding public addresses is a sane default in most cases.

As to multiple -client and -bind addresses, this is complicated significantly by the design of the gossip system, as it does not easily allow us to broadcast our availability on multiple addresses.

@cetex
Author

cetex commented Feb 23, 2015

I have to correct myself from my last message, consul won't advertise a public IP by default, but if there is an IP to advertise it will listen on *.

The problem i have with the current policy is that if i specify -bind=:: (which should be the issue here if it's about security) Consul won't take any IP available and advertise it (as it does so nicely with RFC1918 ip's), it will still refuse to start even though it's allowed to listen on all IP's.

And frankly I don't see what the problem with non-RFC1918 addresses is. What actual problem is this filter solving in datacenters besides being annoying?
What person needs consul for their datacenter and this "protection"?

The current rules also have no regard at all for IPv6 environments, since they assume that things are running IPv4 only, which should be seen as a relatively serious issue (no IPv4 available? I'll quit). This leaves static configuration as the only option for IPv6 environments. It would be a lot simpler if it could grab any assigned IP and use that by default.
(The Consul-client should resolve localhost and use the resolved value to connect to, that would make IPv4-only, IPv6-only and dualstack deployments work out of the box from the client)

Some tests on a server with IPv4 (public IP + 127.0.0.1), and IPv6 (link-local + routeable IPv6 + ::1) with a few interesting things:

Starting without -bind and -advertise. As mentioned, I believe this should work out of the box for simplicity.
root@s16:~# consul agent -server -data-dir /tmp/consul
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found


Starting without -bind, with -advertise set to 127.0.0.1. This is a bit unexpected given the RFC1918 rules currently in place: I expected Consul to bind only to localhost and RFC1918 IP's under the current policy if -bind isn't specified.
root@s16:~# consul agent -server -data-dir /tmp/consul -advertise=127.0.0.1
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 's16'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas:
....
root@s16:~# ss -antp | grep -i consul
LISTEN 0 128 127.0.0.1:8400 *:* users:(("consul",1108,17))
LISTEN 0 128 127.0.0.1:8500 *:* users:(("consul",1108,18))
LISTEN 0 128 127.0.0.1:8600 *:* users:(("consul",1108,22))
LISTEN 0 128 :::8300 :::* users:(("consul",1108,3))
LISTEN 0 128 :::8301 :::* users:(("consul",1108,12))
LISTEN 0 128 :::8302 :::* users:(("consul",1108,15))
Consul is listening on * for ports 8300, 8301 and 8302 even though it shouldn't, this is quite remarkable.


The last test got me curious, so let's add an RFC1918 IP to loopback and see what happens:
Starting without -bind or -advertise:
root@s16:~# ip addr add dev lo 10.255.255.50/32
root@s16:~# consul agent -server -data-dir /tmp/consul
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 's16'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 10.255.255.50 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas:
...
root@s16:~# ss -antp | grep -i consul
LISTEN 0 128 127.0.0.1:8400 *:* users:(("consul",3568,17))
LISTEN 0 128 127.0.0.1:8500 *:* users:(("consul",3568,18))
LISTEN 0 128 127.0.0.1:8600 *:* users:(("consul",3568,22))
LISTEN 0 128 :::8300 :::* users:(("consul",3568,3))
LISTEN 0 128 :::8301 :::* users:(("consul",3568,12))
LISTEN 0 128 :::8302 :::* users:(("consul",3568,15))

It seems Consul is listening on all interfaces available anyways. If i didn't have iptables in place this consul server would be reachable over Internet. Quite unexpected with the current RFC1918 policy.


Let's try to advertise a hostname instead. This would allow my configuration to be almost as simple as not specifying -advertise at all while still being "protective" by default.
root@s16:~# consul agent -server -data-dir /tmp/consul -advertise=s16.r1.xx.xxx
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to parse advertise address: localhost


Starting with -bind=::
i'd expect this to start consul and bind to any IP (IPv4 or IPv6) available. Consul should pick any IP except the 127.0.0.1 and ::1 for advertising. This would be perfect for large scale deployments since it would "just work" in most cases.
root@s16:~# consul agent -server -data-dir /tmp/consul -bind=::
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start RPC layer: RPC advertise address is not advertisable: [::]:8300


Starting with -bind=xx.xx.45.83 works. This requires a bit of scripting to implement, since all slaves (a few hundred) are PXE-booted and use the same boot image. It doesn't work at all if I want Consul to listen on more than one IP (for example for anycasting, which we do), since I can't state multiple -client or -bind options. I would have to run netcat on the other IP's or something equally dirty (iptables DNAT) to make it work, which is quite unacceptable in a production environment.
root@s16:~# consul agent -server -data-dir /tmp/consul -bind=xx.xx.45.83
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 's16'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: xx.xx.45.83 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas:
...
root@s16:~# ss -antp | grep -i consul
LISTEN 0 128 xx.xx.45.83:8300 *:* users:(("consul",1381,3))
LISTEN 0 128 xx.xx.45.83:8301 *:* users:(("consul",1381,12))
LISTEN 0 128 xx.xx.45.83:8302 *:* users:(("consul",1381,15))
LISTEN 0 128 127.0.0.1:8400 *:* users:(("consul",1381,17))
LISTEN 0 128 127.0.0.1:8500 *:* users:(("consul",1381,18))
LISTEN 0 128 127.0.0.1:8600 *:* users:(("consul",1381,22))


Starting with -bind=:: -advertise=2001:db8::1 works, but with the same drawbacks as for the IPv4 advertising. Consul should instead grab the primary address of any non-loopback interface and announce it.
root@s16:~# consul agent -server -data-dir /tmp/consul -bind=:: -advertise=2001:db8::1
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 's16'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 2001:db8::1 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas:
...
root@s16:~# ss -antp | grep -i consul
LISTEN 0 128 127.0.0.1:8400 *:* users:(("consul",2581,17))
LISTEN 0 128 127.0.0.1:8500 *:* users:(("consul",2581,18))
LISTEN 0 128 127.0.0.1:8600 *:* users:(("consul",2581,22))
LISTEN 0 128 :::8300 :::* users:(("consul",2581,3))
LISTEN 0 128 :::8301 :::* users:(("consul",2581,12))
LISTEN 0 128 :::8302 :::* users:(("consul",2581,15))

@armon
Member

armon commented Mar 3, 2015

@cetex Thank you for testing all of the combinations, there are definitely some bugs which you've helped to find with selecting the bind address in the various sub-systems.

In terms of defaults, Consul will only bind to a private IPv4 address by default. The reason for this is that almost no deployments of Consul should be on the public internet, and many environments still aren't great for IPv6.

For users that want to bind to a public address they must do so explicitly so that it is clear to them that the node will be on the public internet and that extra security measures should be taken. IPv6 is also not done automatically due to the number of environments where it does not work properly.

I think this is a sane tradeoff. In the majority of cases, we can automatically bind to the private IPv4 address. In any other case, an explicit bind value can be provided to override the behavior.

Eventually, it would be nice to support interface binding as well, but it is not a high priority ticket.

@telmich

telmich commented Mar 16, 2015

@armon I just want to mention a use case that may not be entirely clear here: We at ungleich are running an infrastructure for customers that consists solely of machines with public ip addresses and safe firewall settings - they are all managed by configuration management (cdist in our case).

At the moment we have to include the individual public ip address in the configuration, which kind of reverses the purpose of consul (which is able to discover things, but before discovering, we need to manually discover the node's ip address).

In short: we do have a use case in which we want consul to bind and announce on any address. I would expect consul to do so by issuing -bind=0.0.0.0, however that results in the infamous error message also seen in #789.

If your point of view is that this is insecure by default, then I suggest to add a flag to consul like -allow-insecure-bind or -allow-non-private-bind, but still allow to bind without specifying the public ip address.

@armon
Member

armon commented Mar 17, 2015

@telmich So in this case, is it not possible to just query the IP address of the device via ipconfig and provide that to Consul via -bind? The issue with "0.0.0.0" is that it's not actually a routable address, we cannot gossip that to our peers. So when you give that address to Consul, it looks for a private IPv4 address by default, and that is the address it uses to announce to peers.

This is why the -advertise and -bind flags are there as an escape hatch. They allow you as an operator to intervene where it's either ambiguous or unsafe by default. What Consul cannot currently do is advertise multiple IP addresses for the same node. Supporting that would require a great many changes to how the catalog and gossip systems work.
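
To illustrate why a wildcard bind address needs a separate advertise address, here is a rough Go sketch. It is not taken from Consul: `advertiseAddr` is a hypothetical helper that only mirrors the rule described above, namely that a wildcard such as 0.0.0.0 or :: can be listened on but cannot be gossiped to peers.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// advertiseAddr decides which address would be announced to peers.
// Illustrative only: the name and the rules are assumptions, not
// Consul's implementation.
func advertiseAddr(bind, advertise string) (net.IP, error) {
	// An explicit advertise address takes precedence over the bind address.
	candidate := bind
	if advertise != "" {
		candidate = advertise
	}
	ip := net.ParseIP(candidate)
	if ip == nil {
		return nil, fmt.Errorf("cannot parse address %q", candidate)
	}
	// 0.0.0.0 and :: mean "all interfaces"; peers cannot route to them.
	if ip.IsUnspecified() {
		return nil, errors.New("wildcard address cannot be advertised; set an explicit advertise address")
	}
	return ip, nil
}

func main() {
	cases := []struct{ bind, advertise string }{
		{"0.0.0.0", ""},
		{"::", "2001:db8::1"},
		{"198.51.100.10", ""},
	}
	for _, c := range cases {
		ip, err := advertiseAddr(c.bind, c.advertise)
		fmt.Printf("bind=%s advertise=%q -> %v (err: %v)\n", c.bind, c.advertise, ip, err)
	}
}
```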

I hope that helps!

@telmich

telmich commented Mar 19, 2015

@armon I understand the problem in regard to consul now - I wasn't aware that the IP is not taken from the IP packet itself, but is contained as data.

I will try to find a way around it by writing cdist types that query for the address of ethX.

I wonder, would it be a smaller change to consul to support something like -bind-interface to select the ip to be used? Almost all of our machines are running on opennebula and thus the interface is eth0 in most cases, so being able to use -bind-interface eth0 would at least make the situation easier for the moment.

I would however appreciate support for "no configuration required" in case of having only official ip addresses mid term - either way (multiple ip support, using the source ip in the ip packet, ...)

@cetex
Author

cetex commented Mar 19, 2015

The problem is with "--announce", consul refuses to announce anything that's not a rfc1918 address.
It will still bind to any address, even if it's public.

I believe this gives a false sense of security (it's actually pretty close to "security by obscurity") since if you're aware that consul won't do anything with public ip's you expect that you could run it on a host with both public and private interfaces and that consul will only listen to the private interfaces, but this is not true.

This "security-feature" in consul is only about announcing, so it won't pick a public ip to announce to its peers, but it will still happily bind to any ip. So if I install consul on a host that has a public and a private ip, consul will start and announce the private ip to its peers, but it will still bind to both the private and the public IP, and therefore will be accessible on the public internet unless I have some other protection.

I see a few options as solutions:
1: Remove the ip check entirely since its implementation is highly confusing.
2: Fix the current implementation so consul won't bind to a public ip unless specifically told to.
3: Update documentation so it's clear that it's only about the ip consul will announce to its peers. It should be made clear that it doesn't provide any extra security.
4: Do nr 2 and add a command-line option to override the networks consul will be allowed to announce and bind to. (I'd prefer this over 1 and 3 since it simplifies deploys a lot.)
5: Add an option so we can specify what interface consul should bind and announce to, this needs to disable the ip check in consul if it's not going to be confusing.

@telmich

telmich commented Mar 19, 2015

@cetex I agree in the regard that using rfc1918 ip addresses for security is flawed by design.

I would personally favor fix no. 1, as it feels very strange in 2015 to rely on rfc1918 ip addresses (may have "felt" more sensible in ~1996).

I would also suggest another solution that could make life very easy for developers of consul as well as for users:

  1. Remove check for private ip address
  2. Check if list of available ip addresses minus 127.0.0.1 equals 1
    2.1. if yes: bind to it and announce that ip address
    2.2. if no: error out, because it is ambiguous, which IP to announce

That would fix probably 99% of the cases and if you are having multiple IP addresses on a host, you might be doing something "special" and require adjusting.

@cetex
Author

cetex commented Mar 19, 2015

@telmich I agree. that would be a very sane design since it removes any possibility that consul may select the wrong address to announce "automagically"

I'm also a bit torn between keeping the ip filter or not, removing it completely would be the most sane thing to do, but keeping it around (changing the default filter to "::" so it's not filtering by default) and then making it configurable with an option like "-limit-announce-ip" which would override the default of "::" would simplify deployment in our case.

For example:
We want consul to listen on * and we have one or a few known ranges of ip's we want to use for announcement in consul. this ip range is guaranteed to give 1 unique ip per host.

-bind=:: -limit-announce-ip=a.a.a.a/b -limit-announce-ip=c.c.c.c/d would make this work for us.
-limit-announce-ip would make consul only pick ip's for announcing from the specified range and we could configure it quite statically in our deployments. Allowing this to be specified multiple times would be necessary.

But i guess i'd prefer software that doesn't care about what ip's are used at all so my "vote" is for complete removal of the feature.

@johnjelinek

Why isn't this working?

PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -client 127.0.0.1
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -advertise 104.167.111.185
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -bind 104.167.111.185
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -advertise 127.0.0.1
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -bind 127.0.0.1
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -client 127.0.0.1
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -bind ::
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -announce 104.167.111.185
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found
PS C:\Users\Administrator\Documents> consul agent -data-dir .\consul -bind 104.167.111.185
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: No private IP address found

@ryanbreen
Contributor

What is the output of ipconfig? It just looks like that machine has no private ip.

@ryanuber
Member

@johnjelinek that seems odd, specifying an explicit bind address should bypass the code that performs the private IP check. Is there anything special about the environment? What release of Consul are you using?

@johnjelinek

Using 0.5.1 on Windows Server 2012 R2.


PS C:\Users\Administrator\Documents> ipconfig

Windows IP Configuration

Ethernet adapter Network 1:

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::249a:a6ec:6d4e:1b4a%12
   IPv4 Address. . . . . . . . . . . : 104.167.111.185
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 104.167.111.1

Tunnel adapter isatap.{260E021B-7C03-406B-9A6B-2E3027226409}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter 6TO4 Adapter:

   Connection-specific DNS Suffix  . :
   IPv6 Address. . . . . . . . . . . : 2002:68a7:6fb9::68a7:6fb9
   Default Gateway . . . . . . . . . : 2002:c058:6301::1
                                       2002:c058:6301::c058:6301

Tunnel adapter Teredo Tunneling Pseudo-Interface:

   Connection-specific DNS Suffix  . :
   IPv6 Address. . . . . . . . . . . : 2001:0:9d38:90d7:865:1df0:9758:9046
   Link-local IPv6 Address . . . . . : fe80::865:1df0:9758:9046%16
   Default Gateway . . . . . . . . . :



@ryanbreen
Contributor

Right, 104.167.111.185 is a public IP. Consul requires a private address on which to advertise.

@johnjelinek

But I should be able to bypass it with bind, right? If not, how can I create a private IP without a second network adapter in windows? I was thinking maybe I'd have to create a VPN of all my VPSes. This problem doesn't exist on my Linux servers because consul is running in a docker container so those servers get the private container IP.



@ryanbreen
Contributor

You can't bind to an IP that isn't present in ipconfig. There needs to be a network adapter receiving packets at that IP for Consul to work. I believe Windows has the ability for you to create a virtual network adapter sharing the same physical adapter, but keep in mind that all machines in your cluster will need to be on the same private network with non-overlapping IPs so that they can route packets to each other.

@johnjelinek

My docker cluster is exposing on the public IPs without any issue. These are ephemeral servers that don't have a long life, so they should be coming in and out of the cluster pretty regularly. I'll try to add a virtual interface for windows to see if that helps. Didn't you say bind should override though?



@ryanbreen
Contributor

bind is useful in cases where a machine has multiple private IPs and Consul's default behavior doesn't choose the one you want. It can't create entirely new interfaces for you.

I'm curious why your ephemeral instances need public IPs. Feels like that's a case where private IPs load-balanced and exposed as a single public IP makes more sense.

@johnjelinek

That's all my VPS provider provides. 1 public IP and no private interface. I got a little further by creating a virtual loopback nic in windows and setting a static IP on the interface to 10.0.0.1. Now consul keeps moving. It seems like there's no config setting to just bypass the private IP requirement.



@johnjelinek

I also learned that the order of the flags matters: -bind public_ip -join cluster_IP

For future reference, create a virtual loopback interface in windows and set it to a private address to bypass consul's check. Then consul agent -bind ... -join ...



@duritong

Just to summarize the lengthy discussion:

If you have a node without any private ip addresses you are - at the moment - not able to use consul on this node.

There is no way around the requirement to advertise a private IPv4.

This needs to be fixed. The private ip space restriction does not hold within the IPv6 world, so why is it present within the IPv4 space?

@armon
Member

armon commented Jun 17, 2015

@duritong Not quite, you are able to use Consul but you just have to specify the bind address, Consul will not automatically infer it. So you must provide a single configuration option, certainly not an undue burden.

@johnjelinek

And you also need to create a loopback interface. The burden exists, but it's not significant.



@highlyunavailable
Contributor

The only reason you need the loopback is to bind the client ports, so technically you could use -client flag to bind to the same IP as the bind IP. This exposes the HTTP API to that IP though, which could be a security issue if it is exposed to the world.

@duritong

Right, I'm sorry. I was confused by the advertise discussion, and a config-path error on my side led me to think that you also need to advertise 127.0.0.1, which is kind of counterintuitive based on what the documentation says about that option.

Defining the -bind option is sufficient to get consul running on a purely public ipv4 node.

Sorry for the noise.

@therealbill

"By default, we will scan for a private IP and bind to that if available. Otherwise, we return an error. "

So why not instead bind to whatever IP is found, rather than returning an error? Combine that with the suggestions made by @telmich and perhaps both "sides" can have their cake and eat it too. After all, if a host only has a single IP it is pretty obvious the operator wants Consul to bind to it, and pretty obvious it is expected to.

Saying "we won't bind to a non-rfc1918 address even if there are none found" is telling the user you're smarter than they are. Making them jump through hoops to do what daemons are expected to do should have a very high bar, and I don't see that bar as being met here.

@withinboredom

What makes matters more frustrating is that it binds to the first available rfc1918 address, even though it was told to join another subnet... Grumble grumble...

@TiS

TiS commented Feb 24, 2016

@telmich - +1 for selecting default interface. Current setup makes it difficult to run consul automatically on virtual hosts - each one will have a different IP, so one has to write a shell script, guess the IP, append to configuration... If we could select an interface - it all gets a whole load easier. I'd tell consul "advertise on interface eth0" and that's it.

@OferE

OferE commented Mar 6, 2016

+1 for @TiS's suggestion. You can't spawn cloud machines automatically with a docker installation without this feature. One has to write a shell script to work around it.

@therealbill

I'll add my two cents on the "bind to interface" option. I'd support it and from a brief glance through the net package the needed bits to do it (net.Interfaces(), net.InterfaceByName(), interface.Addrs(), etc.) are there. In the case of an interface having multiple addresses, I'd propose the first one to be considered the default.

@withinboredom

@therealbill The downside with choosing the "first address" is that in the case of a docker network overlay, the first ip will likely be the wrong one. It would be better if it would bind to all ip's on a given interface, if you don't want it to bind to all of them then the ip should be specified. Perhaps something like:

--bind eth0 to bind to all ip's on a given interface

--bind eth0;10.0.0.0/16 bind to any ip on a given interface that matches the given cidr.

--bind eth0;10.0.0.5 bind to a specific ip on a given interface

--bind 10.0.0.5 bind to a specific ip regardless of interface

--bind 10.0.0.0/16 bind to a given ip that matches a cidr regardless of interface
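
A rough Go sketch of how the five forms above could be parsed. The syntax is only the proposal from this comment, and `bindSpec` and `parseBindSpec` are hypothetical names. A value with no ";" that is neither an IP nor a CIDR is treated as an interface name, which is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// bindSpec is the parsed form of a proposed --bind value.
type bindSpec struct {
	Iface string     // interface name, empty means any interface
	Net   *net.IPNet // CIDR filter, nil if none was given
	IP    net.IP     // exact address, nil if none was given
}

// parseBindSpec accepts "eth0", "eth0;10.0.0.0/16", "eth0;10.0.0.5",
// "10.0.0.5" and "10.0.0.0/16".
func parseBindSpec(s string) (bindSpec, error) {
	var spec bindSpec
	if s == "" {
		return spec, errors.New("empty bind spec")
	}
	addrPart := s
	if i := strings.Index(s, ";"); i >= 0 {
		spec.Iface, addrPart = s[:i], s[i+1:]
		if spec.Iface == "" || addrPart == "" {
			return bindSpec{}, fmt.Errorf("malformed bind spec %q", s)
		}
	} else if net.ParseIP(s) == nil && !strings.Contains(s, "/") {
		// Not an address and not a CIDR: assume an interface name.
		spec.Iface = s
		return spec, nil
	}
	if strings.Contains(addrPart, "/") {
		_, n, err := net.ParseCIDR(addrPart)
		if err != nil {
			return bindSpec{}, err
		}
		spec.Net = n
		return spec, nil
	}
	ip := net.ParseIP(addrPart)
	if ip == nil {
		return bindSpec{}, fmt.Errorf("invalid address %q", addrPart)
	}
	spec.IP = ip
	return spec, nil
}

func main() {
	for _, s := range []string{"eth0", "eth0;10.0.0.0/16", "eth0;10.0.0.5", "10.0.0.5", "10.0.0.0/16"} {
		spec, err := parseBindSpec(s)
		fmt.Printf("%-18s iface=%q net=%v ip=%v err=%v\n", s, spec.Iface, spec.Net, spec.IP, err)
	}
}
```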

@therealbill

@withinboredom I'd be ok with that.

@beornf

beornf commented Mar 8, 2016

I've submitted a pull request (#1570) which binds to a cidr regardless of interface. This could be expanded to support interfaces. In the PR, to ease parsing, --bind 10.0.0.0 matches the 10.0.0.0/16 subnet and returns the first ip in that subnet.

@orivej

orivej commented Mar 8, 2016

@beornf, individual addresses can end with .0, so a trailing .0 is not a reliable indication of a subnet. @withinboredom's proposal is better.

@supengfei

get available ip list fail get github first setup ips :none

@arnedag

arnedag commented Aug 13, 2016

While I think it's a bit odd to consider rfc1918-addresses as special, it seems to me that the problem is even worse than that: we use internally routable networks in 10.0.0.0/8, and I get the same error on a machine configured with just localhost and an ip in 10.x.y.z/24, which of course is a valid rfc1918-address.

This is with Consul v0.6.4 on SmartOS.
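
For reference, an RFC 1918 check amounts to a membership test against three IPv4 blocks. The following is a minimal sketch using the standard net package; it is not Consul's actual implementation, and the names rfc1918Blocks and isRFC1918 are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// rfc1918Blocks lists the three private IPv4 ranges defined by RFC 1918.
var rfc1918Blocks = []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"}

// isRFC1918 reports whether s parses as an IP address inside one of those ranges.
func isRFC1918(s string) bool {
	ip := net.ParseIP(s)
	if ip == nil {
		return false
	}
	for _, block := range rfc1918Blocks {
		_, ipnet, err := net.ParseCIDR(block)
		if err != nil {
			continue // not expected for the literals above
		}
		if ipnet.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{"10.78.1.240", "172.32.0.1", "250.0.0.1"} {
		fmt.Printf("%-12s private=%v\n", s, isRFC1918(s))
	}
}
```

Any address in 10.x.y.z passes this test, so a failure on such a host points at something other than the range check itself, for example how the candidate addresses are enumerated on that platform.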

@Globik

Globik commented Aug 17, 2016

Why does WebRTC not work without internet connectivity?

@withinboredom

Is it plugged in and powered on?

@Globik

Globik commented Aug 17, 2016

Not a plugin. Just Windows with the Google Chrome browser on board. No WebRTC sample works without internet. It's a pity.

@jyoti-264

Hi,

I am getting following error message:

[root@VORA1 bin]# ./consul agent -server -data-dir=/var/local/vora-discovery
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start RPC layer: listen tcp 0.0.0.0:8300: bind: address already in use

Result of ifconfig:

[root@VORA1 bin]# ifconfig
eth0 Link encap:Ethernet HWaddr D8:CB:8A:91:7F:98
inet addr:10.78.1.240 Bcast:10.78.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1932329 errors:0 dropped:0 overruns:0 frame:0
TX packets:775949 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:486373768 (463.8 MiB) TX bytes:137580640 (131.2 MiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:2381482 errors:0 dropped:0 overruns:0 frame:0
TX packets:2381482 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:575541355 (548.8 MiB) TX bytes:575541355 (548.8 MiB)

-bind also doesn't seem to resolve the issue:

[root@VORA1 bin]# ./consul agent -server -data-dir=/var/local/vora-discovery --bind 10.78.1.240
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start RPC layer: listen tcp 10.78.1.240:8300: bind: address already in use

[root@VORA1 bin]# ss -antp | grep -i consul
LISTEN 0 128 10.78.1.240:8300 *:* users:(("consul",6927,5))
LISTEN 0 128 10.78.1.240:8301 *:* users:(("consul",6927,9))
LISTEN 0 128 10.78.1.240:8302 *:* users:(("consul",6927,12))
LISTEN 0 128 :::8400 :::* users:(("consul",6927,14))
LISTEN 0 128 :::8500 :::* users:(("consul",6927,15))
LISTEN 0 128 :::8600 :::* users:(("consul",6927,16))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42892 users:(("consul",6927,22))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42862 users:(("consul",6927,18))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42922 users:(("consul",6927,25))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42910 users:(("consul",6927,23))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42964 users:(("consul",6927,28))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42912 users:(("consul",6927,24))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42932 users:(("consul",6927,26))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42872 users:(("consul",6927,20))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42888 users:(("consul",6927,21))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42946 users:(("consul",6927,27))
ESTAB 0 0 ::ffff:127.0.0.1:8500 ::ffff:127.0.0.1:42866 users:(("consul",6927,19))

Any pointers please...

Thanks

@Globik

Globik commented Sep 9, 2016

@jyoti-264 , your port is already in use. Try to provide another port.

@jbiel

jbiel commented Nov 2, 2016

+1 for the ability to bind to an interface. I think the RFC1918 restriction is a bit unexpected. I'd auto-bind to the non-localhost (eth0) interface IP address if only one is available and force the user to set an address in cases where more than one is available.

We use an overlay network for our containers and the subnet for this overlay network is 250.0.0.0/8 (marked for "future use" by IANA.) Probably not best practice, but the overlay requires a /8 and we're already using 172/8 for our VPCs and 10/8 for EC2 classic. The addresses in the overlay network are assigned from DHCP at the time containers are launched so scripting up a consul container to do a bind is...fun.

Update: nevermind. I found that the Consul container entrypoint has a variable (CONSUL_BIND_INTERFACE) that's used for exactly this purpose and all is good now. :)

This was referenced Nov 30, 2016
@sean-
Contributor

sean- commented Dec 2, 2016

All, please give the latest code in master a twirl and let us know. The syntax for supporting IP addresses, getting the first "usable" IP address on an interface, or any other manner of network craziness is likely possible now as a template evaluated address parameter can be passed to Consul addresses (e.g. -bind or bind_addr, or any other *_addr-like parameter):

-bind='{{ GetInterfaceIP "eth0" }}'
-bind='{{ GetAllInterfaces | include "network" "10.99.0.0/24" }}'
-bind='{{ GetDefaultInterfaces | include "network" "10.99.0.0/24" | sort "size,address" | attr "address" }}'
-bind='{{ GetAllInterfaces | exclude "rfc" "6890" | sort "type,size,address" | include "flags" "up|forwardable" | attr "address" }}'

Very few people should need to do anything as obscene as shown in the last example, but the functionality is there should you need it.

With the sockaddr command you can experiment with getting the right template syntax with the eval sub-command, for instance:

$ sockaddr eval 'GetAllInterfaces | include "network" "10.99.0.0/24" | sort "size,address" | attr "address"'
10.99.0.5
$ sockaddr eval 'GetInterfaceIP "eth0"'
10.99.0.5

There is now a configurable template language behind this that you can use to create a customizable heuristic that should allow you to get whatever it is that you need from your environment when using an immutable image (see hashicorp/go-sockaddr/template and cmd/sockaddr for examples and docs). Feedback welcome (preferably as a new issue, however).

@sean- sean- closed this as completed Dec 2, 2016
@telmich

telmich commented Dec 2, 2016

@sean- that looks awesome - will give it a try next week!

@jskarpe

jskarpe commented Feb 27, 2017

How do you support servers behind DHCP with changing IPs? E.g., bind literally to 0.0.0.0?

@sean-
Contributor

sean- commented Mar 6, 2017

@yuav you have to restart Consul, but if you use -bind={{GetPrivateIP}} and restart the process when it changes IPs you'll be g2g now that #2786 has been merged.

@ailjushkin

I just started it with default parameters and everything started fine. I needed to remove the -client option entirely.

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
The port-forward sometimes fails randomly