
Support sockaddr templates in start-join and retry-join configuration #3865

Conversation

dominik-lekse (Contributor)

This pull request adds support for sockaddr templates in the start-join and retry-join configuration fields. This allows the addresses used to join an existing cluster to be resolved from the network configuration of the Consul agent host.

Use case example

This use case describes a scenario in which the feature is essential.

  • A Consul cluster should be established in a /24 network that contains all hosts of the cluster
  • Different environments for this Consul cluster have different network prefixes (e.g. the test cluster has the subnet 10.0.1.0/24 and the production cluster has the subnet 10.0.2.0/24)
  • In each subnet, three host machines running Consul servers exist at the addresses .1, .2, and .3 relative to the subnet (e.g. the first Consul server in the test environment has the address 10.0.1.1/24)
  • A variable number of worker machines exists per environment, each running Consul in client mode
  • Worker machine instances are spawned depending on the application load from a worker machine image that is identical across all environments

Using the following configuration, the Consul agents on the worker machines are able to find the Consul server machines relative to the subnet they are deployed in.

$ consul agent -retry-join '{{ GetAllInterfaces | include "type" "IPv4" | exclude "flags" "loopback" | sort "address" | math "network" "+1" | attr "address" }}'
{
  "retry_join": [
    "{{ GetAllInterfaces | include \"type\" \"IPv4\" | exclude \"flags\" \"loopback\" | sort \"address\" | math \"network\" \"+1\" | attr \"address\" }}",
    "{{ GetAllInterfaces | include \"type\" \"IPv4\" | exclude \"flags\" \"loopback\" | sort \"address\" | math \"network\" \"+2\" | attr \"address\" }}",
    "{{ GetAllInterfaces | include \"type\" \"IPv4\" | exclude \"flags\" \"loopback\" | sort \"address\" | math \"network\" \"+3\" | attr \"address\" }}"
  ]
}
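
To illustrate, the template can be evaluated standalone with the sockaddr CLI that ships with the go-sockaddr repository. Assuming a worker machine in the test environment whose only non-loopback IPv4 address is 10.0.1.57/24, the first entry would resolve as follows (output illustrative):

$ sockaddr eval '{{ GetAllInterfaces | include "type" "IPv4" | exclude "flags" "loopback" | sort "address" | math "network" "+1" | attr "address" }}'
10.0.1.1

The pipeline lists all interfaces, keeps the IPv4 addresses, drops loopback, sorts by address, adds 1 to the network address of the remaining subnet (10.0.1.0), and returns the resulting address attribute.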

Reference

Testing

I did not include tests covering this functionality as part of this pull request. Any hints on how and where to implement the corresponding tests are appreciated.

Open tasks

  • Implement tests
  • Write documentation

@preetapan (Contributor)

@dominik-lekse We discussed this PR internally. It does add a bit more complexity, and in more recent versions of Consul we have moved towards using go-discover for retry-join, with provider-level implementations for discovering other agents to join. Having three different ways to provide the IP (a go-discover string, a plain IP address, and a go-sockaddr template) adds a lot of configuration and complexity overhead, so we want to make sure this is justified.
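
For context, the three forms would sit side by side in the same field, roughly like this (HCL, with illustrative values):

retry_join = [
  "10.0.1.1",                                     # plain IP address
  "provider=aws tag_key=consul tag_value=server", # go-discover provider string
  "{{ GetPrivateIP }}",                           # go-sockaddr template (this PR)
]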

As for your use case, it seems like you could have config management figure out one server's IP given the subnet and set that in your Consul config file, so does it really need six steps of piped go-sockaddr code to figure out which server to join?

@pearkes (Contributor) commented Apr 20, 2018

Given we haven't heard anything based on our suggestions/questions above, I'm going to close this, but I encourage you to comment and we can re-open it if you want to pick this up again.

Alternatively, if things have changed dramatically, feel free to create a new PR. You may also want to follow the mDNS PR for go-discover where some related discussion is happening: hashicorp/go-discover#28.

@pearkes closed this Apr 20, 2018
@sean- (Contributor) commented Apr 20, 2018

As for your use case, it seems like you could have config management figure out one server's IP given the subnet and set that in your Consul config file

Configuration management is frequently not used, and often cannot be used, in immutable environments. This PR is clean, albeit the syntax used to do the IP math is sub-optimal and tests are missing (docs remain elusive for understandable usability reasons). There are two ways to obtain IP information: go-discover, or a statically defined string that may be interpolated and used to calculate an IP using an organization-specific heuristic. Use of instance tags when they are available is convenient, but that isn't always the case in the enterprise.

It is possible to use sockaddr(1) to preprocess the config file, notably using something like gomplate(1), which now has sockaddr/template support. Networking is messy, especially when trying to join consul servers that span subnets or broadcast domains. It's good to have options and tools available to solve problems.
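
As a sketch of that preprocessing approach (file names made up here; gomplate exposes the go-sockaddr functions under its sockaddr namespace):

$ cat consul.json.tmpl
{
  "retry_join": ["{{ sockaddr.GetPrivateIP }}"]
}
$ gomplate -f consul.json.tmpl -o /etc/consul.d/consul.json

The rendered file then contains a concrete address, so the agent itself needs no template support.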

@dominik-lekse (Contributor, Author)

Apologies for the slow response here.

I agree that this PR adds complexity to retry-join in terms of the ways to derive the IPs. However, in my opinion it is justified for the following reasons.

As @sean- already pointed out, configuration management is not the approach used in immutable environments. In our case, we bake immutable OS images which are deployed across multiple environments. However, these environments differ in their network prefixes, as outlined in the use case above. This feature allows us to start Consul and join the cluster within each environment while using the same image.

Further, go-sockaddr templates are supported by various other Consul configuration parameters which expect an IP, such as client or addresses. When I first tried Consul in a prototype for my use case, I was happy about go-sockaddr's simplicity and expected the templates to be available consistently in all IP-related configuration parameters. It took a while to realize that the retry-join fields are not among them.
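
For instance, the following already works today (values illustrative):

{
  "client_addr": "{{ GetPrivateIP }}",
  "addresses": {
    "http": "{{ GetInterfaceIP \"eth0\" }}"
  }
}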

With my team, we also discussed the use of go-discover. We concluded that using go-discover, and in particular its Azure provider, would add a dependency on the Azure API and would require setting up dedicated credentials/service principals to query this API. In our case, we would need to bake these credentials into the image, which is worse in terms of security than the go-sockaddr template solution. Further, it increases the failure surface in case the cloud provider API is unavailable.
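
For comparison, a go-discover Azure entry is roughly of this shape (field names per the go-discover Azure provider; values are placeholders), with the service principal credentials necessarily baked into the image or its configuration:

"retry_join": [
  "provider=azure tag_name=consul tag_value=server tenant_id=... client_id=... subscription_id=... secret_access_key=..."
]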

As an alternative, I could imagine implementing a "go-sockaddr provider" as part of the go-discover library. On the other hand, I am not sure whether that really fits into the functional scope of go-discover.

Regarding the six steps of piped go-sockaddr code, I think this complexity is independent of this pull request. We could implement additional shortcuts in go-sockaddr to reduce the necessary steps for a use case like this.

@pearkes (Contributor) commented Apr 26, 2018

Thanks for the thoughts @dominik-lekse @sean-. Will open this back up so we don't lose track of it and continue the discussion.

@pearkes reopened this Apr 26, 2018
@pearkes added this to the 1.1.0 milestone Apr 27, 2018
@banks (Member) left a comment

Looks good to me.

There is a trivial naming inconsistency – retryJoinLan should be retryJoinLAN to be consistent – but I'll fix that up.

@banks (Member) left a comment

🤦‍♂️

Sorry @dominik-lekse, I failed to notice that there are no test cases covering this new code until I started looking to see if my naming changes broke them.

It should be trivial to add them to the existing table test for edge cases just like we do for the other sockaddr params:

{
    desc: "client, address template and ports",
    args: []string{`-data-dir=` + dataDir},
    json: []string{`{
        "client_addr": "{{ printf \"1.2.3.4 2001:db8::1\" }}",
        "addresses": {
            "dns": "{{ printf \"1.1.1.1 2001:db8::10 \" }}",
            "http": "{{ printf \"2.2.2.2 unix://http 2001:db8::20 \" }}",
            "https": "{{ printf \"3.3.3.3 unix://https 2001:db8::30 \" }}"
        },
        "ports": { "dns": 1, "http": 2, "https": 3 }
    }`},
    hcl: []string{`
        client_addr = "{{ printf \"1.2.3.4 2001:db8::1\" }}"
        addresses = {
            dns = "{{ printf \"1.1.1.1 2001:db8::10 \" }}"
            http = "{{ printf \"2.2.2.2 unix://http 2001:db8::20 \" }}"
            https = "{{ printf \"3.3.3.3 unix://https 2001:db8::30 \" }}"
        }
        ports { dns = 1 http = 2 https = 3 }
    `},
    patch: func(rt *RuntimeConfig) {
        rt.ClientAddrs = []*net.IPAddr{ipAddr("1.2.3.4"), ipAddr("2001:db8::1")}
        rt.DNSPort = 1
        rt.DNSAddrs = []net.Addr{tcpAddr("1.1.1.1:1"), tcpAddr("[2001:db8::10]:1"), udpAddr("1.1.1.1:1"), udpAddr("[2001:db8::10]:1")}
        rt.HTTPPort = 2
        rt.HTTPAddrs = []net.Addr{tcpAddr("2.2.2.2:2"), unixAddr("unix://http"), tcpAddr("[2001:db8::20]:2")}
        rt.HTTPSPort = 3
        rt.HTTPSAddrs = []net.Addr{tcpAddr("3.3.3.3:3"), unixAddr("unix://https"), tcpAddr("[2001:db8::30]:3")}
        rt.DataDir = dataDir
    },
},
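
A corresponding entry for the new retry-join support might look roughly like the following sketch; the RetryJoinLAN field name and the expansion semantics are assumptions here:

{
    desc: "retry_join sockaddr template",
    args: []string{`-data-dir=` + dataDir},
    json: []string{`{ "retry_join": ["{{ printf \"1.2.3.4\" }}"] }`},
    hcl:  []string{`retry_join = ["{{ printf \"1.2.3.4\" }}"]`},
    patch: func(rt *RuntimeConfig) {
        // Assumed: the template is rendered before the address is stored.
        rt.RetryJoinLAN = []string{"1.2.3.4"}
        rt.DataDir = dataDir
    },
},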

@banks (Member) commented May 10, 2018

@dominik-lekse thanks for this. I took a stab at it, as we are looking to get a release out, and ended up with a new PR based on this one: #4102. Any comments are welcome there.

Thanks for the contribution!

@banks closed this May 10, 2018