hostname not populated in /etc/hosts for Docker tasks with Connect #8900
Comments
For those having this issue, current hackfix solution is:

config {
  args       = ["/tmp/hosts-entrypoint.sh"]
  entrypoint = ["/bin/bash"]
  volumes = [
    "local/hosts-entrypoint.sh:/tmp/hosts-entrypoint.sh",
  ]
  image = "some-debian-else-based-image"
}

template {
  data = <<EOF
#!/bin/bash
echo $(hostname -I) $HOSTNAME >> /etc/hosts && /entrypoint.sh
EOF
  destination = "local/hosts-entrypoint.sh"
}
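As an aside, a small variant of the same wrapper (illustrative only, not part of the workaround above) uses exec so that the image's original entrypoint replaces the wrapper shell and receives signals directly:

#!/bin/bash
# append the namespace IP and hostname, then hand off to the real entrypoint
echo "$(hostname -I) $HOSTNAME" >> /etc/hosts
exec /entrypoint.sh "$@"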
Linking also to: #7746
Meanwhile, note it's just a sample, and it is a Levant-templated job:

job "[[.JOB]]" {
  datacenters = ["us-east-1a", "us-east-1b", "us-east-1c"]
  type        = "service"

  reschedule {
    delay          = "15s"
    delay_function = "exponential"
    max_delay      = "15m"
    attempts       = 10
    interval       = "2h"
    unlimited      = false
  }

  group "[[.JOB]]-database-group" {
    network {
      mode = "bridge"
    }

    service {
      name = "[[.JOB]]-database"
      port = "5432"

      connect {
        sidecar_service {}
      }
    }

    task "[[.JOB]]-database" {
      driver = "docker"

      env {
        ALLOW_IP_RANGE = "0.0.0.0/0"
      }

      config {
        image                  = "postgres:11.5"
        advertise_ipv6_address = true
      }
    }
  }

  group "[[.JOB]]-service-test" {
    network {
      mode = "bridge"
    }

    service {
      name = "[[.JOB]]-service-test"
      port = "5432"

      connect {
        sidecar_service {
          tags = ["traefik.enable=false"]

          proxy {
            upstreams {
              destination_name = "[[.JOB]]-database"
              local_bind_port  = 5433
            }
          }
        }
      }
    }

    task "[[.JOB]]-service-test" {
      driver = "docker"

      config {
        image      = "ubuntu"
        entrypoint = ["/bin/sh"]
        args       = ["-ec", "sleep 1000"]
        volumes = [
          "../alloc/tmp/hosts:/etc/hosts",
        ]
      }
    }

    task "[[.JOB]]-service-test-config" {
      lifecycle {
        sidecar = false
        hook    = "prestart"
      }

      driver = "docker"

      config {
        image      = "ubuntu"
        entrypoint = ["/bin/bash"]
        args = [
          "/tmp/hosts-entrypoint.sh"
        ]
        volumes = [
          "local/hosts-entrypoint.sh:/tmp/hosts-entrypoint.sh",
        ]
      }

      template {
        data = <<EOF
#!/bin/bash
cat /etc/hosts > /alloc/tmp/hosts
echo $(hostname -I) $HOSTNAME >> /alloc/tmp/hosts
EOF
        destination = "local/hosts-entrypoint.sh"
      }
    }
  }
}
Confirmed that Nomad's group networking in bridge mode diverges from Docker's bridge mode networking. Not sure exactly what is going on here, so we'll have to investigate! Hopefully it's a bug and not somewhere we have to diverge from Docker for some reason. If that's the case we'll figure out where to document the difference and offer a workaround. Output for Nomad bridge networking using these job files:
Output with Docker's network_mode=bridge:
(Behavior is the same on Nomad 0.12.8 and Nomad 1.0.4)
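For reference, a hedged sketch of how one might collect those outputs for comparison (the alloc ID, task name, and image are placeholders, not taken from the report above):

# inside the Nomad bridge-mode allocation
$ nomad alloc exec -task <task-name> <alloc-id> cat /etc/hosts
$ nomad alloc exec -task <task-name> <alloc-id> hostname -i
# plain Docker bridge networking for comparison
$ docker run --rm --network=bridge debian:buster cat /etc/hosts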
Cross-linking #8343 which may have a related underlying cause.
I spent a little time on this and have a root cause. When the task container joins the pause container's network namespace, inspecting the two containers shows:

# docker inspect 9b4 | jq '.[0].HostConfig.NetworkMode'
"container:8c6b40e9132f55af6ab1ea32d372bd86ba7161b96898008ec7e565bd9c12582e"
# docker inspect 8c6 | jq '.[0].HostConfig.NetworkMode'
"none"

It looks like when Docker sets up a container with network mode none, it doesn't write the container's hostname into /etc/hosts:

# docker run -it --rm --network=none debian:buster cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

So when we have a network mode of container:<pause container>, inside the task container we get:

# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# hostname -i
hostname: Name or service not known
# hostname -f
hostname: Name or service not known
# hostname -I
172.26.64.85

Whereas with the task running in Docker's own bridge mode, the task container gets its own network namespace:

# docker inspect 5c0 | jq '.[0].HostConfig.NetworkMode'
"bridge"
# docker inspect 885 | jq '.[0].HostConfig.NetworkMode'
"none"

And the /etc/hosts in that container does have a hostname entry:

# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 ca2524bf6b59
# hostname -i
172.17.0.3
# hostname -f
ca2524bf6b59

I'm at the end of my week here so I don't have a solid solution yet. But one option that jumps out at me immediately is that we could try to detect this scenario and inject an extra /etc/hosts entry into the task containers.
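To make the pattern concrete, here is a minimal reproduction sketch of the same behavior with plain Docker (the container name and image are illustrative, not what Nomad actually creates):

# stand-in for the pause container: holds a network namespace with mode "none"
$ docker run -d --name netns-holder --network=none debian:buster sleep infinity
# a container joining that namespace inherits the "none"-style /etc/hosts,
# which has no entry for the container's hostname
$ docker run --rm --network=container:netns-holder debian:buster cat /etc/hosts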
A couple of approaches that I've discovered won't work:
But I think we're on the right track here in terms of overriding the /etc/hosts file.

As an aside, I've confirmed the behavior reported where Connect won't work when the task sets network_mode = "bridge" in its Docker config:

job "example" {
  datacenters = ["dc1"]

  group "server" {
    network {
      mode = "bridge"
    }

    service {
      name = "www"
      port = "8001"

      connect {
        sidecar_service {}
      }
    }

    task "task" {
      driver = "docker"

      config {
        image        = "0x74696d/dnstools"
        command      = "busybox"
        args         = ["httpd", "-v", "-f", "-p", "8001", "-h", "/srv"]
        network_mode = "bridge"
      }
    }
  }

  group "client" {
    network {
      mode = "bridge"
    }

    service {
      name = "client"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "www"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "task" {
      driver = "docker"

      config {
        image        = "0x74696d/dnstools"
        command      = "/bin/sh"
        args         = ["-c", "sleep 5; while true; do curl -v \"http://${NOMAD_UPSTREAM_ADDR_www}\" ; sleep 10; done"]
        network_mode = "bridge"
      }
    }
  }
}

That's because the Envoy proxy isn't in the same network namespace as the application. That's expected, but I don't think it's very well documented, and it would be nice if we could provide some warnings for the user if they're configuring jobs this way.
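A hedged illustration of that failure mode, with placeholder values: Nomad binds the Connect upstream on the loopback of the group's shared network namespace, so a task running in its own Docker bridge namespace sees a different 127.0.0.1 and can't reach the proxy there:

# inside the "client" task when it runs with network_mode = "bridge"
$ echo "$NOMAD_UPSTREAM_ADDR_www"
127.0.0.1:8080
$ curl -v "http://${NOMAD_UPSTREAM_ADDR_www}"
# connection refused: nothing listens on loopback in this task's own namespace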
Working a bit more on this I wanted to make sure we're trying to get the right behavior here. I added the hack-around that @Ilhicas suggested (note however that I call the workload via exec from the entrypoint script):

job "example" {
  datacenters = ["dc1"]

  group "server" {
    network {
      mode = "bridge"
    }

    service {
      name = "www"
      port = "8001"

      connect {
        sidecar_service {}
      }
    }

    task "task" {
      driver = "docker"

      config {
        image   = "0x74696d/dnstools"
        command = "/bin/bash"
        args    = ["local/hosts-entrypoint.sh"]
        ports   = ["www"]
      }

      template {
        data = <<EOF
#!/bin/bash
echo $(hostname -I) $HOSTNAME >> /etc/hosts
exec busybox httpd -v -f -p 8001 -h /srv
EOF
        destination = "local/hosts-entrypoint.sh"
      }
    }
  }
}

If we run that and check /etc/hosts, the entry we've added uses the hostname of the pause container, and we get the same hostname in the Envoy proxy. So I think our goal here is to get the pause container's hostname into its own /etc/hosts.
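For context, Docker's ordinary mechanism for adding such an entry is an extra hosts mapping; a minimal sketch with made-up values follows (this is what the patch below attempts programmatically via ExtraHosts, and as that comment notes, the daemon rejects the combination when the container joins another container's namespace):

$ docker run --rm --add-host "mytask:172.17.0.3" debian:buster cat /etc/hosts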
Well, as it turns out, a patch like the following, which adds a host to the task containers:

diff --git a/drivers/docker/driver.go b/drivers/docker/driver.go
index 7234d3467..7200d08d8 100644
--- a/drivers/docker/driver.go
+++ b/drivers/docker/driver.go
@@ -1040,6 +1040,11 @@ func (d *Driver) createContainerConfig(task *drivers.TaskConfig, driverConfig *T
netMode := fmt.Sprintf("container:%s", task.NetworkIsolation.Labels[dockerNetSpecLabelKey])
logger.Debug("configuring network mode for task group", "network_mode", netMode)
hostConfig.NetworkMode = netMode
+ hostConfig.ExtraHosts = []string{
+ fmt.Sprintf("%s %s",
+ task.NetworkIsolation.Labels[dockerNetSpecLabelKey][:12]),
+ "127.0.0.1", // TODO: placeholder
+ }
} else {
// docker default
logger.Debug("networking mode not specified; using default")

Unfortunately that gets rejected by the Docker daemon as well:

One last thing we might be able to try here is to override the default entrypoint of the Envoy proxy container and inject the hostname into /etc/hosts from there.
diff --git a/nomad/job_endpoint_hook_connect.go b/nomad/job_endpoint_hook_connect.go
index 989039cde..2f5873de8 100644
--- a/nomad/job_endpoint_hook_connect.go
+++ b/nomad/job_endpoint_hook_connect.go
@@ -31,14 +31,15 @@ func connectSidecarResources() *structs.Resources {
// connectSidecarDriverConfig is the driver configuration used by the injected
// connect proxy sidecar task.
func connectSidecarDriverConfig() map[string]interface{} {
+
+ args := fmt.Sprintf("echo $(hostname -I) $HOSTNAME >> /etc/hosts && exec /docker-entrypoint.sh -c %s -l ${meta.connect.log_level} --concurrency ${meta.connect.proxy_concurrency} --disable-hot-restart", structs.EnvoyBootstrapPath)
+
return map[string]interface{}{
"image": envoy.SidecarConfigVar,
"args": []interface{}{
- "-c", structs.EnvoyBootstrapPath,
- "-l", "${meta.connect.log_level}",
- "--concurrency", "${meta.connect.proxy_concurrency}",
- "--disable-hot-restart",
+ "-c", args,
},
+ "entrypoint": []string{"/bin/sh"},
}
}

Results:

I'll need to consult a bit with my colleagues who have more Connect expertise than me to see if that's an approach we can live with here. (And obviously clean up how hacky the patch is and fix any tests it breaks!)
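For clarity, with a patch along these lines the injected sidecar task would effectively run something like the following (the bootstrap path and interpolated values are shown as placeholders):

/bin/sh -c 'echo $(hostname -I) $HOSTNAME >> /etc/hosts && exec /docker-entrypoint.sh -c <envoy_bootstrap.json> -l <log_level> --concurrency <proxy_concurrency> --disable-hot-restart'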
@Ilhicas I do have to ask at this point why you want to use ... For example, if we try the following job, the "client" task won't be able to reach the "server" task:

job "example" {
  datacenters = ["dc1"]

  group "server" {
    network {
      mode = "bridge"
    }

    service {
      name = "www"
      port = "8001"

      connect {
        sidecar_service {}
      }
    }

    task "task" {
      driver = "docker"

      config {
        image   = "0x74696d/dnstools"
        command = "/bin/bash"
        args    = ["local/hosts-entrypoint.sh"]
        ports   = ["www"]
      }

      template {
        data = <<EOF
#!/bin/bash
echo $(hostname -I) $HOSTNAME >> /etc/hosts
exec busybox httpd -v -f -p $(hostname -I):8001 -h /srv
EOF
        destination = "local/hosts-entrypoint.sh"
      }
    }
  }

  group "client" {
    network {
      mode = "bridge"
    }

    service {
      name = "client"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "www"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "task" {
      driver = "docker"

      config {
        image   = "0x74696d/dnstools"
        command = "/bin/sh"
        args    = ["-c", "sleep 5; while true; do curl -v \"http://${NOMAD_UPSTREAM_ADDR_www}\" ; sleep 10; done"]
        # network_mode = "bridge"
      }
    }
  }
}

But if we swap out the ...
Nevermind the above... the same situation exists whenever you have ...
Proof-of-concept fix in #10766, but I need to make sure it won't interfere with other networking modes, with mixed task driver allocations, etc.
Hi @tgross, thanks for all the effort, and sorry for being late to that question :D but it seems you got it all figured out by now.
#10766 has been merged and will ship in 1.1.2.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.12.4 (8efaee4) cli
Operating system and Environment details
Hosts: Linux on EC2, Nomad v0.12.3, Consul 0.1.8
Clients: Linux on EC2, Nomad v0.12.3, Consul 0.1.8
Issue
When using "bridge" mode alongside the CNI plugin, the network mode of the container is still host.
Hosts are currently not allowed to be set in host mode (it should be the default according to the docs, but when using bridge with the connect stanza it doesn't honour the default).
This brings a series of challenges for applications (namely Java) that use hostname -f or -i, or any equivalent that derives the local address from the hostname.
Reproduction steps
Enter a container running with Consul Connect in network "bridge" mode and try to do a reverse lookup using hostname -i or hostname -f.
hostname -I does return the IP attached to the container.
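A hedged sketch of the reproduction commands (the alloc ID and task name are placeholders; the failing output matches what is shown in the comments above):

$ nomad alloc exec -task <task-name> <alloc-id> hostname -i
hostname: Name or service not known
$ nomad alloc exec -task <task-name> <alloc-id> hostname -f
hostname: Name or service not known
$ nomad alloc exec -task <task-name> <alloc-id> hostname -I
<container-ip>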
Job file (if appropriate)
If I try to use network_mode = "bridge":
Hostname resolution works as expected; however, the proxy is not available on localhost.