
hostname not populated in /etc/hosts for Docker tasks with Connect #8900

Closed
Ilhicas opened this issue Sep 16, 2020 · 16 comments · Fixed by #10766

Ilhicas (Contributor) commented Sep 16, 2020

Nomad version

Nomad v0.12.4 (8efaee4) (CLI)

Operating system and Environment details

Hosts: Linux on EC2, Nomad v0.12.3, Consul 0.1.8
Clients: Linux on EC2, Nomad v0.12.3, Consul 0.1.8

Issue

When using "bridge" mode alongside with cni plugin, the mode of the container is still host

Hosts are currently not allowed to be set on host mode (should be default according to docs, but when using bridge with connect stanza doesn't honour the default)

This brings with it a series of challenges to applications (namely java ) that use hostname -f or -i or any sort of equivalent by getting local address based on hostname

Reproduction steps

Exec into a container running with Consul Connect in network mode "bridge" and try a reverse lookup using hostname -i or hostname -f.

hostname -I does return the IP attached to the container.

Job file (if appropriate)

job "fail-at-host-resolution" {
  group "minio-group-wfm-job-find-fix" {
     network {
       mode = "bridge"
       port "http" {
         to = 9000
       }
     }
     service {
       name = "just-a-job-sample"
       port = 8080
       connect {
         sidecar_service {}
       }
     }
     task "ubuntu-resolution" {
       driver = "docker"
       config {
         image = "ubuntu"
         entrypoint = ["/bin/sh"]
         args = ["-ec", "sleep 1000"]
       }
     }
   }
}

If I try to use:

driver = "docker"
config {
  network_mode = "bridge"
  ...
}

hostname resolution works as expected; however, the proxy is not available on localhost.

@Ilhicas Ilhicas changed the title Host resolution using Consul Connect and CI - hostname -f Host resolution using Consul Connect and CI - hostname -f Not working as docker bridge mode Sep 16, 2020
@Ilhicas Ilhicas changed the title Host resolution using Consul Connect and CI - hostname -f Not working as docker bridge mode Host resolution using Consul Connect and Bridge Network CNI - hostname -f Not working as docker bridge mode Sep 16, 2020
Ilhicas (Contributor) commented Sep 22, 2020

For those having this issue, the current hack-fix solution is:

      config {
        args = ["/tmp/hosts-entrypoint.sh"]
        entrypoint = ["/bin/bash"]
        volumes = [
          "local/hosts-entrypoint.sh:/tmp/hosts-entrypoint.sh",
        ]

        image = "some-debian-else-based-image"
      }

      template {
        data = <<EOF
         #!/bin/bash
         echo $(hostname -I) $HOSTNAME >> /etc/hosts && /entrypoint.sh
         EOF

        destination = "local/hosts-entrypoint.sh"
      }

Ilhicas (Contributor) commented Feb 11, 2021

Linking also to: #7746

Ilhicas (Contributor) commented Feb 12, 2021

Meanwhile, I'm adding a new workaround.

Note it's just a sample, and it is Levant-templated.

job "[[.JOB]]" {
  datacenters = ["us-east-1a", "us-east-1b", "us-east-1c"]
  type = "service"

  reschedule {
    delay          = "15s"
    delay_function = "exponential"
    max_delay      = "15m"
    attempts       = 10
    interval       = "2h"
    unlimited      = false
  }

  group "[[.JOB]]-database-group" {
    network {
      mode = "bridge"
    }

    service {
      name = "[[.JOB]]-database"
      port = "5432"

      connect {
        sidecar_service {}
      }
    }

    task "[[.JOB]]-database" {
      driver = "docker"

      env {
        ALLOW_IP_RANGE = "0.0.0.0/0"
      }

      config {
        image      = "postgres:11.5"
        advertise_ipv6_address = true
      }
    }

  }
  group "[[.JOB]]-service-test" {
    network {
      mode = "bridge"
    }

    service {
      name = "[[.JOB]]-service-test"
      port = "5432"


      connect {
        sidecar_service {
          tags = ["traefik.enable=false"]

          proxy {
            upstreams {
              destination_name = "[[.JOB]]-database"
              local_bind_port  = 5433
            }
          }
        }
      }
    }

    task "[[.JOB]]-service-test" {
      driver = "docker"

      config {
        image      = "ubuntu"
        entrypoint = ["/bin/sh"]
        args       = ["-ec", "sleep 1000"]
        volumes = [
          "../alloc/tmp/hosts:/etc/hosts",
        ]
      }
    }
    task "[[.JOB]]-service-test-config" {
      lifecycle {
        sidecar = false
        hook = "prestart"
      }
      driver = "docker"
      config {
        image = "ubuntu"
        entrypoint = ["/bin/bash"]
        args = [
         "/tmp/hosts-entrypoint.sh"
        ]
        volumes = [
          "local/hosts-entrypoint.sh:/tmp/hosts-entrypoint.sh",
        ]
      }
      template {
        data = <<EOF
         #!/bin/bash
         cat /etc/hosts > /alloc/tmp/hosts
         echo $(hostname -I) $HOSTNAME >> /alloc/tmp/hosts
         EOF

        destination = "local/hosts-entrypoint.sh"
      }
    }
  }
}
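
For reference, this workaround works because the prestart task writes the fixed hosts file into the shared alloc directory and the main task bind-mounts it over /etc/hosts. A quick sanity check, assuming the job rendered with JOB=example (task name and allocation ID here are placeholders):

$ nomad alloc exec -task example-service-test <alloc-id> cat /etc/hosts
# the last line should now be "<container ip> <hostname>"
$ nomad alloc exec -task example-service-test <alloc-id> hostname -f
# should resolve instead of erroring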

@tgross tgross added stage/needs-verification Issue needs verifying it still exists and removed stage/needs-investigation labels Mar 24, 2021
schmichael (Member) commented Apr 15, 2021

Confirmed that Nomad's group networking in bridge mode diverges from Docker's bridge mode networking. Not sure exactly what is going on here, so we'll have to investigate! Hopefully it's a bug and not somewhere we have to diverge from Docker for some reason. If that's the case we'll figure out where to document the difference and offer a workaround.

Output for Nomad bridge networking using these job files:

# nomad-bridge.nomad

root@0a069e4868a7:/data# hostname
0a069e4868a7
root@0a069e4868a7:/data# hostname -f
hostname: Name or service not known
root@0a069e4868a7:/data# hostname -i
hostname: Name or service not known
root@0a069e4868a7:/data# hostname -I
172.26.64.76
root@0a069e4868a7:/data# uname -r
5.4.0-71-generic

Output with Docker's network_mode=bridge:

# docker-bridge.nomad

root@e5ac4fdcc8a1:/data# hostname
e5ac4fdcc8a1
root@e5ac4fdcc8a1:/data# hostname -f
e5ac4fdcc8a1
root@e5ac4fdcc8a1:/data# hostname -i
172.17.0.2
root@e5ac4fdcc8a1:/data# hostname -I
172.17.0.2
root@e5ac4fdcc8a1:/data# uname -r
5.4.0-71-generic

(Behavior is the same on Nomad 0.12.8 and Nomad 1.0.4)

@schmichael schmichael added stage/accepted Confirmed, and intend to work on. No timeline committment though. type/bug and removed stage/needs-verification Issue needs verifying it still exists labels Apr 15, 2021
@tgross tgross added the hcc/cst Admin - internal label Apr 16, 2021
tgross (Member) commented Apr 16, 2021

Cross-linking #8343 which may have a related underlying cause.

tgross (Member) commented Jun 4, 2021

I spent a little time on this and have a root cause. When group.network.mode = "bridge", we need a network namespace that may be shared across multiple tasks (including non-Docker tasks), so we set up a "pause" container to own the network namespace. The pause container itself runs with network mode "none".

If task.config.network_mode is unset, the application container's network mode is set to container:<id of the pause container>:

# docker inspect 9b4 | jq '.[0].HostConfig.NetworkMode'
"container:8c6b40e9132f55af6ab1ea32d372bd86ba7161b96898008ec7e565bd9c12582e"

# docker inspect 8c6 | jq '.[0].HostConfig.NetworkMode'
"none"

It looks like when Docker sets up the /etc/hosts file for our application container, it populates it the way it would for the pause container with --network=none:

# docker run -it --rm --network=none debian:buster cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

So when we have a network mode of container:<id of pause container>, Docker sets up /etc/hosts to be that of the pause container. And when we nomad alloc exec into the application container, we see the resulting /etc/hosts file is missing the final line that includes the container hostname, which is why things like hostname -i fail:

# cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

# hostname -i
hostname: Name or service not known

# hostname -f
hostname: Name or service not known

# hostname -I
172.26.64.85

Whereas with group.network.mode = "bridge" and task.config.network_mode = "bridge", the application container's network mode is set to "bridge":

# docker inspect 5c0 | jq '.[0].HostConfig.NetworkMode'
"bridge"

# docker inspect 885 | jq '.[0].HostConfig.NetworkMode'
"none"

And the /etc/hosts file is properly populated and everything works as expected:

# cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3      ca2524bf6b59

# hostname -i
172.17.0.3

# hostname -f
ca2524bf6b59

I'm at the end of my week here so I don't have a solid solution yet. But one option that jumps out at me immediately is that we could try to detect this scenario and inject an extra_hosts field into the Docker container configurations. But there's probably a half dozen things I haven't considered with that yet (like what happens with multiple tasks? how does this work at all with exec tasks?), and that's the sort of thing I'll dig into starting on Monday.
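
The same sharing behavior can be reproduced with plain Docker, outside of Nomad. A minimal sketch; the container name ns-holder is only an illustrative stand-in for Nomad's pause container:

$ docker run -d --name ns-holder --network none debian:buster sleep 600
$ docker run --rm --network container:ns-holder debian:buster cat /etc/hosts
# prints the --network=none style hosts file shown above: the usual
# "<container ip>   <container hostname>" line is missing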

tgross (Member) commented Jun 7, 2021

A couple of approaches that I've discovered won't work:

  • Adding the pause container's hostname to its own /etc/hosts (which gets used by the client container), because the client container still has its own hostname that just isn't showing in the /etc/hosts, and that's what it looks up. Edit, not quite right, see my next comment below
  • Having the job submitter add an artificial hostname field value like task.config.hostname = "${substr(uuidv4(), 0, 8)}", because the Docker API rejects setting the hostname with the network mode we're setting.

But I think we're on the right track here in terms of overriding the /etc/hosts.
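
The second bullet can also be demonstrated with plain Docker, reusing the illustrative ns-holder container from the sketch in the previous comment:

$ docker run --rm --hostname myhost --network container:ns-holder debian:buster true
# rejected by the daemon with a "conflicting options: hostname and the
# network mode" error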


As an aside, I've confirmed the reported behavior that Connect won't work when task.config.network_mode = "bridge":

jobspec
job "example" {
  datacenters = ["dc1"]

  group "server" {

    network {
      mode = "bridge"
    }

    service {
      name = "www"
      port = "8001"
      connect {
        sidecar_service {}
      }
    }

    task "task" {
      driver = "docker"

      config {
        image        = "0x74696d/dnstools"
        command      = "busybox"
        args         = ["httpd", "-v", "-f", "-p", "8001", "-h", "/srv"]
        network_mode = "bridge"
      }
    }
  }


  group "client" {

    network {
      mode = "bridge"
    }

    service {
      name = "client"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "www"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "task" {
      driver = "docker"

      config {
        image        = "0x74696d/dnstools"
        command      = "/bin/sh"
        args         = ["-c", "sleep 5; while true; do curl -v \"http://${NOMAD_UPSTREAM_ADDR_www}\" ; sleep 10; done"]
        network_mode = "bridge"
      }
    }
  }


}

That's because the Envoy proxy isn't in the same network namespace as the application. That's expected, but I don't think it's very well documented, and it would be nice if we could provide some warning to users who configure jobs this way.

$ sudo nsenter -t $(pgrep busybox) --net
root@linux# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::8001                 :::*                    LISTEN      28377/busybox

$ sudo nsenter -t $(pgrep pause | head -1) --net
root@linux# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:29478           0.0.0.0:*               LISTEN      28174/envoy
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      28174/envoy
tcp        0      0 127.0.0.1:19001         0.0.0.0:*               LISTEN      28174/envoy
tcp        0      0 172.26.64.138:29478     10.0.2.15:32774         TIME_WAIT   -

tgross (Member) commented Jun 8, 2021

Working a bit more on this, I wanted to make sure we're trying to get the right behavior here. I added the hack-around that @Ilhicas suggested (note however that I call exec in my bash script so that the application is PID 1 in the container; see Exec from Your start.sh for why):

jobspec
job "example" {
  datacenters = ["dc1"]

  group "server" {

    network {
      mode = "bridge"
    }

    service {
      name = "www"
      port = "8001"
      connect {
        sidecar_service {}
      }
    }

    task "task" {
      driver = "docker"

      config {
        image   = "0x74696d/dnstools"
        command = "/bin/bash"
        args    = ["local/hosts-entrypoint.sh"]
        ports   = ["www"]
      }

      template {
        data = <<EOF
#!/bin/bash
echo $(hostname -I) $HOSTNAME >> /etc/hosts
exec busybox httpd -v -f -p 8001 -h /srv
         EOF

        destination = "local/hosts-entrypoint.sh"
      }

    }
  }
}

If we run that and check /etc/hosts in the application container, we get:

$ nomad alloc exec -task task dff cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.26.64.139 d3bf0a3ad41d

$ nomad alloc exec -task task dff hostname -f
d3bf0a3ad41d

Which is the hostname for the pause container:

$ docker ps -f 'ancestor=gcr.io/google_containers/pause-amd64:3.1'
CONTAINER ID   IMAGE                                      COMMAND    CREATED         STATUS         PORTS     NAMES
d3bf0a3ad41d   gcr.io/google_containers/pause-amd64:3.1   "/pause"   9 minutes ago   Up 9 minutes             nomad_init_dff7e186-d8cf-5225-b7da-2246dcd63de6

And we get the same hostname in the Envoy proxy:

$ docker ps -f 'ancestor=envoyproxy/envoy:v1.16.0'
CONTAINER ID   IMAGE                      COMMAND                  CREATED         STATUS         PORTS     NAMES
79dfbe3dfaa4   envoyproxy/envoy:v1.16.0   "/docker-entrypoint.…"   9 minutes ago   Up 9 minutes             connect-proxy-www-dff7e186-d8cf-5225-b7da-2246dcd63de6

$ docker exec -it 79d cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.26.64.139 d3bf0a3ad41d

So I think our goal here is to get the pause container's hostname into its own /etc/hosts and we'll be good to go. I think my previous attempt at that was invalid... the /etc/hostname file in the application container is definitely the pause container's.
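
Continuing the illustrative ns-holder sketch from above, that goal can be mimicked with plain Docker: an entry appended to the namespace holder's /etc/hosts is visible to every container that joins its network namespace, since they share the same file (the IP here is a placeholder):

$ docker exec ns-holder sh -c 'echo "172.26.64.139 $(hostname)" >> /etc/hosts'
$ docker run --rm --network container:ns-holder debian:buster hostname -f
# now resolves to the holder's hostname instead of "Name or service not known"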

@tgross tgross changed the title Host resolution using Consul Connect and Bridge Network CNI - hostname -f Not working as docker bridge mode hostname not populated in /etc/hosts for Docker tasks with Connect Jun 8, 2021
tgross (Member) commented Jun 8, 2021

Well, as it turns out, a patch like the following, which adds a hosts entry to the task containers:

diff --git a/drivers/docker/driver.go b/drivers/docker/driver.go
index 7234d3467..7200d08d8 100644
--- a/drivers/docker/driver.go
+++ b/drivers/docker/driver.go
@@ -1040,6 +1040,11 @@ func (d *Driver) createContainerConfig(task *drivers.TaskConfig, driverConfig *T
                        netMode := fmt.Sprintf("container:%s", task.NetworkIsolation.Labels[dockerNetSpecLabelKey])
                        logger.Debug("configuring network mode for task group", "network_mode", netMode)
                        hostConfig.NetworkMode = netMode
+                       hostConfig.ExtraHosts = []string{
+                               fmt.Sprintf("%s:%s",
+                                       task.NetworkIsolation.Labels[dockerNetSpecLabelKey][:12],
+                                       "127.0.0.1"), // TODO: placeholder IP
+                       }
                } else {
                        // docker default
                        logger.Debug("networking mode not specified; using default")

Unfortunately that gets rejected by the Docker daemon as well:

Recent Events:
Time                  Type            Description
2021-06-08T15:28:34Z  Killing         Sent interrupt. Waiting 5s before force killing
2021-06-08T15:28:34Z  Not Restarting  Error was unrecoverable
2021-06-08T15:28:34Z  Driver Failure  failed to create container: API error (400): conflicting options: custom host-to-IP mapping and the network mode
2021-06-08T15:28:33Z  Task Setup      Building Task Directory
2021-06-08T15:28:32Z  Received        Task received by client

One last thing we might be able to try here is to override the default entrypoint of the Envoy proxy container and inject the hostname into /etc/hosts there.

tgross (Member) commented Jun 8, 2021

Ok, so the following incredibly gross patch seems to get the job done: Edit: but only for Connect jobs! So that doesn't really solve the problem!

diff --git a/nomad/job_endpoint_hook_connect.go b/nomad/job_endpoint_hook_connect.go
index 989039cde..2f5873de8 100644
--- a/nomad/job_endpoint_hook_connect.go
+++ b/nomad/job_endpoint_hook_connect.go
@@ -31,14 +31,15 @@ func connectSidecarResources() *structs.Resources {
 // connectSidecarDriverConfig is the driver configuration used by the injected
 // connect proxy sidecar task.
 func connectSidecarDriverConfig() map[string]interface{} {
+
+       args := fmt.Sprintf("echo $(hostname -I) $HOSTNAME >> /etc/hosts && exec /docker-entrypoint.sh -c %s -l ${meta.connect.log_level} --concurrency ${meta.connect.proxy_concurrency} --disable-hot-restart", structs.EnvoyBootstrapPath)
+
        return map[string]interface{}{
                "image": envoy.SidecarConfigVar,
                "args": []interface{}{
-                       "-c", structs.EnvoyBootstrapPath,
-                       "-l", "${meta.connect.log_level}",
-                       "--concurrency", "${meta.connect.proxy_concurrency}",
-                       "--disable-hot-restart",
+                       "-c", args,
                },
+               "entrypoint": []string{"/bin/sh"},
        }
 }

Results:

$ docker ps
CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS          PORTS     NAMES
ad7ddb367a7b   0x74696d/dnstools                          "busybox httpd -v -f…"   28 seconds ago   Up 28 seconds             task-d90ff804-5586-c3f3-6618-0dc63787fe15
ebc6ee411ce3   envoyproxy/envoy:v1.16.0                   "/bin/sh -c 'echo $(…"   29 seconds ago   Up 29 seconds             connect-proxy-www-d90ff804-5586-c3f3-6618-0dc63787fe15
b344a90c1256   gcr.io/google_containers/pause-amd64:3.1   "/pause"                 30 seconds ago   Up 29 seconds             nomad_init_d90ff804-5586-c3f3-6618-0dc63787fe15

$ docker exec -it ad7 hostname -f
b344a90c1256

$ docker exec -it ad7 cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.26.64.151 b344a90c1256

I'll need to consult a bit with my colleagues who have more Connect expertise than me to see if that's an approach we can live with here. (And obviously clean up how hacky the patch is and fix any tests it breaks!)

tgross (Member) commented Jun 9, 2021

@Ilhicas I do have to ask at this point why you want to use hostname -I in a Connect-enabled container in the first place. The IP address that you'll get from that isn't going to be useful to bind to -- no other tasks will be able to reach it on that IP address.

For example, if we try the following job, the "client" task won't be able to reach the "server" task:

jobspec
job "example" {
  datacenters = ["dc1"]

  group "server" {

    network {
      mode = "bridge"
    }

    service {
      name = "www"
      port = "8001"
      connect {
        sidecar_service {}
      }
    }

    task "task" {
      driver = "docker"

      config {
        image   = "0x74696d/dnstools"
        command = "/bin/bash"
        args    = ["local/hosts-entrypoint.sh"]
        ports   = ["www"]
      }



      template {
        data = <<EOF
#!/bin/bash
echo $(hostname -I) $HOSTNAME >> /etc/hosts
exec busybox httpd -v -f -p $(hostname -I):8001 -h /srv
         EOF

        destination = "local/hosts-entrypoint.sh"
      }

    }
  }


  group "client" {

    network {
      mode = "bridge"
    }

    service {
      name = "client"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "www"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "task" {
      driver = "docker"

      config {
        image   = "0x74696d/dnstools"
        command = "/bin/sh"
        args    = ["-c", "sleep 5; while true; do curl -v \"http://${NOMAD_UPSTREAM_ADDR_www}\" ; sleep 10; done"]
        # network_mode = "bridge"
      }

    }
  }


}

But if we swap out the $(hostname -I) for 127.0.0.1 so that the server binds only to localhost, the client task can reach it over the Connect service mesh, but by design the server can't be reached from outside the mesh (without a gateway).
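
Concretely, the only change to the entrypoint template from the jobspec above is the bind address:

#!/bin/bash
echo $(hostname -I) $HOSTNAME >> /etc/hosts
# bind to loopback so traffic reaches the service only via the Envoy
# sidecar that shares this network namespace:
exec busybox httpd -v -f -p 127.0.0.1:8001 -h /srv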

tgross (Member) commented Jun 9, 2021

Never mind the above... the same situation exists whenever you have network.mode = "bridge", so it's not specific to Connect tasks. And I just realized that with exec tasks we copy /etc/hosts in from the host, so you can have an allocation where two tasks are in the same network namespace but have different /etc/hosts files (although they have the same IP, as we'd expect).

tgross (Member) commented Jun 15, 2021

Proof-of-concept fix in #10766 but I need to make sure it won't interfere with other networking modes, with mixed task driver allocations, etc.
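
Once a build with that fix lands, the expectation is that the original repro passes without any workaround; a sketch of the check against the earlier example job (the allocation ID is a placeholder):

$ nomad alloc exec -task task <alloc-id> hostname -f
$ nomad alloc exec -task task <alloc-id> cat /etc/hosts
# /etc/hosts should now include a "<container ip>  <hostname>" line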

Ilhicas (Contributor) commented Jun 16, 2021

Hi @tgross thanks for all the effort, sorry for being late to that question :D but it seems you got it all figured out by now.

tgross (Member) commented Jun 16, 2021

#10766 has been merged and will ship in 1.1.2.

github-actions (bot) commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 18, 2022