incorrect advertisement w/ host_network #10001

Closed
mr-karan opened this issue Feb 10, 2021 · 7 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/networking, type/bug

@mr-karan
Contributor

mr-karan commented Feb 10, 2021

Nomad version

Nomad v1.0.3 (08741d9f2003ec26e44c72a2c0e27cdf0eadb6ee)

Operating system and Environment details

  • Ubuntu 20.04
  • DigitalOcean Droplet

Issue

I've configured host_network on my client's config like:

  host_network "tailscale" {
    cidr = "100.119.138.27/32"
    reserved_ports = "22"
  }
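
For reference, this block sits inside the client stanza of the agent config; a minimal sketch of the surrounding stanza (only the values shown above are from my config, the rest is boilerplate):

client {
  enabled = true

  host_network "tailscale" {
    cidr           = "100.119.138.27/32"
    reserved_ports = "22"
  }
}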

After deploying a task, I noticed something strange when I viewed nomad alloc status <id>:

Allocation Addresses
Label   Dynamic  Address
*http   yes      68.x.y.4:20640 -> 80
*https  yes      68.x.y.4:25547 -> 443
*dns    yes      68.x.y.4:53 -> 53

Here 68.x.y.4 is the public IPv4 of my server. But when I checked the same port mapping using docker ps, I observed a completely different (and correct) output:

100.119.138.27:53->53/tcp, 100.119.138.27:53->53/udp, 67/udp, 100.119.138.27:20640->80/tcp, 100.119.138.27:20640->80/udp, 100.119.138.27:25547->443/tcp, 100.119.138.27:25547->443/udp
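
To see which address Nomad itself recorded for the allocation (independent of what the CLI renders), the allocation can be pulled from the HTTP API. A sketch, assuming the agent listens on the default local address; the exact jq paths may differ slightly between Nomad versions:

# Replace <alloc-id> with the allocation ID from `nomad alloc status`.
curl -s "http://127.0.0.1:4646/v1/allocation/<alloc-id>" \
  | jq '.AllocatedResources.Shared.Networks, .AllocatedResources.Shared.Ports'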

Reproduction steps

  • Changed the service stanza to use the http port instead of https.
  • Noticed the Docker port forwarding was still correct but Consul and Nomad both showed wrong ports.
  • Tried to change the service stanza again and the problem vanished.

It might be hard to reproduce, but the wrong host address was definitely mapped inside Nomad. I tried changing the service stanza multiple times afterwards but couldn't reproduce this.
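
Since the service is registered in Consul, the advertised address can also be checked straight from Consul's catalog. A sketch, assuming Consul is reachable on its default local address:

# Show the address/port Consul has registered for the service
curl -s http://127.0.0.1:8500/v1/catalog/service/pihole-admin \
  | jq '.[] | {ServiceAddress, ServicePort}'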

Job file (if appropriate)

job "pihole" {
  datacenters = ["hydra"]
  type        = "service"
  group "web" {
    count = 1
    network {
      port "dns" {
        static       = 53
        to           = 53
        host_network = "tailscale"
      }
      port "http" {
        to           = 80
        host_network = "tailscale"
      }
      port "https" {
        to           = 443
        host_network = "tailscale"
      }
    }
    service {
      name = "pihole-admin"
      tags = ["pihole", "admin"]
      port = "http" # Terminate SSL at Caddy.
    }
    restart {
      attempts = 2
      interval = "2m"
      delay    = "30s"
      mode     = "fail"
    }
    task "app" {
      driver = "docker"
      config {
        image = "pihole/pihole:v5.6"
        # Bind the data directory to preserve config.
        mount {
          type     = "bind"
          target   = "/etc/dnsmasq.d"
          source   = "/data/pihole/dnsmasq.d/"
          readonly = false
        }
        mount {
          type     = "bind"
          target   = "/etc/pihole"
          source   = "/data/pihole/conf/"
          readonly = false
        }
        ports = ["http", "https", "dns"]
      }
      env {
        TZ = "Asia/Kolkata"
      }
      resources {
        cpu    = 200
        memory = 100
      }
    }
  }
}
@tgross
Member

tgross commented Feb 10, 2021

Hi @mr-karan! A few questions that might help narrow this down:

  • You pointed out that the Docker network configuration looked right; was traffic reachable on that IP as you'd expect?
  • There's a service block in this jobspec. Was the correct IP being advertised to Consul? Nevermind, I just re-read and saw you said "Consul and Nomad both showed wrong ports."

@tgross changed the title from "Possible bug in Host Network Mode" to "incorrect advertisement w/ host_network" on Feb 10, 2021
@tgross
Member

tgross commented Feb 10, 2021

Possible duplicate of #9006

@mr-karan
Contributor Author

I am able to reproduce this.

The only change I made:

[screenshot of the service stanza change]

nomad alloc status <id>

[screenshot of nomad alloc status output]

docker ps

[screenshot of docker ps output]

was traffic reachable on that IP as you'd expect?

Yep. It's in fact a Pi-hole that I'm running, so I did a few quick curl and dig tests to show that traffic reaches the ports correctly on the IPs advertised by Docker.

Port 53:

dig hashicorp.com @100.119.138.27 +short
76.76.21.21

Port 80:

curl -i http://100.119.138.27:31460                                                                                                    
HTTP/1.1 200 OK
Content-type: text/html; charset=UTF-8
Expires: Wed, 10 Feb 2021 13:43:38 GMT
Cache-Control: max-age=0
Content-Length: 645
Date: Wed, 10 Feb 2021 13:43:38 GMT
Server: lighttpd/1.4.53


    <!doctype html>
    <html lang='en'>
        <head>
            <meta charset='utf-8'>
            
            <title>● </title>
            <link rel='stylesheet' href='pihole/blockingpage.css'>
            <link rel='shortcut icon' href='admin/img/favicons/favicon.ico' type='image/x-icon'>
        </head>
        <body id='splashpage'>
            <img src='admin/img/logo.svg' alt='Pi-hole logo' width='256' height='377'>
            <br>
            <p>Pi-<strong>hole</strong>: Your black hole for Internet advertisements</p>
            <a href='/admin'>Did you mean to go to the admin panel?</a>
        </body>
    </html>

And now here's the strange part.

Port 443 (which is the one I changed in service stanza) does not work for the IP advertised by Docker:

curl -i http://100.119.138.27:23776                                                                                                 
curl: (7) Failed to connect to 100.119.138.27 port 23776: Connection refused

I tried curl and dig tests on all the ports advertised in the nomad output; they all showed connection refused.

The problem disappears when I do a fresh deployment:

nomad stop pihole
nomad system gc
nomad run pihole.nomad 

[screenshot of nomad alloc status output after the fresh deployment]

Please let me know if there are any further steps you'd like me to perform to narrow down the issue.

@tcurdt

tcurdt commented Feb 21, 2021

I think I've also run into the same thing.

This is my Nomad config:

bind_addr = "0.0.0.0"
advertise {
  http = "192.168.100.101"
  rpc  = "192.168.100.101"
  serf = "192.168.100.101"
}
client {
  enabled = true
  server_join {
    retry_join = [ "192.168.100.101" ]
  }
  host_network "default" { cidr = "192.168.100.0/24" }
}
...

which seems to be necessary as Vagrant hogs eth0.
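
For anyone comparing setups: the mismatch is easy to see on the box itself, since the Vagrant NAT interface holds 10.0.2.15 while the private network carries the 192.168.100.0/24 address. A quick check (interface names will vary with the Vagrant provider):

# List IPv4 addresses per interface to see which one holds which CIDR
ip -o -4 addr show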

The job specifies the host_network:

...
group "backend" {
  count = 2
  network {
    port  "http" {
      host_network = "default"
    }
  }
  ...

And the Docker driver does use the correct/expected IP/port allocation:

192.168.100.101:30544->30544/tcp
192.168.100.101:30544->30544/udp

Nomad, on the other hand, pushes the following allocation to Consul:

Allocation Addresses
Label  Dynamic  Address
*http  yes      10.0.2.15:30544

At the very least, the allocation should match what the Docker driver is running.
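
To see the full bindings the Docker driver actually set up (docker ps truncates them), inspecting the container works; the template is standard Docker, the container ID is a placeholder:

# Dump the exact host IP/port bindings for the task container
docker inspect -f '{{json .NetworkSettings.Ports}}' <container-id> | jq .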

@schmichael added the stage/accepted label (Confirmed, and intend to work on. No timeline commitment though.) and removed the stage/needs-investigation label on Mar 12, 2021
@shoenig self-assigned this Jan 11, 2023
@shoenig added this to the 1.5.0 milestone Jan 11, 2023
@shoenig
Member

shoenig commented Jan 12, 2023

I am able to reproduce this on Nomad v1.0.3, but not able to reproduce it on 1.2.x+. I'll spend a bit more time trying to bisect when this was fixed, but unless folks are still seeing the behavior on a supported Nomad version, I think we can close this out.

Create network

sudo ip link add mybridge type bridge
sudo ip link set dev mybridge up
sudo ip address add dev mybridge 10.0.0.1/24

nomad.hcl

server {
  enabled = true
}

client {
  enabled = true
  
  host_network "mynet" {
    cidr = "10.0.0.1/24"
    reserved_ports = 22
  }
}

Start nomad & consul

consul agent -dev
sudo nomad agent -dev -config=nomad.hcl
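
Once both agents are up, the client's fingerprinted host networks can be sanity-checked through the node API to confirm "mynet" picked up the bridge address. A sketch; the jq path is approximate and may differ slightly between versions:

# Replace <node-id> with the ID from `nomad node status`.
curl -s "http://127.0.0.1:4646/v1/node/<node-id>" \
  | jq '.NodeResources.NodeNetworks'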

example.nomad

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db1" {
        to = 6379
        host_network = "mynet"
      }
      port "db2" {
        to = 9999
        host_network = "mynet"
      }
    }
    
    service {
      name = "redis"
      port = "db2"
    }

    task "redis" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db1", "db2"]
        auth_soft_fail = true
      }

      resources {
        cpu    = 100
        memory = 32
      }
    }
  }
}

first run

➜ nomad alloc status 88

Allocation Addresses
Label  Dynamic  Address
*db1   yes      10.0.0.1:25251 -> 6379
*db2   yes      10.0.0.1:28567 -> 9999

modify service port label

➜ nomad job plan example.nomad 
+/- Job: "example"
+/- Task Group: "cache" (1 in-place update)
  +/- Service {
        AddressMode:       "auto"
        EnableTagOverride: "false"
        Name:              "redis"
    +/- PortLabel:         "db1" => "db2"

check networks again, they have changed (!)

➜ nomad alloc status 88

Allocation Addresses
Label  Dynamic  Address
*db1   yes      127.0.0.1:25251 -> 6379
*db2   yes      127.0.0.1:28567 -> 9999

After running through these steps on 1.2.15, 1.3.8, 1.4.3, and 1.4.4-dev, the addresses remain correct after the update.

@tcurdt were you seeing the wrong address after an update to the service stanza? Otherwise it might just be a different bug with a similar symptom. Either way, if anyone can reproduce this on a recent version, feel free to re-open or file a new issue.

@shoenig closed this as completed Jan 12, 2023
@tcurdt

tcurdt commented Jan 12, 2023

@shoenig Unfortunately it's been too long since then and I can't really answer that question anymore.
But thanks for looking into this!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators May 13, 2023