incorrect advertisement w/ host_network #10001

Closed
mr-karan opened this issue Feb 10, 2021 · 7 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/networking, type/bug

@mr-karan
Contributor

mr-karan commented Feb 10, 2021

Nomad version

Nomad v1.0.3 (08741d9f2003ec26e44c72a2c0e27cdf0eadb6ee)

Operating system and Environment details

  • Ubuntu 20.04
  • DigitalOcean Droplet

Issue

I've configured host_network on my client's config like:

  host_network "tailscale" {
    cidr = "100.119.138.27/32"
    reserved_ports = "22"
  }
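
For reference, this block sits inside the client stanza of the agent config; a minimal sketch of the surrounding stanza (only the values shown above are from my config, the rest is boilerplate):

client {
  enabled = true

  host_network "tailscale" {
    cidr           = "100.119.138.27/32"
    reserved_ports = "22"
  }
}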

After deploying a task, I noticed something strange when I viewed nomad alloc status <id>:

Allocation Addresses
Label   Dynamic  Address
*http   yes      68.x.y.4:20640 -> 80
*https  yes      68.x.y.4:25547 -> 443
*dns    yes      68.x.y.4:53 -> 53

Here 68.x.y.4 is the public IPv4 of my server. But when I checked the same port mapping using docker ps, I observed a completely different (and correct) output:

100.119.138.27:53->53/tcp, 100.119.138.27:53->53/udp, 67/udp, 100.119.138.27:20640->80/tcp, 100.119.138.27:20640->80/udp, 100.119.138.27:25547->443/tcp, 100.119.138.27:25547->443/udp
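
To see which address Nomad itself recorded for the allocation (independent of what the CLI renders), the allocation can be pulled from the HTTP API. A sketch, assuming the agent listens on the default local address; the exact jq paths may differ slightly between Nomad versions:

# Replace <alloc-id> with the allocation ID from `nomad alloc status`.
curl -s "http://127.0.0.1:4646/v1/allocation/<alloc-id>" \
  | jq '.AllocatedResources.Shared.Networks, .AllocatedResources.Shared.Ports'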

Reproduction steps

  • Changed the service stanza to use the http port instead of https.
  • Noticed the Docker port forwarding was still correct but Consul and Nomad both showed wrong ports.
  • Tried to change the service stanza again and the problem vanished.

It might be hard to reproduce, but the wrong host address was definitely mapped inside Nomad. I tried changing the service stanza multiple times afterwards but couldn't reproduce this.
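
Since the service is registered in Consul, the advertised address can also be checked straight from Consul's catalog. A sketch, assuming Consul is reachable on its default local address:

# Show the address/port Consul has registered for the service
curl -s http://127.0.0.1:8500/v1/catalog/service/pihole-admin \
  | jq '.[] | {ServiceAddress, ServicePort}'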

Job file (if appropriate)

job "pihole" {
  datacenters = ["hydra"]
  type        = "service"
  group "web" {
    count = 1
    network {
      port "dns" {
        static       = 53
        to           = 53
        host_network = "tailscale"
      }
      port "http" {
        to           = 80
        host_network = "tailscale"
      }
      port "https" {
        to           = 443
        host_network = "tailscale"
      }
    }
    service {
      name = "pihole-admin"
      tags = ["pihole", "admin"]
      port = "http" # Terminate SSL at Caddy.
    }
    restart {
      attempts = 2
      interval = "2m"
      delay    = "30s"
      mode     = "fail"
    }
    task "app" {
      driver = "docker"
      config {
        image = "pihole/pihole:v5.6"
        # Bind the data directory to preserve config.
        mount {
          type     = "bind"
          target   = "/etc/dnsmasq.d"
          source   = "/data/pihole/dnsmasq.d/"
          readonly = false
        }
        mount {
          type     = "bind"
          target   = "/etc/pihole"
          source   = "/data/pihole/conf/"
          readonly = false
        }
        ports = ["http", "https", "dns"]
      }
      env {
        TZ = "Asia/Kolkata"
      }
      resources {
        cpu    = 200
        memory = 100
      }
    }
  }
}
@tgross
Member

tgross commented Feb 10, 2021

Hi @mr-karan! A few questions that might help narrow this down:

  • You pointed out that the Docker network configuration looked right; was traffic reachable on that IP as you'd expect?
  • There's a service block in this jobspec. Was the correct IP being advertised to Consul? Nevermind, I just re-read and saw you said "Consul and Nomad both showed wrong ports."

@tgross changed the title from "Possible bug in Host Network Mode" to "incorrect advertisement w/ host_network" on Feb 10, 2021
@tgross
Member

tgross commented Feb 10, 2021

Possible duplicate of #9006

@mr-karan
Contributor Author

I am able to reproduce this.

The only change I made:

[screenshot of the service stanza change]

nomad alloc status <id>

[screenshot of nomad alloc status output]

docker ps

[screenshot of docker ps output]

was traffic reachable on that IP as you'd expect?

Yep. It's in fact a Pi-hole that I'm running, so I did a few quick curl and dig tests to show that traffic reaches the ports correctly on the IPs advertised by Docker.

Port 53:

dig hashicorp.com @100.119.138.27 +short
76.76.21.21

Port 80:

curl -i http://100.119.138.27:31460                                                                                                    
HTTP/1.1 200 OK
Content-type: text/html; charset=UTF-8
Expires: Wed, 10 Feb 2021 13:43:38 GMT
Cache-Control: max-age=0
Content-Length: 645
Date: Wed, 10 Feb 2021 13:43:38 GMT
Server: lighttpd/1.4.53


    <!doctype html>
    <html lang='en'>
        <head>
            <meta charset='utf-8'>
            
            <title>● </title>
            <link rel='stylesheet' href='pihole/blockingpage.css'>
            <link rel='shortcut icon' href='admin/img/favicons/favicon.ico' type='image/x-icon'>
        </head>
        <body id='splashpage'>
            <img src='admin/img/logo.svg' alt='Pi-hole logo' width='256' height='377'>
            <br>
            <p>Pi-<strong>hole</strong>: Your black hole for Internet advertisements</p>
            <a href='/admin'>Did you mean to go to the admin panel?</a>
        </body>
    </html>

And now here's the strange part.

Port 443 (which is the one I changed in service stanza) does not work for the IP advertised by Docker:

curl -i http://100.119.138.27:23776                                                                                                 
curl: (7) Failed to connect to 100.119.138.27 port 23776: Connection refused

I tried curl and dig tests on all the ports advertised in the nomad output; they all showed connection refused.

The problem disappears when I do a fresh deployment:

nomad stop pihole
nomad system gc
nomad run pihole.nomad 

[screenshot of nomad alloc status output after the fresh deployment]

Please let me know if there are any further steps you'd like me to perform to narrow down the issue.

@tcurdt

tcurdt commented Feb 21, 2021

I think I've also run into the same thing.

This is my Nomad config:

bind_addr = "0.0.0.0"
advertise {
  http = "192.168.100.101"
  rpc  = "192.168.100.101"
  serf = "192.168.100.101"
}
client {
  enabled = true
  server_join {
    retry_join = [ "192.168.100.101" ]
  }
  host_network "default" { cidr = "192.168.100.0/24" }
}
...

which seems to be necessary as Vagrant hogs eth0.
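
For anyone comparing setups: the mismatch is easy to see on the box itself, since the Vagrant NAT interface holds 10.0.2.15 while the private network carries the 192.168.100.0/24 address. A quick check (interface names will vary with the Vagrant provider):

# List IPv4 addresses per interface to see which one holds which CIDR
ip -o -4 addr show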

The job specifies the host_network:

...
group "backend" {
  count = 2
  network {
    port  "http" {
      host_network = "default"
    }
  }
  ...

And the Docker driver does use the correct/expected IP/port allocation:

192.168.100.101:30544->30544/tcp
192.168.100.101:30544->30544/udp

Nomad, on the other hand, pushes the following allocation to Consul:

Allocation Addresses
Label  Dynamic  Address
*http  yes      10.0.2.15:30544

At the very least, the allocation should match what the Docker driver is running.
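
To see the full bindings the Docker driver actually set up (docker ps truncates them), inspecting the container works; the template is standard Docker, the container ID is a placeholder:

# Dump the exact host IP/port bindings for the task container
docker inspect -f '{{json .NetworkSettings.Ports}}' <container-id> | jq .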

@schmichael added the stage/accepted label (Confirmed, and intend to work on. No timeline commitment though.) and removed the stage/needs-investigation label on Mar 12, 2021
@shoenig self-assigned this Jan 11, 2023
@shoenig added this to the 1.5.0 milestone Jan 11, 2023
@shoenig
Member

shoenig commented Jan 12, 2023

I am able to reproduce this on Nomad v1.0.3, but not able to reproduce it on 1.2.x+. I'll spend a bit more time trying to bisect when this was fixed, but unless folks are still seeing the behavior on a supported Nomad version, I think we can close this out.

Create network

sudo ip link add mybridge type bridge
sudo ip link set dev mybridge up
sudo ip address add dev mybridge 10.0.0.1/24

nomad.hcl

server {
  enabled = true
}

client {
  enabled = true
  
  host_network "mynet" {
    cidr = "10.0.0.1/24"
    reserved_ports = 22
  }
}

Start nomad & consul

consul agent -dev
sudo nomad agent -dev -config=nomad.hcl
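
Once both agents are up, the client's fingerprinted host networks can be sanity-checked through the node API to confirm "mynet" picked up the bridge address. A sketch; the jq path is approximate and may differ slightly between versions:

# Replace <node-id> with the ID from `nomad node status`.
curl -s "http://127.0.0.1:4646/v1/node/<node-id>" \
  | jq '.NodeResources.NodeNetworks'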

example.nomad

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db1" {
        to = 6379
        host_network = "mynet"
      }
      port "db2" {
        to = 9999
        host_network = "mynet"
      }
    }
    
    service {
      name = "redis"
      port = "db2"
    }

    task "redis" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db1", "db2"]
        auth_soft_fail = true
      }

      resources {
        cpu    = 100
        memory = 32
      }
    }
  }
}

first run

➜ nomad alloc status 88

Allocation Addresses
Label  Dynamic  Address
*db1   yes      10.0.0.1:25251 -> 6379
*db2   yes      10.0.0.1:28567 -> 9999

modify service port label

➜ nomad job plan example.nomad 
+/- Job: "example"
+/- Task Group: "cache" (1 in-place update)
  +/- Service {
        AddressMode:       "auto"
        EnableTagOverride: "false"
        Name:              "redis"
    +/- PortLabel:         "db1" => "db2"

check networks again, they have changed (!)

➜ nomad alloc status 88

Allocation Addresses
Label  Dynamic  Address
*db1   yes      127.0.0.1:25251 -> 6379
*db2   yes      127.0.0.1:28567 -> 9999

After running through these steps on 1.2.15, 1.3.8, 1.4.3, and 1.4.4-dev, the addresses remain correct after the update.

@tcurdt were you seeing the wrong address after an update to the service stanza? Otherwise it might just be a different bug with a similar symptom. Either way, if anyone can reproduce this on a recent version, feel free to re-open or file a new issue.

@shoenig closed this as completed Jan 12, 2023
@tcurdt

tcurdt commented Jan 12, 2023

@shoenig Unfortunately it's been too long since then and I can't really answer that question anymore.
But thanks for looking into this!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators May 13, 2023