Peer's port 80 and 443 blocked after updating client from 0.29.4 to 0.30.0 #2701

Closed
IKA3RUS opened this issue Oct 7, 2024 · 21 comments

@IKA3RUS

IKA3RUS commented Oct 7, 2024

Describe the problem

I run a self-hosted netbird setup. On one of the peers, I run a traefik instance with a few web services. After updating this peer's netbird client from 0.29.4 to 0.30.0, I'm unable to connect to its ports 80 and 443 from other peers.

Everything else, including ping and ssh to it, works. Ports 80 and 443 also start accepting connections immediately if I downgrade the netbird client back to 0.29.4, and stop working immediately if I update to 0.30.0.

To Reproduce

  1. Run sudo apt-get install netbird=0.30.0 to update the netbird client.
  2. Verify that the defective netbird peer is listening on the ports.
user@defective-peer:~$ ss -tuln | grep -E '(:80|:443)'
tcp   LISTEN 0      4096         0.0.0.0:80         0.0.0.0:*
tcp   LISTEN 0      4096         0.0.0.0:443        0.0.0.0:*
tcp   LISTEN 0      4096   100.64.58.207:44338      0.0.0.0:*
tcp   LISTEN 0      4096            [::]:80            [::]:*
tcp   LISTEN 0      4096            [::]:443           [::]:*
user@defective-peer:~$ nmap -p- 0.0.0.0
Starting Nmap 7.93 ( https://nmap.org ) at 2024-10-06 20:05 IST
Nmap scan report for 0.0.0.0
Host is up (0.000059s latency).
Not shown: 65529 closed tcp ports (conn-refused)
PORT      STATE SERVICE
22/tcp    open  ssh
80/tcp    open  http
443/tcp   open  https
631/tcp   open  ipp
41981/tcp open  unknown
52874/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.85 seconds
  3. Try to reach the defective peer from a different peer. Ports 80 and 443 seem to be filtered. There are no firewalls on the defective peer or anywhere on the network which could be blocking these other than netbird itself.
user@different-peer:~$ nmap -Pn -p- --reason defective-peer.netbird.selfhosted
Starting Nmap 7.95 ( https://nmap.org ) at 2024-10-06 20:17 IST
Nmap scan report for defective-peer.netbird.selfhosted (100.64.58.207)
Host is up, received user-set (0.0035s latency).
Not shown: 65531 closed tcp ports (conn-refused)
PORT      STATE    SERVICE REASON
22/tcp    open     ssh     syn-ack
80/tcp    filtered http    no-response
443/tcp   filtered https   no-response
44338/tcp open     unknown syn-ack

Nmap done: 1 IP address (1 host up) scanned in 49.06 seconds
  4. Downgrade netbird client on the defective peer with sudo apt-get install netbird=0.29.0 and verify outputs from step 2 haven't changed.
  5. Try to nmap the defective peer from a different peer again. It starts responding to ports 80 and 443.
user@different-peer:~$ nmap -Pn -p- --reason defective-peer.netbird.selfhosted
Starting Nmap 7.95 ( https://nmap.org ) at 2024-10-06 20:34 IST
Nmap scan report for yukino-100.netbird.selfhosted (100.64.58.207)
Host is up, received user-set (0.29s latency).
Not shown: 65531 closed tcp ports (conn-refused)
PORT      STATE SERVICE REASON
22/tcp    open  ssh     syn-ack
80/tcp    open  http    syn-ack
443/tcp   open  https   syn-ack
44338/tcp open  unknown syn-ack

Nmap done: 1 IP address (1 host up) scanned in 4096.96 seconds
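
A full `nmap -p-` scan is slow to repeat after every upgrade or downgrade. As a rough sketch, a quicker per-port probe can be built on bash's `/dev/tcp` redirection. The function name `check_port` and the 3-second timeout are illustrative choices, not part of NetBird or nmap, and the sketch assumes bash and coreutils `timeout` are installed. A probe that times out corresponds to nmap's `filtered` (no response), while an immediate failure corresponds to a closed port.

```shell
#!/usr/bin/env bash
# check_port HOST PORT
# Hypothetical helper: classifies a TCP port roughly the way the nmap runs above do.
check_port() {
  local host=$1 port=$2 rc
  # Try to open a TCP connection; give up after 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
  rc=$?
  case $rc in
    0)   echo "${port}/tcp open" ;;
    124) echo "${port}/tcp filtered (no response)" ;;  # 124 is timeout(1)'s exit status
    *)   echo "${port}/tcp closed or host unresolvable" ;;
  esac
}
```

For example, from the other peer: `for p in 22 80 443; do check_port defective-peer.netbird.selfhosted "$p"; done`.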

Expected behavior

There shouldn't be any difference in reaching ports 80 and 443 after updating to 0.30.0.

Are you using NetBird Cloud?

No. I'm self-hosting the Netbird control plane on a Hetzner VPS.

NetBird version

0.30.0

NetBird status -dA output:

user@defective-peer:~$ netbird status -dA

 different-peer.netbird.selfhosted:
  NetBird IP: 100.64.196.33
  Public key: QRedactedBOw6/xKAuuht55kDxmK56LFXYSldKqb1CI=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): srflx/host
  ICE candidate endpoints (Local/Remote): 198.51.100.0:40320/192.168.0.183:51820
  Relay server address: rels://netbird.anon-ERw0W.domain:443/relay
  Last connection update: 34 minutes, 33 seconds ago
  Last WireGuard handshake: 27 seconds ago
  Transfer status (received/sent) 19.8 MiB/16.4 MiB
  Quantum resistance: false
  Routes: -
  Latency: 3.032567ms

OS: linux/amd64
Daemon version: 0.30.0
CLI version: 0.30.0
Management: Connected to https://netbird.anon-ERw0W.domain:443
Signal: Connected to https://netbird.anon-ERw0W.domain:443
Relays:
  [stun:netbird.anon-ERw0W.domain:3478] is Unavailable, reason: stun request: context deadline exceeded
  [turn:netbird.anon-ERw0W.domain:3478?transport=udp] is Available
  [rels://netbird.anon-ERw0W.domain:443/relay] is Available
Nameservers:
FQDN: defective-peer.netbird.selfhosted
NetBird IP: 100.64.58.207/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 4/4 Connected // Removed all but the peer I am trying to connect from

Additional context

I've tried running this with the control plane and the other non-defective peer running at both 0.29.4 and 0.30.0. It doesn't seem to make a difference.

@fejiso

fejiso commented Oct 7, 2024

Exactly the same issue here. OP, are those services that fail behind Docker?

In my case: Netbird client for Unraid auto-updated to 0.30; instantaneously all of my Docker services became unreachable from any other Netbird host (from Unraid they're reachable on the Netbird IP). Unraid services other than Docker seem reachable (ssh, http).

Downgrading to netbirdio/netbird:0.29.4 fixes the issue.

@hadleyrich

hadleyrich commented Oct 7, 2024

I have seen this upon upgrading to 0.30.0 also.

As above, for me the filtered ports are docker port forwards, and all ports >22 are in filtered state.

I'm guessing it's going to be related to #2100

@hadleyrich

Here's the netbird nft table output from two of my hosts, the first on 0.29.4 and the second on 0.30.0. They are somewhat different. I'm still poking around but thought it might be useful information and prompt something for those who know the internals more:

table ip netbird {
	set nb0000001 {
		type ipv4_addr
		flags dynamic
		elements = { 0.0.0.0 }
	}

	set nb0000002 {
		type ipv4_addr
		flags dynamic
		elements = { 0.0.0.0 }
	}

	chain netbird-rt-fwd {
	}

	chain netbird-rt-nat {
		type nat hook postrouting priority srcnat - 1; policy accept;
		oifname "lo" return
	}

	chain netbird-acl-input-rules {
		iifname "wt0" accept
	}

	chain netbird-acl-output-rules {
		oifname "wt0" accept
	}

	chain netbird-acl-input-filter {
		type filter hook input priority filter; policy accept;
		iifname "wt0" ip saddr 100.114.0.0/16 ip daddr != 100.114.0.0/16 accept
		iifname "wt0" ip saddr != 100.114.0.0/16 ip daddr 100.114.0.0/16 accept
		iifname "wt0" ip saddr 100.114.0.0/16 ip daddr 100.114.0.0/16 jump netbird-acl-input-rules
		iifname "wt0" drop
	}

	chain netbird-acl-output-filter {
		type filter hook output priority filter; policy accept;
		oifname "wt0" ip saddr != 100.114.0.0/16 ip daddr 100.114.0.0/16 accept
		oifname "wt0" ip saddr 100.114.0.0/16 ip daddr != 100.114.0.0/16 accept
		oifname "wt0" ip saddr 100.114.0.0/16 ip daddr 100.114.0.0/16 jump netbird-acl-output-rules
		oifname "wt0" drop
	}

	chain netbird-acl-forward-filter {
		type filter hook forward priority filter; policy accept;
		iifname "wt0" jump netbird-rt-fwd
		oifname "wt0" jump netbird-rt-fwd
		iifname "wt0" meta mark 0x000007e4 accept
		oifname "wt0" meta mark 0x000007e4 accept
		iifname "wt0" jump netbird-acl-input-rules
		iifname "wt0" drop
	}

	chain netbird-acl-prerouting-filter {
		type filter hook prerouting priority mangle; policy accept;
		iifname "wt0" ip saddr != 100.114.0.0/16 ip daddr 100.114.221.136 meta mark set 0x000007e4
	}
}
table ip netbird {
	set nb0000001 {
		type ipv4_addr
		flags dynamic
		elements = { 0.0.0.0 }
	}

	set nb0000002 {
		type ipv4_addr
		flags dynamic
		elements = { 0.0.0.0 }
	}

	chain netbird-rt-fwd {
		ct state established,related accept
	}

	chain netbird-rt-nat {
		type nat hook postrouting priority srcnat - 1; policy accept;
	}

	chain netbird-acl-input-rules {
		ct state established,related accept
		accept
	}

	chain netbird-acl-output-rules {
		ct state established,related accept
		accept
	}

	chain netbird-acl-input-filter {
		type filter hook input priority filter; policy accept;
		iifname "wt0" jump netbird-acl-input-rules
		iifname "wt0" drop
	}

	chain netbird-acl-output-filter {
		type filter hook output priority filter; policy accept;
		oifname "wt0" ip daddr != 100.114.0.0/16 accept
		oifname "wt0" jump netbird-acl-output-rules
		oifname "wt0" drop
	}

	chain netbird-acl-forward-filter {
		type filter hook forward priority filter; policy accept;
		iifname "wt0" jump netbird-rt-fwd
		iifname "wt0" drop
	}
}
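
The two dumps are easier to compare chain by chain. In the dumps above, for instance, the 0.30.0 `netbird-acl-forward-filter` chain no longer jumps to `netbird-acl-input-rules` before `iifname "wt0" drop`. The following is a minimal sketch for pulling a single chain out of a saved dump; `chain_rules` and the file names in the usage note are hypothetical, and it assumes the dump has the layout shown above (one rule per line, closing brace on its own line).

```shell
#!/usr/bin/env bash
# chain_rules CHAIN FILE
# Hypothetical helper: prints the body of one chain from a saved
# `nft list table ip netbird` dump, one rule per line, indentation stripped.
chain_rules() {
  local chain=$1 file=$2
  awk -v c="$chain" '
    $1 == "chain" && $2 == c { inside = 1; next }  # start of the wanted chain
    inside && $1 == "}"      { exit }              # closing brace ends it
    inside                   { sub(/^[ \t]+/, ""); print }
  ' "$file"
}
```

With the dumps saved as `old.nft` and `new.nft`, `diff <(chain_rules netbird-acl-forward-filter old.nft) <(chain_rules netbird-acl-forward-filter new.nft)` shows only the rules that changed in that chain.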

@IKA3RUS
Author

IKA3RUS commented Oct 7, 2024

Exactly the same issue here. OP, are those services that fail behind Docker?

Yes, traefik is running in a docker container, listening on port 80 and 443.

The rest of my services are also behind docker, and are reverse proxied to by traefik. They aren't accessible either since the connections never reach traefik in the first place.

@Dr-Shadow

Dr-Shadow commented Oct 7, 2024

Hello @IKA3RUS and everyone else, did you refer to the Doc ACL pre/post 0.30?
It seems we have some settings to change in the ACL before going to 0.30.

@Kidswiss

Kidswiss commented Oct 7, 2024

I'd also like to add here that the nft rules break iptables compatibility, which in turn breaks podman:

user@host:~$ sudo systemctl start netbird.service 
user@host:~$ sudo iptables -S
iptables v1.8.10 (nf_tables): table `filter' is incompatible, use 'nft' tool.
user@host:~$ sudo systemctl stop netbird.service 
user@host:~$ sudo iptables -S
-P INPUT DROP
-P FORWARD DROP

@Spiritreader

Spiritreader commented Oct 7, 2024

Hello @IKA3RUS and everyone else did you refer to Doc ACL pre/post 0.30 ?

@Dr-Shadow Yes, this is definitely because of that.
I modified the nft rules and it started working again.


I don't really understand how this ACL change makes sense, as bridged docker networks on most deployed systems have their IPs assigned from a pool.
So there's no way to achieve the behavior they describe in the doc except for adding every private IP as routable for every single client that has bridge networks.

Out of the box, docker bridge network forwarding doesn't require any extra configuration at all because it is designed that way: reachable by default, with forwarding and masquerading configured by docker itself.
Trying to patronize docker's base configuration decision by interfering with their iptables routing configuration from the perspective of a VPN solution seems odd, especially because you can't easily revert this with a one-size-fits-all solution. There is also no explicit documentation on NetBird's side concerning how to handle specific cases like docker except for "Just route it™".

Also this probably doesn't work in general for netbird clients that run dockerized in host network mode.
(Actually it does not, because their mentioned solution works on my machine that has netbird directly installed, adding the same routes fails on all other machines that run in docker host network mode).

What I'm trying to say is that this completely and utterly breaks a: "install netbird, connect two peers and you have a working VPN connection" scenario.
Instead, it introduces a rabbit hole that requires every person who is not well versed in configuring routes and forwards to spend enormous amounts of time, only to end up unable to fix this after all (because there are cases where it is not fixable with the netbird configuration options alone), for something that is supposed to work out of the box.
This happens regardless of whether you're using their managed/paid service or the selfhosted version, as the issue is client-side.

I would much prefer an option to allow ct state NEW connections for each peer instead of this new behavior, which I consider a limitation instead of a feature.

It seems we have some settings to do in the ACL before going 0.30.

With the current state of the implementation, I'm not really sure what settings I would change in the ACL to make this work properly again without replacing netbird with vanilla wireguard and doing the ACL myself, because modifying the nftables rules for every machine isn't feasible from a maintainability standpoint.

@Dr-Shadow

Oh, I understand why it is currently affecting your setups and not mine after a quick test upgrading to 0.30.
I'm not running a dockerized netbird client; my netbird instances are running directly on the host.

@Spiritreader

Spiritreader commented Oct 7, 2024

Oh I understand why it is currently affecting your setups and not mine after a quick test upgrading on 0.30. I'm not running dockerized netbird client, my netbird instances are running directly on host.

Yep exactly!
If you run netbird on the host and don't have any bridge networks, the issue doesn't show up.

Basically the temporary workaround is to override this behavior:
Post-0.30.0: Peer A only accepts return traffic for connections it initiates to the Routed Network through Peer B.
by setting:

# $H is the handle of the rule being replaced; it differs per chain (nft -a list table ip netbird shows handles)
nft replace rule ip netbird netbird-rt-fwd handle $H ct state new,related,established accept
nft replace rule ip netbird netbird-acl-output-rules handle $H ct state new,related,established accept
nft replace rule ip netbird netbird-acl-input-rules handle $H ct state new,related,established accept
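
The `$H` in the commands above has to be filled in separately for each chain. As a hedged sketch, the handle can be read from `nft -a list chain` output, which appends `# handle N` to each rule. The helper name `rule_handle` is hypothetical, and the pattern assumes the rule text matches the 0.30.0 dump earlier in the thread.

```shell
#!/usr/bin/env bash
# rule_handle PATTERN
# Hypothetical helper: reads `nft -a list chain ...` output on stdin and prints
# the handle of the first rule containing PATTERN (matched as a fixed string).
rule_handle() {
  grep -F -- "$1" | sed -n 's/.*# handle \([0-9][0-9]*\)[[:space:]]*$/\1/p' | head -n 1
}

# Usage sketch (needs root and a running netbird client, so left commented out):
#   H=$(sudo nft -a list chain ip netbird netbird-rt-fwd \
#         | rule_handle 'ct state established,related accept')
#   sudo nft replace rule ip netbird netbird-rt-fwd handle "$H" \
#         ct state new,related,established accept
```

The same lookup would be repeated for `netbird-acl-output-rules` and `netbird-acl-input-rules`, since each chain carries its own handle.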

My plea to the netbird team would be that, if the current behavior stays without any changes, an option to set this in some way for groups via ACL is added, or any other option that makes it possible to restore the permissive pre-0.30.0 behavior.

@hadleyrich

I'd have to agree with @Spiritreader. I don't quite understand this change without some sort of config option or workaround.

If I have a bunch of servers running docker with the default bridge networking, and NetBird running on those hosts, am I expected to set up many duplicate, very fine-grained routes for dynamic pools of private IPs for distribution to very small client groups?

What if I want one NetBird client to access two separate other clients which run docker with overlapping private ranges?

This seems like a huge amount of potentially impossible route config where previously there was no network route config required at all?

@iball

iball commented Oct 7, 2024

After downgrading, everything is back to working now whereas before I was seeing the exact same behavior others have noted here - unable to connect to services on servers over the Netbird network.

@enyachoke

Could this be related to PR #2705?

@FlintyLemming

None of the services running on the docker bridge network can be accessed via the NetBird IP.

@mgarces
Contributor

mgarces commented Oct 9, 2024

We are working on a fix for deployments with docker host.
As a workaround for the moment, you can add a route to the docker local subnet and link that network route to the peer. This will add the necessary rules for this to work.
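
To see which subnets such a route would have to cover on a given host, one rough approach is to read them from the kernel routing table. This is a sketch under assumptions: the helper name `docker_bridge_subnets` is hypothetical, and it relies on Docker's usual bridge device names (`docker0` for the default bridge, `br-<id>` for user-defined bridge networks).

```shell
#!/usr/bin/env bash
# docker_bridge_subnets
# Hypothetical helper: reads `ip -4 route show` output on stdin and prints the
# subnets attached to docker0 or br-* devices (assumed Docker bridge names).
docker_bridge_subnets() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "dev" && $(i + 1) ~ /^(docker0$|br-)/) { print $1; break }
  }'
}
```

Running `ip -4 route show | docker_bridge_subnets` on the peer lists one subnet per line. The list has to be regenerated whenever Docker reassigns networks, since the subnets come from a pool.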

@Spiritreader

We are working on a fix for deployments with docker host. As a workaround for the moment, you can add a route to the docker local subnet and link that network route to the peer. This will add the necessary rules for this to work.

That works on the peers that have netbird installed directly on the host, but not on those that run netbird itself as a docker container with network_mode: host.

I also have around 40 docker subnets on some of my machines that all change on reboot, so my suggestion would be to downgrade to 0.29.4 instead of attempting to fix it via the workaround.

@mgarces
Contributor

mgarces commented Oct 10, 2024

hi there; can you please update to and try our latest release v0.30.1 ?

@Spiritreader

hi there; can you please update to and try our latest release v0.30.1 ?

Good news!
Functionality is restored (access to docker via netbird works again) on both:

  • machine that runs netbird directly on host
  • machine that runs netbird in docker

However, one issue persists: iptables is still broken while the netbird service is running.

❯ sudo iptables -L
iptables v1.8.9 (nf_tables): table `filter' is incompatible, use 'nft' tool.
❯ netbird down
Disconnected
❯ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy DROP)
target     prot opt source               destination
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (3 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             172.19.0.2           tcp dpt:http
ACCEPT     tcp  --  anywhere             172.18.0.2           tcp dpt:http
ACCEPT     tcp  --  anywhere             172.18.0.2           tcp dpt:https

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (3 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

@IKA3RUS
Author

IKA3RUS commented Oct 10, 2024

hi there; can you please update to and try our latest release v0.30.1 ?

Hey @mgarces, the issue's solved now and I am able to connect to my services. Thanks for the quick fix.

For anyone who might want to compare the nft tables, here's the output of nft list table netbird with netbird 0.30.1.

table ip netbird {
        set nb0000001 {
                type ipv4_addr
                flags dynamic
                elements = { 0.0.0.0 }
        }

        set nb0000002 {
                type ipv4_addr
                flags dynamic
                elements = { 0.0.0.0 }
        }

        chain netbird-rt-fwd {
                ct state established,related accept
        }

        chain netbird-rt-postrouting {
                type nat hook postrouting priority srcnat - 1; policy accept;
        }

        chain netbird-acl-input-rules {
                ct state established,related accept
                accept
        }

        chain netbird-acl-output-rules {
                ct state established,related accept
                accept
        }

        chain netbird-acl-input-filter {
                type filter hook input priority filter; policy accept;
                iifname "wt0" jump netbird-acl-input-rules
                iifname "wt0" drop
        }

        chain netbird-acl-output-filter {
                type filter hook output priority filter; policy accept;
                oifname "wt0" ip daddr != 100.64.0.0/16 accept
                oifname "wt0" jump netbird-acl-output-rules
                oifname "wt0" drop
        }

        chain netbird-acl-forward-filter {
                type filter hook forward priority filter; policy accept;
                meta mark 0x0001bd01 jump netbird-acl-input-rules
                iifname "wt0" jump netbird-rt-fwd
                iifname "wt0" drop
        }

        chain netbird-rt-prerouting {
                type filter hook prerouting priority mangle; policy accept;
                iifname "wt0" fib daddr type local meta mark set 0x0001bd01
        }
}
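
Read against the 0.30.0 dump earlier in the thread, this table adds two rules: a prerouting rule that marks packets arriving on `wt0` for a local address, and a forward-filter rule that sends marked packets through `netbird-acl-input-rules` before the final drop. For anyone comparing a newer client, a check along these lines might help. It is only a sketch: `has_local_forward_rules` is a hypothetical name, and the grep patterns are copied from this 0.30.1 dump, so they are assumptions about what other versions print.

```shell
#!/usr/bin/env bash
# has_local_forward_rules
# Hypothetical helper: reads `nft list table ip netbird` output on stdin and
# succeeds only if both rules seen in the 0.30.1 dump are present.
has_local_forward_rules() {
  local dump
  dump=$(cat)
  # Prerouting rule that marks wt0 traffic addressed to the host itself.
  printf '%s\n' "$dump" | grep -q 'fib daddr type local meta mark set' &&
  # Forward rule that hands marked traffic to the ACL input rules.
  printf '%s\n' "$dump" | grep -Eq 'meta mark 0x[0-9a-f]+ jump netbird-acl-input-rules'
}
```

Example: `sudo nft list table ip netbird | has_local_forward_rules && echo present || echo missing`. A `missing` result only means the ruleset differs from the dump above, not necessarily that forwarding is broken.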

@IKA3RUS IKA3RUS closed this as completed Oct 10, 2024
@nimdasx

nimdasx commented Dec 10, 2024

I'm currently using NetBird version 0.34.1, and this issue still persists. After downgrading to version 0.29.4, the problem is resolved. I still don't know how to fix this without downgrading to 0.29.4.

@fejiso

fejiso commented Dec 10, 2024

Same here, this is not fixed. I'm stuck on 0.29.4 for my services. Running dockerized NetBird.

@nimdasx

nimdasx commented Dec 28, 2024

This is still a problem in version 0.35.1.
