
Kill switch when ANY wg interface fails (advanced mode) #1700

Open
samthesamman opened this issue Sep 17, 2024 · 6 comments

@samthesamman

Currently testing with 'main' branch (59aa0da).

I currently have 3 wg profiles set up in the advanced interface:

  • LAN
  • USA
  • FRANCE

I keep LAN connected at all times and have selected "Always-on". The allowedIps wg option works to route my local LAN traffic through this interface.

I use either USA or FRANCE (one at a time). USA and FRANCE are configured with allowedIps covering every IP not in my LAN profile's allowedIps (so all non-LAN traffic routes through whichever of these is active).
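To make the split concrete, the two kinds of profiles look roughly like this (placeholder keys and addresses for illustration only, not my real config):

# LAN profile peer: only the local subnet routes here
[Peer]
PublicKey = <lan-peer-key>
AllowedIPs = <lan-subnet>
Endpoint = <lan-endpoint>:51820

# USA / FRANCE profile peer: everything outside the LAN subnet routes here
[Peer]
PublicKey = <vpn-peer-key>
AllowedIPs = <all ranges not covered by the LAN subnet>
Endpoint = <public-ip>:51820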

First issue: SLOW to Connect

  • In advanced mode, the configs are very slow to get into the 'Connected' state (~15 seconds). I think the slowness only appears if one profile is already connected and I try to connect another. Any single config in Simple mode connects quickly.

Second issue: No kill switch while a wg interface is connecting?

  • When I connect to the USA or FRANCE wg profile and it is not yet in the 'Connected' state, any HTTP request I make just routes through my normal non-wg interface and my IP leaks. Would it be possible to have a kill switch where all traffic is blocked until all active profiles are in the connected state? (I tried toggling Lockdown mode, but it didn't help here; once "Always-on" is turned on the Lockdown switch is greyed out, so I'm not sure whether it's even supposed to have an effect here.)
@ignoramous
Collaborator

Would it be possible to have a kill switch where all traffic is blocked until all active profiles are in the connected state?

This is what Lockdown does. If that's not what you see happening, it is a bug.

once "Always-on" is turned on the Lockdown switch is greyed out

Always-on WireGuards are also (internally) Lockdown.

advanced mode, the configs are very slow to get into the 'Connected' state (~15 seconds).

We've got a few reports about this. Can you check if the Peer Endpoint contains a domain name (these can also be IP addresses)? If so, we've fixed an issue related to it slowing connects/reconnects.

@samthesamman
Author

samthesamman commented Sep 17, 2024

I've managed to narrow this down.

  • The Peer Endpoint does NOT contain a domain name.

  • My LAN wg profile has allowedIp set to 10.2.0.0/16. Turns out, if this is set to any 10.x.x.x/x (eg: 10.0.0.0/8 or 10.0.0.0/24 etc...) then I experience the slow connection (all of my wg profiles connect slowly, not just the LAN profile). If I change this to anything not 10.x.x.x then all the profiles connect immediately. It connects so fast that I can't tell if the IP leak is still happening or not.

  • NOTE: when testing this, I'm not even connecting to my LAN (DNS is set to 1.1.1.1:53 and I'm not trying to access anything locally, so my LAN isn't the thing causing an issue.)

  • NOTE: this is still testing on main branch 59aa0da. The current live Google Play production app has a lot of other bugs with advanced wg that don't let me test this.

@ignoramous
Collaborator

My LAN wg profile has allowedIp set to 10.2.0.0/16. Turns out, if this is set to any 10.x.x.x/x (eg: 10.0.0.0/8 or 10.0.0.0/24 etc...) then I experience the slow connection (all of my wg profiles connect slowly, not just the LAN profile). If I change this to anything not 10.x.x.x then all the profiles connect immediately.

Interesting. Can you share the Peer config (all fields are public, so it is usually okay to share)? If not, can you confirm whether the Peer Endpoint is a public IP (as opposed to an IP from the 10.x.y.z address space)?

NOTE: this is still testing on main branch 59aa0da. The current live Google Play production app has a lot of other bugs with advanced wg that don't let me test this.

This uses a pretty old network engine (from 3 months and ~300 commits ago).

download 'com.github.celzero:firestack:ee0a5ac71f@aar'

@samthesamman
Author

samthesamman commented Sep 18, 2024

Here are the peer configs:

[Peer]
PublicKey = xxxx
AllowedIPs = 10.80.0.0/16
Endpoint = 79.127.254.92:51820

[Peer]
PublicKey = xxxx
AllowedIPs = 0.0.0.0/5, 8.0.0.0/7, 10.0.0.0/10, 10.64.0.0/12, 10.81.0.0/16, 10.82.0.0/15, 10.84.0.0/14, 10.88.0.0/13, 10.96.0.0/11, 10.128.0.0/9, 11.0.0.0/8, 12.0.0.0/6, 16.0.0.0/4, 32.0.0.0/3, 64.0.0.0/3, 96.0.0.0/11, 96.32.0.0/12, 96.48.0.0/16, 96.49.0.0/17, 96.49.128.0/18, 96.49.192.0/20, 96.49.208.0/22, 96.49.212.0/25, 96.49.212.128/26, 96.49.212.192/28, 96.49.212.208/31, 96.49.212.210/32, 96.49.212.212/30, 96.49.212.216/29, 96.49.212.224/27, 96.49.213.0/24, 96.49.214.0/23, 96.49.216.0/21, 96.49.224.0/19, 96.50.0.0/15, 96.52.0.0/14, 96.56.0.0/13, 96.64.0.0/10, 96.128.0.0/9, 97.0.0.0/8, 98.0.0.0/7, 100.0.0.0/6, 104.0.0.0/5, 112.0.0.0/4, 128.0.0.0/1
Endpoint = 146.70.179.98:51820

Even if you simplify that second one to something like 0.0.0.0/0, you still see the issue.

I tried updating the network engine to 140e42bd57, but that made it worse (the issue was present even when I was using a CIDR other than 10.x.x.x). However, I don't know whether I integrated the updated version correctly, since I just fixed whatever was causing build errors due to changes from the earlier version.

@ignoramous
Collaborator

Does it also happen with a single Peer?

If you're comfortable doing so, turn ON Very verbose logging (from Configure -> Settings -> Log level), then run adb logcat | grep "wg:" to see if it shows any anomalies for the config that reproduces this (2 peers, one with 10.x.y.z in allowed IPs) versus a config that doesn't (without 10.x.y.z).

@samthesamman
Author

samthesamman commented Sep 19, 2024

Found the issue and submitted a pull request here: #1707.

Turns out it had nothing to do with the 10.x.x.x subnet. The issue was intermittent because a catch-all config was being returned more or less at random. When fetching an optimal config, we need to consider only configs that can handle the route (eg: permitted by allowedIps). This bug also affected the code for finding a proxy to use for DNS queries.
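For anyone skimming this later, here is a rough sketch of the idea behind the fix. It is only an illustration (WgConfig, cidrContains and selectConfigForRoute are made-up names, not the actual code in #1707): when picking a config for an outgoing connection, only configs whose allowedIps cover the destination should be candidates, rather than falling back to whichever catch-all config is returned first.

import java.math.BigInteger
import java.net.InetAddress

data class WgConfig(val name: String, val allowedIps: List<String>)

// True if the CIDR (e.g. "10.2.0.0/16") contains the given address.
fun cidrContains(cidr: String, ip: InetAddress): Boolean {
    val (netStr, prefixStr) = cidr.trim().split("/")
    val net = InetAddress.getByName(netStr)
    if (net.address.size != ip.address.size) return false // IPv4 vs IPv6 mismatch
    val bits = net.address.size * 8
    val prefix = prefixStr.toInt()
    // Mask with the top `prefix` bits set, e.g. /16 over 32 bits -> 0xFFFF0000.
    val mask = BigInteger.ONE.shiftLeft(prefix).subtract(BigInteger.ONE).shiftLeft(bits - prefix)
    return BigInteger(1, net.address).and(mask) == BigInteger(1, ip.address).and(mask)
}

// Consider only configs whose allowedIps permit the destination; no random catch-all.
fun selectConfigForRoute(configs: List<WgConfig>, dest: InetAddress): WgConfig? =
    configs.firstOrNull { cfg -> cfg.allowedIps.any { cidrContains(it, dest) } }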

Thanks for pursuing this.
