
DNS queries leak if always-on VPN with kill switch enabled malfunctions #3442

Closed
ryrona2 opened this issue Apr 16, 2024 · 34 comments
Labels
bug Something isn't working upstream

Comments

@ryrona2

ryrona2 commented Apr 16, 2024

I was testing the VPN functionality while monitoring all network traffic on my Wi-Fi hotspot and intentionally trying to sabotage the connection, just to verify that the kill switch works as intended.

It turns out it is possible to cause DNS queries to leak outside the VPN when it malfunctions. This breaks the security a VPN with a kill switch is expected to provide: that your internet activity, such as which sites you visit, stays hidden from your ISP even when the VPN malfunctions.

Steps to reproduce:

  1. Install the Mullvad VPN app from the F-Droid repository.
  2. Log in to your Mullvad account and connect.
  3. Use any app, like the Vanadium web browser. No DNS requests leak outside the VPN.
  4. Go to Settings, App permissions and revoke the Network permission for Mullvad VPN.
  5. Use any app, like the Vanadium web browser. Now DNS requests go out outside the VPN, even though no other network traffic works.

This cannot possibly be an issue with the Mullvad VPN app, since I have revoked its network permissions so it cannot be the one sending the DNS queries, so the leak is somewhere in GrapheneOS.

@thestinger
Member

> This cannot possibly be an issue with the Mullvad VPN app, since I have revoked its network permissions so it cannot be the one sending the DNS queries, so the leak is somewhere in GrapheneOS.

No, this is an assumption you're making. Mullvad is setting up the configuration for how the OS VPN functionality works. You're assuming that it's not doing something wrong.

@ryrona2
Author

ryrona2 commented Apr 16, 2024

> > This cannot possibly be an issue with the Mullvad VPN app, since I have revoked its network permissions so it cannot be the one sending the DNS queries, so the leak is somewhere in GrapheneOS.

> No, this is an assumption you're making. Mullvad is setting up the configuration for how the OS VPN functionality works. You're assuming that it's not doing something wrong.

Okay. Yes, I assumed that all network traffic was routed through the app. Is there a VPN app known to be implemented correctly that I could also try to reproduce this issue with? Or can I somehow dump the VPN configuration the app has set up, to check whether it is correct? If the DNS query is not made by the app, shouldn't the kill switch block it anyway?

@thestinger
Member

Try using the official WireGuard app instead.

@ryrona2
Author

ryrona2 commented Apr 16, 2024

> Try using the official WireGuard app instead.

Unfortunately, this didn't tell me anything, because the official WireGuard app detects when the Network permission is removed from it and immediately disables the VPN. No DNS leaks in that case, but all apps believe there is no internet connection, so they may not even be trying. The Mullvad VPN app, by contrast, still believes there is a connection, so it doesn't disable its VPN configuration.

I would have expected the kill switch to block the DNS query regardless of how the Mullvad VPN app has set up its configuration, since the kill switch should block all traffic not going out over the VPN, with the VPN app having no say in the matter.

@Tryptamine9

So you have confirmed that a correctly written VPN app, such as the official WireGuard app, DOES kill all network traffic when it loses its network connection and the kill switch is activated!

Also, you have found that a poorly written VPN app DOES NOT kill all traffic when it loses connectivity, and there is a leak when DNS requests are made. Sounds to me like this should be filed with the development team behind Mullvad VPN, not here...

@ryrona2
Author

ryrona2 commented Apr 16, 2024

@Tryptamine9 Sure, the Mullvad VPN app could probably behave better here. But isn't the whole idea of a kill switch that apps shouldn't be able to make connections outside the VPN when the VPN is malfunctioning? The kill switch is named "Block connections without VPN". That is not what happens here: DNS queries go out without the VPN. And the kill switch is provided by GrapheneOS, not by any VPN app. So even if this specific VPN app could certainly improve, it sounds like the actual security issue is in GrapheneOS, either in the implementation or in the expected functionality.

@thestinger
Member

GrapheneOS uses the standard implementation of this with no changes to it. None of the features or bugs with VPN support have to do with GrapheneOS at the moment.

@ryrona2
Author

ryrona2 commented Apr 16, 2024

> GrapheneOS uses the standard implementation of this with no changes to it. None of the features or bugs with VPN support have to do with GrapheneOS at the moment.

Okay. Do you expect me to file a bug ticket upstream, or does the GrapheneOS team handle it? Which bug tracker? I am not familiar with upstream here.

@thestinger
Member

It would be best to file an Android security issue; we'll do our own investigation so we can fix it early ourselves. It still needs to be determined what's happening and whether it's an OS-side or app-side issue. For example, it could simply be that the API is hard to use correctly and apps are supposed to do some configuration that they aren't doing.

@ryrona2
Author

ryrona2 commented Apr 24, 2024

@thestinger Yeah, I will investigate more before escalating to AOSP or the app developer. The short investigation I did a few days ago showed that the VPN app detected the Network permission being removed, tore down the VPN interface, and then immediately set it up again. The kill switch, however, was not torn down, and it appears to be set up properly. Everything else was set up identically to before the Network permission was removed, so I do not yet understand what caused DNS queries to suddenly go out outside the VPN.

My current suspicion is that apps do not actually send DNS queries themselves; instead, a system-wide DNS resolver running as another UID performs all DNS queries. Something then lets apps suddenly query that system-wide resolver, which was not properly bridged back to the VPN app during the rapid teardown and setup of the VPN interface. Since the resolver is not blocked by the kill switch, it ends up sending the queries outside the VPN.

If you know how DNS is handled in GrapheneOS, please tell me, since that could speed up my investigation a little. The same goes for how removal of the Network permission is implemented: I have at least confirmed that it isn't implemented the same way as the kill switch, but I still don't know how it works. I initially suspected the two might clash, but that is probably not what is happening after all.

I will prioritize the multicast leak ticket, since that is a confirmed, real problem. As for this one, if the issue only happens when the Network permission is removed from the VPN app, it is not that serious, since there is no logical reason for a user to remove the Network permission from the VPN app. It's still a bug, just not a serious one. I will try to find another way to reproduce the issue that also works on AOSP, but I suppose the easiest path is to first find out what the actual issue is.

@thestinger
Member

@ryrona2 Most apps use the system DNS resolver, which is meant to send requests through the VPN-provided DNS implementation. Native DNS requests are handled differently from other requests due to caching, etc.

@Rawa

Rawa commented May 2, 2024

Hello! I'm a developer at Mullvad VPN who has been looking into this issue over the last week, and I thought I might chime in and give some more context.

The DNS leak can be reproduced in the WireGuard app as well. We have reported the issue to Google; you can find it here, including steps to reproduce:
https://issuetracker.google.com/issues/337961996

From our testing, we observed the following (also stated in our report to Google):

  1. If no DNS server is configured on the VPN, DNS requests may leak.
  2. When a tunnel is torn down, the system leaks as well. For example, setting up two tunnels in the WireGuard app, both with DNS configured, and then switching between them also causes a DNS leak.

From our testing, we see that queries made with Android's DNSResolver, or with a DatagramSocket and DatagramPacket, don't go out, but some browsers (e.g. Chrome) do leak, maybe because they use native DNS as mentioned by @thestinger.

Below you can also find a gist with an HTML file containing embedded JavaScript. The HTML file can be opened in your browser of choice and will make GET requests to unique URLs, each resulting in a new DNS request. By running it in Chrome under cases 1 and 2 and observing network traffic with e.g. tcpdump, you can see the leak in action.
https://gist.github.com/Rawa/dcc636e45f95143a8ea65ba3ca366ae8
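As a rough illustration of why this approach produces fresh DNS traffic: a hostname that has never been looked up cannot be answered from any cache, so each new one should trigger a new query that a capture will show. The sketch below applies the same cache-busting idea in Python rather than the gist's JavaScript. It assumes the unique part of each URL is the hostname (only a new hostname forces a new lookup), and `leaktest.example`, `unique_hostname` and `trigger_lookup` are hypothetical names, not taken from the gist.

```python
import secrets
import socket

# Hypothetical test zone (assumption): use a domain whose queries you can
# recognise in a packet capture.
BASE_DOMAIN = "leaktest.example"


def unique_hostname(base: str = BASE_DOMAIN) -> str:
    """Return a hostname no resolver can have cached (random 16-hex-char label)."""
    return f"{secrets.token_hex(8)}.{base}"


def trigger_lookup(hostname: str) -> bool:
    """Ask the system resolver for `hostname`; return True if it resolved.

    The answer itself does not matter for leak testing: what matters is
    whether the query shows up on the physical interface in the capture.
    """
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        return False
```

While capturing on the Wi-Fi side, calling `trigger_lookup(unique_hostname())` repeatedly should produce one new query per call. Whether those queries appear outside the tunnel depends on which resolver path the lookup takes, as described above.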

Thanks for opening this issue and for also reporting it to us.

@no-usernames-left

Mullvad has a good, information-dense writeup on this issue here.

@thestinger
Member

Their post acknowledges that they fixed a bug in their app which resolved a major part of the issue. The issue while reconnecting looks a lot like a race condition, and it's not yet clear whether the issue is on the OS side or the app side. The other multicast issue looks like an OS bug, but this one needs further research to determine the cause. It's likely that the OS can prevent these leaks by working around how apps behave even if it's an app bug, but that doesn't imply the leak-blocking toggle was meant to do that. It's meant to block access when the VPN is down, not when the VPN sets a partial configuration or does the setup in the wrong order.

@thestinger
Member

Their previous post about connectivity checks claims something that's working by design without issues is a leak. It was highly misleading and largely inaccurate. Android VPN configuration is per-profile, and system-wide traffic goes through the Owner user's VPN by default. Connectivity checks, NTP and traffic from the VPN app itself are explicitly opted out of going through the VPN. GrapheneOS doesn't use NTP because it's insecure; we simply have our HTTPS network time updates go through the Owner user's VPN, since they use TCP rather than UDP, which may not work through a VPN. We're also fine with users having to fix their clock if it's so far off that the VPN certificates no longer verify. Connectivity checks would not work if they went through the VPN. The whole point is to detect whether each underlying network works and to trigger captive portal detection, which then brings up a UI for handling the portal. That UI also doesn't get routed through the VPN, so users can deal with a captive portal without fully disabling their VPN.

The new post about these DNS leaks makes a lot of assumptions, and we don't agree with the conclusions being drawn. We believe the issue is a race condition where the DNS configuration is updated after the VPN configuration, and we believe apps may be able to avoid this on their own. We plan to implement some form of synchronization in the OS to prevent it, but that won't help people outside GrapheneOS. As far as we're concerned, it's a very good thing for the OS to provide this functionality, rather than giving each app immensely privileged access and having it try to implement everything on its own while avoiding incompatibilities with other apps using those privileges and fully supporting all the complex functionality the OS provides in the way it's intended to work. How is it realistic to do it any other way? Even if this is an app bug, it's still probably possible to work around it in the OS and block these kinds of leaks, which is a major advantage of the approach. Portraying it as a bad, insecure approach is very silly. It's not as if these apps have anything close to a spotless record of avoiding leaks elsewhere.
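To make the suspected race concrete, here is a deliberately simplified toy model; it is not a description of how Android's netd actually works. It assumes the VPN comes up in two separate steps (exposing the VPN network to apps, and applying its DNS servers), and that, as in case 1 of the Mullvad report above, a VPN with no DNS servers configured falls back to the underlying network's resolver outside the tunnel. `VpnNetwork`, `lookup_destination`, `bring_up` and the addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical resolver of the underlying Wi-Fi network (assumption).
UNDERLYING_DNS = "192.168.1.1"


@dataclass
class VpnNetwork:
    # Toy state: has the OS switched apps onto the VPN yet?
    visible_to_apps: bool = False
    # DNS servers the VPN app has configured (empty until applied).
    dns_servers: List[str] = field(default_factory=list)


def lookup_destination(vpn: VpnNetwork) -> Optional[str]:
    """Where an app's lookup would go right now in this toy model.

    None: apps are not on the VPN yet (out of scope here).
    "tunnel:<ip>": the VPN's own DNS server is used.
    "physical:<ip>": fallback to the underlying network's DNS, i.e. a leak.
    """
    if not vpn.visible_to_apps:
        return None
    if vpn.dns_servers:
        return f"tunnel:{vpn.dns_servers[0]}"
    return f"physical:{UNDERLYING_DNS}"


def bring_up(vpn_dns: str, dns_first: bool) -> List[Optional[str]]:
    """Apply both setup steps in the given order, probing after each one."""
    vpn = VpnNetwork()
    steps = [
        lambda: vpn.dns_servers.append(vpn_dns),        # apply VPN DNS
        lambda: setattr(vpn, "visible_to_apps", True),  # expose VPN to apps
    ]
    if not dns_first:
        steps.reverse()
    probes: List[Optional[str]] = []
    for step in steps:
        step()
        probes.append(lookup_destination(vpn))
    return probes
```

In this model, `bring_up("10.64.0.1", dns_first=False)` returns `["physical:192.168.1.1", "tunnel:10.64.0.1"]`, so the first probe leaks, while applying DNS first gives `[None, "tunnel:10.64.0.1"]`. That is roughly the ordering that either OS-side synchronization or a more careful app-side bring-up would aim to guarantee.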

Connectivity checks are not leaks, and it hurts Mullvad's credibility each time that's claimed. The fact that the connectivity check article is positive about GrapheneOS doesn't change how we feel about that.

@no-usernames-left

no-usernames-left commented May 4, 2024

> Their previous post about connectivity checks claims something that's working by design without issues is a leak.

Daniel, I'm the first to admit that you have forgotten more about Android than I will ever know. However, step 10 of their reproduction instructions is pretty damning: if the VPN kill switch is enabled at the OS level, then I believe it is completely fair to expect that absolutely no plaintext DNS queries should ever be "on the wire", and seeing them in Wireshark on the other end of the Wi-Fi is absolutely a bug (one that is not Graphene's fault).

In some countries, those leaks could get someone killed.

@thestinger
Member

@no-usernames-left It's entirely supported to send DNS queries to the regular network DNS while using a VPN with the kill switch enabled. The feature is very flexible and allows doing a lot of different things. Bear in mind that it isn't only used with an actual VPN and is meant to support split tunneling natively, without the VPN having to split traffic itself, although in practice the VPN has to handle that itself if it wants non-DNS traffic to be "leaked" on purpose, either to the local network or for specific apps passing through. Not sending DNS through the VPN is a supported configuration, though. The leak toggle is there to prevent leaks that the application can't avoid itself, whether because it's down or while it's up. Half of what Mullvad is calling a bug doesn't really appear to be a bug. The other half may be a bug, but the application may also be able to avoid it by avoiding the race itself. Perhaps the cause should be figured out, along with whether the app can fully avoid it, BEFORE assuming that it's actually a bug in Android's toggle. Just a thought.

@thestinger
Member

> In some countries, those leaks could get someone killed.

Perhaps you should use the built-in IPsec support if it's that serious. We can't make any promises about whether apps leak since we don't control them. If it were our app, we'd be looking into whether we could change how it brings up the VPN to avoid telling the OS that the VPN is up before the DNS configuration is processed.

@ryrona2
Author

ryrona2 commented May 4, 2024

> It's entirely supported to send DNS queries to the regular network DNS while using a VPN with the kill switch enabled.

I just want to say this is totally unexpected behavior from the user's perspective, whether it is as designed or not.

If you set up a kill switch for the VPN on Linux, you would add an iptables rule that blocks all network traffic not going to the specific IP:port combination belonging to the VPN. This gives peace of mind that even if something goes wrong with the VPN, neither the system DNS resolver nor anything else can send traffic out on the network.

Also, the kill switch is really just a poor approximation of how a VPN setup should be designed when privacy matters. Whonix does it right with its Workstation and Gateway VMs, and QubesOS supports a similar setup for VPNs. That design is leak-proof in every single way.

With that said, I'll be happy if this leak is resolved, whether by the app, the OS, or both. I acknowledge that GrapheneOS may want to keep a small delta from AOSP, so I also think it is good that the Mullvad developers reported this issue to AOSP; that is where it should be fixed if it isn't GrapheneOS-specific, and it looks like it isn't.
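The rule described above amounts to a default-deny filter on outgoing traffic. The Python model of that decision below is only illustrative (real rules would be written with iptables or nftables), and it assumes a tunnel interface named `wg0`, a VPN endpoint at `203.0.113.5:51820/udp` (a documentation address) and a physical interface named `wlan0`; all of these are placeholders. A working setup also has to allow traffic that leaves through the tunnel interface itself, plus loopback, or nothing would get through at all.

```python
from typing import NamedTuple

# Hypothetical values for illustration only (assumptions).
TUNNEL_IFACE = "wg0"
VPN_ENDPOINT = ("203.0.113.5", 51820, "udp")


class Packet(NamedTuple):
    out_iface: str
    dst_ip: str
    dst_port: int
    proto: str


def kill_switch_allows(pkt: Packet) -> bool:
    """Default-deny egress filter modelled on a Linux VPN kill switch."""
    if pkt.out_iface == "lo":
        return True  # loopback traffic never leaves the host
    if pkt.out_iface == TUNNEL_IFACE:
        return True  # already inside the encrypted tunnel
    # On any physical interface, only packets to the VPN endpoint may pass.
    return (pkt.dst_ip, pkt.dst_port, pkt.proto) == VPN_ENDPOINT
```

Under this rule, a plaintext query such as `Packet("wlan0", "10.64.0.1", 53, "udp")` is dropped even though it is addressed to the VPN's own DNS server, while the same query sent through `wg0` is allowed. This is the distinction between leaks to VPN DNS and leaks to non-VPN DNS that comes up later in this thread.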

@thestinger
Member

The built-in OS VPN support can provide a better fail-safe kill switch than is possible with VPN apps, because with an app the OS doesn't know what the VPN app is meant to do and has to allow all the traffic it sends and whatever DNS configuration it chooses to use.

@thestinger
Member

Should be blocked in the latest release:

https://grapheneos.org/releases#2024050900

Some major improvements should really be made to this infrastructure but doing it downstream would be quite questionable.

@no-usernames-left

Thanks Daniel!

@mateusz-markowicz

We did some testing on our side (ProtonVPN), and what's happening is that after the tunnel is established, all API requests our app makes fail with DNS errors. When opening the tunnel, we set the DNS server in VpnService.Builder to 10.2.0.1 (see https://developer.android.com/reference/android/net/VpnService.Builder#addDnsServer(java.net.InetAddress)). We noticed that when the app doesn't set a DNS server at all, or sets it to a public DNS server like 1.1.1.1, everything works fine. So it seems your fix somehow interferes with the way we set the DNS server for the connection.

@albin-mullvad

Hey! I'm working with @Rawa on the Mullvad VPN Android app. We also did some testing and saw results similar to those reported by @mateusz-markowicz. When we were about to report back here a few days ago, we saw that the related fix had already been reverted (GrapheneOS/platform_system_netd@296ccdc).

While the fix did address the original issue, we saw that regular API communication was broken while the tunnel was connected. In our test setup, DNS queries would leak outside the tunnel BUT still use the tunnel DNS server. In the following capture, 192.168.1.128 is the device and 10.64.0.1 is the DNS server configured for the tunnel.

09:32:59.797907 IP 192.168.1.128.10157 > 10.64.0.1.53: 42899+ A? ipv4.am.i.mullvad.net. (39)
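A capture like the one above can also be checked mechanically. The sketch below assumes tcpdump's default one-line IPv4 format as shown in the capture, a capture taken on the Wi-Fi side so that every packet in it is outside the tunnel, and a known set of tunnel DNS servers; `classify_dns_line` and `summarize` are hypothetical helpers. It separates the two cases distinguished in this thread: queries that go to the tunnel's DNS server but outside the tunnel (as in this capture), and queries to any other DNS server.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Matches tcpdump's default IPv4 line: "... IP src.sport > dst.dport: ..."
_LINE = re.compile(
    r"\bIP (\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+):"
)


def classify_dns_line(line: str, vpn_dns: Set[str]) -> Optional[str]:
    """Classify one line of a capture taken OUTSIDE the tunnel.

    Returns None unless the packet is addressed to port 53; otherwise
    "leak-to-vpn-dns" or "leak-to-other-dns".
    """
    m = _LINE.search(line)
    if m is None:
        return None
    _src, _sport, dst, dport = m.groups()
    if dport != "53":
        return None  # e.g. responses coming back from the server
    return "leak-to-vpn-dns" if dst in vpn_dns else "leak-to-other-dns"


def summarize(lines: Iterable[str], vpn_dns: Set[str]) -> Dict[str, int]:
    """Count both kinds of leaked queries across a whole capture."""
    counts = {"leak-to-vpn-dns": 0, "leak-to-other-dns": 0}
    for line in lines:
        kind = classify_dns_line(line, vpn_dns)
        if kind is not None:
            counts[kind] += 1
    return counts
```

For the capture line above with `vpn_dns={"10.64.0.1"}`, `classify_dns_line` returns `"leak-to-vpn-dns"`. Only packets addressed to port 53 are counted, so the server's replies are ignored.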

@thestinger thestinger reopened this May 15, 2024
@thestinger
Member

This didn't work out and will be reverted in the next GrapheneOS release.

@thestinger
Member

thestinger commented Jul 31, 2024

See https://grapheneos.org/releases#2024073100. It should be resolved now in a way we won't have to revert. Please test and report back. The app does need to do things properly too. Mullvad would be a good test app now.

@thestinger thestinger reopened this Aug 1, 2024
@thestinger
Member

Our new approach appears to resolve nearly all the compatibility issues. However, the main compatibility issue remains: Proton VPN no longer works with this active.

@thestinger
Member

DNS leaks to non-VPN DNS are prevented by the latest stable release. We're working on determining whether DNS leaks to VPN DNS outside of the VPN are possible beyond incorrectly written VPN apps, and on preventing them with a stricter approach. We had to back off from our initial strict approach, which blocked both, due to app compatibility issues, but we've now shipped most of this successfully.

@no-usernames-left

Thanks for all your hard work on this, Daniel! It's very much appreciated.

@thestinger
Member

New issue about the portion of this we fixed in early May but which we had to avoid shipping for now due to Proton VPN app incompatibility:

#3831

@thestinger
Member

We're working on figuring out how to resolve the compatibility problem, which might be a separate OS bug.

@thestinger
Member

We're probably going to fix the multicast issue first because it's likely easier. We haven't been working on it yet since we considered DNS leaks a higher priority.

@EnfLzSp

EnfLzSp commented Aug 15, 2024

About Multicast:
Why not just add a local network isolation toggle to block all multicast traffic system-wide or on a per-app basis, and leave the Network permission working as it is? Multicast shouldn't be an issue on mobile networks, as everyone is supposed to be isolated from each other, but Wi-Fi APs are a different matter. An option to disable Wi-Fi Direct would also be nice.

About DNS Leaks:
DNS queries outside of VPN tunnels are leaks, even if they are made to VPN DNS, but it looks like some VPN apps don't agree with that.

@thestinger
Member

> Why not just add a local network isolation toggle to block all multicast traffic system-wide or on a per-app basis, and leave the Network permission working as it is? Multicast shouldn't be an issue on mobile networks, as everyone is supposed to be isolated from each other, but Wi-Fi APs are a different matter. An option to disable Wi-Fi Direct would also be nice.

The Network permission should already properly block it. It's the VPN lockdown toggle that isn't properly preventing it from leaking. It's an issue regardless of the network because it leaks information to the network that shouldn't be able to leak. We've figured out how to block most of it already. It appears IGMP does get routed via the VPN rather than leaking, but it should really just be dropped. It doesn't seem to be as big of an issue as we expected, but we still need to determine how it interacts with user profiles.

> DNS queries outside of VPN tunnels are leaks, even if they are made to VPN DNS, but it looks like some VPN apps don't agree with that.

These apps are likely accidentally depending on the OS permitting certain leaks. They definitely didn't intend to depend on it, but the OS wasn't doing things properly, and fixing it to behave more correctly breaks them. We might be able to fix other OS issues, which would keep these apps from being incompatible with strict DNS leak blocking. We already properly block leaks to non-VPN DNS, so we've made significant progress despite this compatibility problem getting in the way of our strict fix. It's very unfortunate that we weren't able to simply ship our strict fix in May as we expected. It was surprising that there was this kind of compatibility issue.

8 participants