Dropped TCP/HTTP connections #1658

Closed
ViRb3 opened this issue Jul 19, 2022 · 8 comments

@ViRb3

ViRb3 commented Jul 19, 2022

System

  • macOS 12.4
  • MacBook Pro 2021 M1 Max
  • AdGuard 2.8.1.1140 (stable)

Problem

I recently switched to an ISP that uses carrier-grade NAT (CGNAT) because of the global shortage of IPv4 addresses. I immediately noticed that some apps, namely Discord and Telegram, started losing their connections very frequently. For example, when you first start Discord, it works fine. If you wait a few minutes and then try any action that requires network I/O, such as sending a message or loading somebody else's profile picture, the action hangs for up to 30 seconds before it suddenly works again. The app then works for a few minutes, after which the same thing repeats.

I analyzed the issue with Wireshark, and here are my findings.

  1. Discord uses QUIC if AdGuard is disabled. If I enable AdGuard, the connection immediately falls back to HTTP/2. I suppose AdGuard does not support QUIC yet?
  2. If I block UDP port 443 (disabling QUIC) and AdGuard is disabled, I see keepalive packets sent from the client to the server every few minutes. If I enable AdGuard, no such packets are sent. Does AdGuard forget to send keepalive packets?
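For anyone reading the capture: Wireshark flags a segment as a TCP keep-alive using a simple heuristic (0 or 1 byte of payload, a sequence number one less than the next expected one, and no SYN/FIN/RST set). The Python sketch below approximates that check; `Segment` and `is_keepalive` are hypothetical helpers for illustration, not part of Wireshark or any packet library.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Minimal view of one captured TCP segment (hypothetical type)."""
    seq: int          # sequence number of this segment
    payload_len: int  # TCP payload length in bytes
    syn: bool = False
    fin: bool = False
    rst: bool = False


def is_keepalive(seg: Segment, next_expected_seq: int) -> bool:
    """Approximate Wireshark's keep-alive heuristic.

    A keepalive probe carries 0 or 1 byte of payload, has no SYN/FIN/RST,
    and reuses the sequence number one below what the peer expects next
    (modulo 2**32), which forces the peer to answer with an ACK.
    """
    if seg.syn or seg.fin or seg.rst:
        return False
    if seg.payload_len > 1:
        return False
    return seg.seq == (next_expected_seq - 1) % 2**32
```

A plain ACK (`seq == next_expected_seq`, no payload) is not counted, and the modulo handles sequence-number wraparound.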

CGNAT ISPs commonly use aggressive TCP idle timeouts so that enough ports stay free for all of their customers. This means that if a TCP connection carries no traffic for a few minutes, its NAT mapping is dropped and the port is reused. I had the same issue with SSH, but after I enabled keepalive, everything worked. It appears the problem here is that AdGuard does not send these keepalive packets, which in turn lets my ISP kill all connections used by Discord, Telegram, and most likely many other apps that reuse connections.
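For reference, this is roughly what "enabling keepalive" means at the socket level for an application that owns its socket. A minimal Python sketch; the 60 s idle time, 15 s probe interval and 4 probes are assumptions chosen to stay below a multi-minute NAT idle timeout, not values taken from this thread. After `idle_s` seconds without data the kernel sends a probe, repeating every `interval_s` seconds up to `probes` times before declaring the connection dead.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 15, probes: int = 4) -> None:
    """Turn on TCP keepalive so an otherwise idle connection still emits traffic.

    The default values are assumptions; idle_s only helps if it is shorter
    than the carrier's (unknown) NAT idle timeout.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The idle-time option name differs by OS: Linux uses TCP_KEEPIDLE,
    # macOS uses TCP_KEEPALIVE (exposed by Python 3.10+ on macOS).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    # Interval between unanswered probes and how many to send before giving up.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The check uses `!= 0` because BSD-derived systems such as macOS report the option's flag value rather than `1`.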

Happy to share pcap files or more information if needed.

Thanks!

@sfionov
Member

sfionov commented Jul 20, 2022

Hello!

Yes, QUIC is blocked since our HTTP/3 filtering is not yet complete, but work is in progress.

We will also check whether we can preserve the original socket's keepalive flag.
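As a rough illustration of what "preserving the keepalive flag" could mean in a proxy that *does* have access to the application's original socket (an assumption: as noted later in this thread, NEAppProxyTCPFlow on macOS exposes no socket options), here is a Python sketch that mirrors the keepalive settings onto the upstream socket. `copy_keepalive` is a hypothetical helper, not AdGuard code.

```python
import socket


def copy_keepalive(src: socket.socket, dst: socket.socket) -> bool:
    """Mirror SO_KEEPALIVE and the TCP keepalive tunables from src to dst.

    Hypothetical helper: assumes the proxy can call getsockopt() on the
    application's original socket (src). Returns True if keepalive ended
    up enabled on the upstream socket (dst).
    """
    enabled = src.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    dst.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1 if enabled else 0)
    if enabled:
        # Copy whichever per-connection tunables this platform exposes.
        for name in ("TCP_KEEPIDLE", "TCP_KEEPALIVE", "TCP_KEEPINTVL", "TCP_KEEPCNT"):
            opt = getattr(socket, name, None)
            if opt is not None:
                value = src.getsockopt(socket.IPPROTO_TCP, opt)
                dst.setsockopt(socket.IPPROTO_TCP, opt, value)
    return enabled
```

Only the flag and timing options are copied; everything else about the upstream connection is left to the proxy.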

@ViRb3
Author

ViRb3 commented Jul 21, 2022

Thanks a lot. Please ping me/respond here when the keepalive has been implemented, so I can install AdGuard again. Currently the connection drops are so bad that I had to disable it :/

Aydinv13 transferred this issue from AdguardTeam/AdguardForMac on Jul 26, 2022
@ngorskikh
Member

@ViRb3 Hi! It seems like an easy fix is not going to be on the table any time soon since Apple "forgot" (in fact, didn't care) to make the keepalive flag (in fact, all socket options) of the proxied socket available to the transparent proxy via the NEAppProxyTCPFlow.

Still I'd like to look more into this issue. I wonder if the problem is not, or not entirely, with the keepalives not being sent. It seems to me that a NAT should at least send RST when the port binding expires.
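To make the two failure modes concrete, here is a toy model of a single NAT binding with an idle timeout. It is not AdGuard's or any ISP's actual logic; the 300 s timeout and 60 s keepalive interval used below are hypothetical, and only outbound packets refresh the binding. With `send_rst=False` an expired binding silently drops the packet, so the client only sees unanswered retransmissions; with `send_rst=True` the client would at least learn immediately that the connection is gone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NatBinding:
    """Toy model of one CGNAT port binding with an idle timeout (seconds)."""
    idle_timeout: float
    send_rst: bool = False      # what the NAT does once the binding is gone
    last_activity: float = 0.0  # binding created when the connection opens at t=0
    expired: bool = False

    def packet(self, t: float) -> str:
        """Pass one outbound packet at time t and report what happens to it."""
        if self.expired or t - self.last_activity > self.idle_timeout:
            self.expired = True  # the port may now be reused for someone else
            return "rst" if self.send_rst else "dropped"
        self.last_activity = t
        return "forwarded"


def outcome(idle_timeout: float, keepalive_every: Optional[float],
            request_at: float, send_rst: bool = False) -> str:
    """Fate of a request sent at request_at after an otherwise idle period."""
    binding = NatBinding(idle_timeout, send_rst)
    if keepalive_every is not None:
        t = keepalive_every
        while t < request_at:
            binding.packet(t)  # each keepalive probe counts as activity
            t += keepalive_every
    return binding.packet(request_at)


if __name__ == "__main__":
    print(outcome(300, None, 600))                 # idle too long, silently dropped
    print(outcome(300, None, 600, send_rst=True))  # idle too long, but RST sent
    print(outcome(300, 60, 600))                   # keepalives kept the binding open
```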

In order to debug this, can you please do the following:

  1. Use the stable version of AdGuard (betas and nightlies currently have a nasty bug where debug logging doesn't work).
  2. Enable debug logging: Advanced -> Logging Level -> Debug
  3. Go to Preferences -> Network -> General -> Applications and uncheck everything other than Discord (or the app that exhibits the issue), to avoid making the log unreadable with tons of unrelated stuff.
  4. Reproduce the issue while also running a packet capture.
  5. Export the logs (Advanced -> Export Logs and System Info) and share them with us, along with the packet capture, at support@adguard.com. Mention the issue number in the email.

@ViRb3
Author

ViRb3 commented Jul 26, 2022

I experimented with the settings and realized that it's actually "Stealth Mode" that triggers this. If I disable it, there are no dropped connections. Here are my settings that trigger the issue; I have yet to find out exactly which one causes it:

https://user-images.githubusercontent.com/2650170/181102905-16887af0-52ca-4601-a22f-6dbad3f221d4.png
https://user-images.githubusercontent.com/2650170/181102916-c5315edd-9db1-473d-9218-33b982f0d98a.png
https://user-images.githubusercontent.com/2650170/181102919-c05cb3ba-9519-47a6-ab23-56a5cf303ac2.png
https://user-images.githubusercontent.com/2650170/181102923-6fbb6081-4e16-4501-9ef8-5e26dce69dab.png

EDIT: Never mind, it seems the issue just takes a while to trigger after an initial reboot. Will try to capture logs soon.

@ViRb3
Author

ViRb3 commented Jul 26, 2022

I was able to capture the logs and pcap and have sent them over e-mail (ticket #622859). Note that I anonymized the pcap with TraceWrangler, but I kept the destination IP address intact; I pinned it via /etc/hosts like so:

162.159.129.233 cdn.discordapp.com

@ryboe

ryboe commented Jul 30, 2022

This is a dup of #487.

@ViRb3
Author

ViRb3 commented Jul 30, 2022

> This is a dup of #487.

What about the dropped TCP connections over HTTP/2?

@ngorskikh
Member

ngorskikh commented Aug 2, 2022

@ViRb3 As a temporary workaround you can either uncheck the problematic apps in Preferences -> Network -> General -> Applications... or, alternatively, add the problematic domains to HTTPS exclusions: Preferences -> Network -> Exclusions.... No need to disable AdGuard completely. Sorry it took so long to remember to mention that.
I see the futile retransmissions in your pcap. We'll look into how we can determine if keepalives are enabled on the proxied socket.
