Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNS pb with v0.8.2 #158

Closed
palainp opened this issue Nov 8, 2022 · 16 comments · Fixed by mirage/mirage-nat#48
Closed

DNS pb with v0.8.2 #158

palainp opened this issue Nov 8, 2022 · 16 comments · Fixed by mirage/mirage-nat#48

Comments

@palainp
Copy link
Member

palainp commented Nov 8, 2022

First reported by fiftyfourthparallel on https://forum.qubes-os.org/t/mirage-firewall-0-8-2-broken-new-users-should-install-0-8-1/14566
It seems that we now have an issue with DNS. I'll try to reproduce the scenario to understand what is the pb root cause.

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

What changed between 0.8.1 and 0.8.2 is mainly 93b92c0 (evenually f2d3faf). I'll try to wrap my head around these commits (and the connected ones in mirage-nat) again.

@palainp
Copy link
Member Author

palainp commented Nov 8, 2022

I have the following in my test VM:

$ qvm-firewall test list
NO  ACTION  HOST            PROTOCOL  PORT(S)  SPECIAL TARGET  ICMP TYPE  EXPIRE  COMMENT
0   accept  www.google.com  tcp       443      -               -          -       -
1   accept  -               -         -        dns             -          -       -
2   accept  -               icmp      -        -               -          -       -
3   drop    -               -         -        -               -          -       -

I can reproduce the crash that with the 0.8.2 binary and with the current robur.coop build. I'll try to figure out what is the issue.

@palainp
Copy link
Member Author

palainp commented Nov 8, 2022

It seems to be stuck resolving (with name as www.google.com):

dns_client name >|= function

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

Would you mind to compile and try the debug-dns branch of my fork (hannesm/qubes-mirage-firewall.git)? and provide the output here...

@palainp
Copy link
Member Author

palainp commented Nov 8, 2022

You spotted the right thing, it loops with:

[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 41277 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 1739
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 1739 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 58504
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 58504 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 48392
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 48392 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 64865
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 64865 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 54812
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 54812 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 51024
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 51024 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 28598
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 28598 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 50525
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 50525 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 57150
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] nat says UDP port 57150 is not free
[2022-11-08 18:15:03] 2022-11-08 17:15:03 -00:00: INF [my-nat] trying UDP port 17536

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

thanks. would you mind to try mirage-nat from my fork in the fix-is-port-free branch? could be done with qubes-firewall at the main branch (no need for debug) ;)

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

Still there's some issue, even with the above fix... namely the resouce "UDP source ports of the firewall" needs to be shared amongst "NAT" and "DNS". This is done by:

  • NAT has caches of which ports are used
  • My_nat has a set of used UDP ports (+last resort)

Whenever a new port is picked, NAT uses pick_free_port which picks a random one and avoids the udp_dns and last_resourt_port -- the NAT implementation furthermore ensures that the picked one is good indeed.

Whenever a fresh DNS port is used, free_udp_port is used that asks the NAT implementation (after pick_Free_port) whether the port is not in use.

The issue now is that free_udp_port may loop (nearly) infinitely. Somehow the last_resort_port should be used more frequently / explicitly: after N (10?) tries of a not-used-by-dns port whether NAT uses it, the last_resort should be taken. There's no need to ask NAT whether that one is free. When there's confirmation that the mirage-nat change leads to a working dns-resolving firewall again, I'll re-structure the free_udp_port code. :)

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

I'll re-structure the free_udp_port code.

See #159

@palainp
Copy link
Member Author

palainp commented Nov 8, 2022

The #159 (and with or without the mirage-nat patch) solve the burn cpu loop, but I now can't get access to www.google.com:443. I'll try to investigate why soon.

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

Hmm, maybe try with google.com and www.google.com.... They seem to be different IPs and may refer to each other.

Or use a simpler setup, and test with mirage.io :D

@palainp
Copy link
Member Author

palainp commented Nov 8, 2022

I can see in the log (I simplified the rule to accept any proto/any port for mirage.io):

[2022-11-08 19:25:44] 2022-11-08 18:25:44 -00:00: WRN [rules] Ignoring rule 0   mirage.io   accept, could not resolve
[2022-11-08 19:25:44] 2022-11-08 18:25:44 -00:00: WRN [rules] Ignoring rule 0   mirage.io   accept, could not resolve
[2022-11-08 19:25:46] 2022-11-08 18:25:46 -00:00: WRN [rules] Ignoring rule 0   mirage.io   accept, could not resolve
[2022-11-08 19:25:46] 2022-11-08 18:25:46 -00:00: WRN [firewall] Failed to add NAT rewrite rule: Too many retries (IPv4 packet 10.137.0.10 -> 10.139.1.1: id a3b7, off 16384 proto 6, ttl 64, options 
[2022-11-08 19:25:46]  TCP packet seq=3919011120 acknum=0 ack=false rst=false syn=true fin=false win=64240 options=
[2022-11-08 19:25:46] [MSS=1460; Noop; Noop; SACK_ok; Noop; Window>> 7] with payload )
[2022-11-08 19:25:48] 2022-11-08 18:25:48 -00:00: WRN [rules] Ignoring rule 0   mirage.io   accept, could not resolve
[2022-11-08 19:25:48] 2022-11-08 18:25:48 -00:00: WRN [firewall] Failed to add NAT rewrite rule: Too many retries (IPv4 packet 10.137.0.10 -> 10.139.1.1: id d5ee, off 16384 proto 6, ttl 64, options 
[2022-11-08 19:25:48]  TCP packet seq=3694308136 acknum=0 ack=false rst=false syn=true fin=false win=64240 options=
[2022-11-08 19:25:48] [MSS=1460; Noop; Noop; SACK_ok; Noop; Window>> 7] with payload )
[2022-11-08 19:25:49] 2022-11-08 18:25:49 -00:00: WRN [rules] Ignoring rule 0   mirage.io   accept, could not resolve

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

The could not resolve makes me sad.

@hannesm
Copy link
Member

hannesm commented Nov 8, 2022

Could you run that with debug level output? Then we'd see the error (from rules.ml):

      | Error (`Msg m) ->
        Log.warn (fun f -> f "Ignoring rule %a, could not resolve" Q.pp_rule rule);
        Log.debug (fun f -> f "%s" m);

@palainp
Copy link
Member Author

palainp commented Nov 8, 2022

The error message is DNS request timeout (I can see multiple Resolving mirage.io messages then after the timeout that many pairs of Ignoring+timeout error message).

@palainp
Copy link
Member Author

palainp commented Nov 8, 2022

Oops I'm faulty there, I've left some firewall rules on my mirage-fw too, that explain why it couldn't resolve some domains. So far with your path on mirage-nat and with the #159 PR it works as intended! Many Thanks @hannesm!

@hannesm
Copy link
Member

hannesm commented Nov 11, 2022

thanks for the report, fixed in the 0.8.3 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants