Header id mismatch on Windows Subsystem for Linux #79

Closed
sipsorcery opened this issue Jul 20, 2020 · 10 comments · Fixed by #84

@sipsorcery
Contributor

I use the ResolveServiceAsync method extensively for SIP and STUN service lookup (big thanks for providing such a clean approach).

The issue I've noticed is that on WSL I get Header id mismatch exceptions when attempting lookups. If I use the same code on Windows I don't get the exception. It's possible/likely that a bigger factor is the different DNS configurations on my Windows machine compared to the WSL VM.

I'm using the DnsClient nuget package v1.3.2. Below is a sample program that generates the exception for me.

using System;
using System.Linq;
using System.Net;
using DnsClient;
using DnsClient.Protocol;

namespace DnsClientConsole
{
    class Program
    {
        static void Main()
        {
            Console.WriteLine("DNS Test");

            LookupClientOptions opts = new LookupClientOptions
            {
                Retries = 1,
                Timeout = TimeSpan.FromSeconds(1),
                UseCache = false,
            };
            
            var lookup = new LookupClient(opts);

            string hostname = "google.com";
            int defaultPort = 5060;
            QueryType queryType = QueryType.AAAA;

            var result = lookup.ResolveServiceAsync(hostname, "stun", "udp")
                    .ContinueWith<IPEndPoint>(x =>
                    {
                        ServiceHostEntry srvResult = null;

                        if (x.IsFaulted)
                        {
                            Console.WriteLine($"Dns SRV lookup failure for {hostname}. {x.Exception?.InnerException?.Message}");
                        }
                        else if(x.Result == null || x.Result.Count() == 0)
                        {
                            Console.WriteLine($"Dns SRV lookup returned no results for {hostname}.");
                        }
                        else
                        { 
                            srvResult = x.Result.OrderBy(y => y.Priority).ThenByDescending(w => w.Weight).FirstOrDefault();
                            Console.WriteLine($"Using SRV result: {srvResult}");
                        }

                        string host = hostname; // If no SRV results then fallback is to lookup the hostname directly.
                        int port = defaultPort; // If no SRV results then fallback is to use the default port.

                        if (srvResult != null)
                        {
                            host = srvResult.HostName;
                            port = srvResult.Port;
                        }

                       return HostQuery(lookup, host, port, queryType);
                    });

            result.Wait();

            Console.WriteLine($"Result: {result.Result}.");
        }

        /// <summary>
        /// Attempts to resolve a hostname.
        /// </summary>
        /// <param name="host">The hostname to resolve.</param>
        /// <param name="port">The service port to use in the end point result (not used for the lookup).</param>
        /// <param name="queryType">The lookup query type, either A or AAAA.</param>
        /// <returns>If successful an IPEndPoint or null if not.</returns>
        private static IPEndPoint HostQuery(LookupClient lookup, string host, int port, QueryType queryType)
        {
            try
            {
                var addrRecord = lookup.Query(host, queryType).Answers.FirstOrDefault();
                if (addrRecord != null)
                {
                    return GetFromLookupResult(addrRecord, port);
                }
            }
            catch (Exception excp)
            {
                Console.WriteLine($"Dns lookup failure for {host} and query {queryType}. {excp.Message}");
            }

            if(queryType == QueryType.AAAA)
            {
                return HostQuery(lookup, host, port, QueryType.A);
            }

            return null;
        }

        private static IPEndPoint GetFromLookupResult(DnsResourceRecord addrRecord, int port)
        {
            if (addrRecord is AaaaRecord)
            {
                return new IPEndPoint((addrRecord as AaaaRecord).Address, port);
            }
            else if (addrRecord is ARecord)
            {
                return new IPEndPoint((addrRecord as ARecord).Address, port);
            }
            else
            {
                return null;
            }
        }
    }
}

Windows Output:

DNS Test
Dns SRV lookup failure for google.com. Query 61228 => _stun._udp.google.com. IN SRV on 2001:4860:4860::8888:53 timed out or is a transient error.
Result: [2a00:1450:400b:c01::64]:5060.

WSL Output:

DNS Test
Dns SRV lookup returned no results for google.com.
Dns lookup failure for google.com and query AAAA. Header id mismatch.
Result: 74.125.193.100:5060.
@MichaCo
Owner

MichaCo commented Jul 22, 2020

Hi @sipsorcery
I cannot reproduce it locally using normal Windows or my Ubuntu 18.04 WSL distribution.

Do you have any DNS proxy or anything else custom? Might be very much related to your DNS server not returning valid results.
Apart from that, you are heavily mixing sync over async code in your example which might be bad but not like that bad I guess ;)

@sipsorcery
Contributor Author

Thx for taking a look.

My Windows DNS config is:

DNS Servers . . . . . . . . . . . : 2001:4860:4860::8888
                                       2001:4860:4860::8844
                                       8.8.8.8
                                       192.168.0.1

WSL uses some kind of virtual network interface and sends all DNS requests through a gateway, in my case 172.24.208.1. I don't know exactly what the logic is under the hood, but the WSL DNS request then gets sent on my main Ethernet adapter and, in quick succession, the same request is sent to all the configured DNS servers. So one DNS query from WSL generates four on the network.

Perhaps this "forking" of the DNS queries causes the duplicate headers?

Here's the packet capture of both my WSL and Ethernet interfaces. The 172 addresses are WSL.

No.	Time	Source	Src Port	Destination	Dst Port	Protocol	Length	Info
42	1.578567	172.24.209.148	37429	172.24.208.1	53	DNS	92	Standard query 0xdf46 SRV _stun._udp.google.com OPT
43	1.580048	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	52848	2001:4860:4860::8888	53	DNS	101	Standard query 0x5328 SRV _stun._udp.google.com
44	1.618816	192.168.11.50	52848	8.8.8.8	53	DNS	81	Standard query 0x5328 SRV _stun._udp.google.com
45	1.643754	8.8.8.8	53	192.168.11.50	52848	DNS	131	Standard query response 0x5328 No such name SRV _stun._udp.google.com SOA ns1.google.com
46	1.643971	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::10	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
47	1.643995	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::11	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
48	1.644017	192.168.11.50	62018	8.8.8.8	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
49	1.644097	192.168.11.50	62018	192.168.0.1	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
50	1.746706	8.8.8.8	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
51	1.746847	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
52	1.749981	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
53	1.749873	192.168.0.1	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
54	1.783436	172.24.209.148	37429	172.24.208.1	53	DNS	81	Standard query 0x0dd7 AAAA google.com OPT
55	1.783803	172.24.208.1	53	172.24.209.148	37429	DNS	164	Standard query response 0x0dd7 AAAA google.com AAAA 2a00:1450:400b:c01::71 AAAA 2a00:1450:400b:c01::8a AAAA 2a00:1450:400b:c01::66
56	1.788930	172.24.209.148	58575	172.24.208.1	53	DNS	81	Standard query 0xa078 A google.com OPT
57	1.789466	172.24.208.1	53	172.24.209.148	58575	DNS	176	Standard query response 0xa078 A google.com A 74.125.193.113 A 74.125.193.102 A 74.125.193.101 A 74.125.193.138 A 74.125.193.139 A 74.125.193.100
58	4.672360	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
59	4.676703	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
60	4.644754	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::10	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
61	4.644787	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::11	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
62	4.644810	192.168.11.50	62018	8.8.8.8	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
63	4.644830	192.168.11.50	62018	192.168.0.1	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
64	4.672277	8.8.8.8	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
65	4.676520	192.168.0.1	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
66	7.675447	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
67	7.656711	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::10	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
68	7.656751	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::11	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
69	7.656789	192.168.11.50	62018	8.8.8.8	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
70	7.656814	192.168.11.50	62018	192.168.0.1	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
71	7.675357	8.8.8.8	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
72	7.679466	192.168.0.1	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
73	7.679518	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
74	10.669993	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::10	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
75	10.689515	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
76	10.694573	172.24.208.1	53	172.24.209.148	37429	DNS	142	Standard query response 0xdf46 No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
77	10.670087	2a02:8084:6981:78f0:2e0:67ff:fe09:9b12	62019	2001:730:3ec2::11	53	DNS	112	Standard query 0x6aff SRV _stun._udp.google.com OPT
78	10.670292	192.168.11.50	62018	8.8.8.8	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
79	10.670422	192.168.11.50	62018	192.168.0.1	53	DNS	92	Standard query 0x6aff SRV _stun._udp.google.com OPT
80	10.689399	8.8.8.8	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT
81	10.694493	192.168.0.1	53	192.168.11.50	62018	DNS	142	Standard query response 0x6aff No such name SRV _stun._udp.google.com SOA ns1.google.com OPT

Apart from that, you are heavily mixing sync over async code in your example which might be bad but not like that bad I guess ;)

Sigh, I've spent so much time attempting to get a grip on Task Asynchronous Programming and still struggle. I realise it's not part of the DNS problem but any chance you could elaborate to help my understanding?

Do you mean that I should use await instead of .Wait so as to wait asynchronously instead of blocking synchronously? If so, I get that. Since it's a console app there's nothing else for the main thread to do, so they have the same outcome. If it were a GUI app with a synchronization context then I should have used await.

@MichaCo
Owner

MichaCo commented Jul 22, 2020

If that really forks the DNS request then yes, that could be the issue.
The ID problem means basically that the client didn't get the correct response, it received a different response from a different request which should never happen, unless something is wrong (usually on the network).

I have no idea why WSL would do that though. I might try to research it a bit, seems strange.
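
To make the ID check concrete, here is a minimal sketch of what such a guard amounts to. It is not DnsClient's actual code: the DnsHeaderIdCheck class and its method names are hypothetical. It relies only on the fact that a DNS message starts with a 16-bit ID in big-endian order, and it assumes the request and response are available as raw byte arrays.

```csharp
using System;

// Hypothetical helper for illustration only; not part of DnsClient.
public static class DnsHeaderIdCheck
{
    // The first two bytes of every DNS message hold the 16-bit ID,
    // most significant byte first (network byte order).
    public static ushort ReadId(byte[] message)
    {
        if (message == null || message.Length < 2)
        {
            throw new ArgumentException("A DNS message needs at least a 2-byte ID.", nameof(message));
        }

        return (ushort)((message[0] << 8) | message[1]);
    }

    // A response belongs to a request only if both carry the same ID.
    public static bool IdsMatch(byte[] request, byte[] response)
    {
        return ReadId(request) == ReadId(response);
    }
}
```

Taking the IDs from the capture above as sample data, a request starting with bytes 0x0d 0xd7 (the AAAA query) and a response starting with 0xdf 0x46 (one of the repeated SRV responses) would make IdsMatch return false, which is the kind of situation reported as "Header id mismatch".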

Regarding your async sync code, maybe this helps, I rewrote it primarily to fix the secondary call.
Also note that you can make your Main method async now!

using System;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using DnsClient;
using DnsClient.Protocol;

namespace DnsClientConsole
{
    internal class Program
    {
        private static async Task Main()
        {
            Console.WriteLine("DNS Test");

            LookupClientOptions opts = new LookupClientOptions(NameServer.GooglePublicDnsIPv6)
            {
                Retries = 1,
                Timeout = TimeSpan.FromSeconds(1),
                UseCache = false,
            };

            var lookup = new LookupClient(opts);

            string hostname = "google.com";
            int defaultPort = 5060;
            QueryType queryType = QueryType.AAAA;

            try
            {
                var result = await lookup.ResolveServiceAsync(hostname, "stun", "udp");
                if (result.Count() == 0)
                {
                    Console.WriteLine($"Dns SRV lookup returned no results for {hostname}.");
                }

                var srvResult = result.OrderBy(y => y.Priority).ThenByDescending(w => w.Weight).FirstOrDefault();

                IPEndPoint endPoint;
                if (srvResult != null)
                {
                    Console.WriteLine($"Using SRV result: {srvResult}");

                    endPoint = await HostQuery(lookup, srvResult.HostName, srvResult.Port, queryType).ConfigureAwait(false);
                }
                else
                {
                    endPoint = await HostQuery(lookup, hostname, defaultPort, queryType);
                }

                Console.WriteLine($"Result: {endPoint}.");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Dns SRV lookup failure for {hostname}. {ex.InnerException?.Message ?? ex.Message}");
            }
        }

        /// <summary>
        /// Attempts to resolve a hostname.
        /// </summary>
        /// <param name="host">The hostname to resolve.</param>
        /// <param name="port">The service port to use in the end point result (not used for the lookup).</param>
        /// <param name="queryType">The lookup query type, either A or AAAA.</param>
        /// <returns>If successful an IPEndPoint or null if not.</returns>
        private static async Task<IPEndPoint> HostQuery(LookupClient lookup, string host, int port, QueryType queryType)
        {
            try
            {
                var addrRecord = (await lookup.QueryAsync(host, queryType).ConfigureAwait(false))
                    .Answers
                    .FirstOrDefault();

                if (addrRecord != null)
                {
                    return GetFromLookupResult(addrRecord, port);
                }
            }
            catch (Exception excp)
            {
                Console.WriteLine($"Dns lookup failure for {host} and query {queryType}. {excp.Message}");
            }

            if (queryType == QueryType.AAAA)
            {
                return await HostQuery(lookup, host, port, QueryType.A);
            }

            return null;
        }

        private static IPEndPoint GetFromLookupResult(DnsResourceRecord addrRecord, int port)
        {
            if (addrRecord is AddressRecord addressRecord)
            {
                return new IPEndPoint(addressRecord.Address, port);
            }

            return null;
        }
    }
}

@MichaCo
Owner

MichaCo commented Jul 30, 2020

I might change the behavior of the library to log a warning instead of throwing a hard error in that case.
I was never really sure about this check anyways.

If anyone has a strong opinion on this, let me know.

@sipsorcery
Contributor Author

+1 for the warning instead of exception.

@dezfowler

This is closed; however, 1.4.0 isn't released yet.

I'm seeing this same error using the AWS Lambda .NET Core 2.1 runtime which uses Linux.

I have multiple threads using the same LookupClient instance, as recommended.

@MichaCo Any closer to understanding what the issue is here and whether just having a warning here is safe? If I upgrade to 1.4.0 will I be getting the wrong DNS response for my queries?

@MichaCo
Owner

MichaCo commented Jan 8, 2021

Well, in the cases reported, there was another layer between my UDP request and the server altering the original DNS request.
You'd not get a wrong response in that scenario.

In general, the assumption is that you'd not get a different/wrong response for the same question anyway. It's just odd that you'd get a cached or different response with a different ID.

Checking the request ID is more a guard check and cannot really be trusted, or used as a security check, anyways. That's why I think making it a warning should be fine.

Yeah, 1.4 is still not released, just came back from vacation - happy new year!

@dezfowler

Happy New Year to you too!

The other potentially important factor is that I'm using TCP rather than UDP. There's this outstanding issue about Socket thread-safety for TCP on Linux: dotnet/runtime#44422

I haven't dug into the DnsClient code enough to know how it's working with sockets under the hood but is it possible this is related?

@MichaCo
Owner

MichaCo commented Jan 8, 2021

Could be related. Not sure.
The TCP implementation of DnsClient currently uses the TcpClient class, calls GetStream() and then writes directly to that stream.
I'd have to dig into what the stream is doing in that case to see if that calls those socket APIs internally!?

@dezfowler

It ends up in a Socket.SendAsync here...
https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.Tasks.cs#L854

I don't think it's exactly the same (especially as it's async rather than sync), but from some other comments where people are talking about the SocketAsyncContext and some more low-level stuff, it sounds like it may be related.

That said I am surprised that we're consistently getting that same error and, if it is randomly interleaving sent and received packets, that it doesn't fail with more variety of error messages. I'll dig into our logs further I think and maybe try the workaround of putting a big lock around it all.
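
For anyone wanting to try the same "big lock" workaround, one way it could be sketched is below. A plain lock statement cannot span an await, so this uses a SemaphoreSlim instead. The SerializedGate class is hypothetical and not part of DnsClient; it only assumes the lookup call can be passed in as a delegate.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper for illustration: lets only one operation run at a time.
public sealed class SerializedGate : IDisposable
{
    // Initial count 1, maximum count 1: behaves like an async-friendly mutex.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public async Task<T> RunAsync<T>(Func<Task<T>> operation)
    {
        if (operation == null)
        {
            throw new ArgumentNullException(nameof(operation));
        }

        // Wait for any in-flight operation to finish before starting this one.
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await operation().ConfigureAwait(false);
        }
        finally
        {
            // Always release, even if the operation throws.
            _gate.Release();
        }
    }

    public void Dispose()
    {
        _gate.Dispose();
    }
}
```

With a shared gate instance, a lookup would then be issued as something like `await gate.RunAsync(() => lookup.QueryAsync(host, QueryType.A))`. This trades throughput for isolation, so it is probably more useful as a diagnostic (does the error disappear when queries never overlap?) than as a permanent fix.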
