Skip to content

net: BSD: DNS go resolver loses first UDP outbound packet if onlink target not in neighbor cache #43398

Open
@philpennock

Description

@philpennock

What version of Go are you using (go version)?

$ go version
go version go1.15.6 freebsd/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pdp/.cache/go-build"
GOENV="/home/pdp/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOINSECURE=""
GOMODCACHE="/home/pdp/go/pkg/mod"
GONOPROXY="github.com/ConnectEverything"
GONOSUMDB="github.com/ConnectEverything"
GOOS="freebsd"
GOPATH="/home/pdp/go"
GOPRIVATE="github.com/ConnectEverything"
GOPROXY="https://athens.home.pennock.cloud,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/home/pdp/tmp/go-build876596449=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I have recurring problems with DNS resolution taking 5+ seconds "sometimes", and only for Go software. I finally got lucky and got a tcpdump showing it.

dns_resolve.go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strconv"
	"time"

	"golang.org/x/net/idna"
)

const EX_USAGE = 64 // per sysexits.h

var Progname string

func Stderr(spec string, args ...interface{}) {
	call := make([]interface{}, 1, 1+len(args))
	call[0] = Progname
	call = append(call, args...)
	fmt.Fprintf(os.Stderr, "%s: "+spec+"\n", call...)
}

func main() {
	Progname = filepath.Base(os.Args[0])
	retval := 0
	defer func() {
		if retval != 0 {
			os.Exit(retval)
		}
	}()

	type resolveDuration struct {
		name     string
		duration time.Duration
	}

	durations := make([]resolveDuration, 0, len(os.Args)) // need _at most_ len(..)-1

	startAll := time.Now()

	res := &net.Resolver{}
	ctx := context.Background()
	count := 0
	for _, hostname := range os.Args[1:] {
		count += 1
		puny, err := idna.ToASCII(hostname)
		if err != nil {
			Stderr("punycode translation of %q failed: %v", hostname, err)
			retval = 1
			continue
		}
		var labelQuoted string
		if puny != hostname {
			labelQuoted = strconv.Quote(hostname) + " [via " + strconv.Quote(puny) + "]"
		} else {
			labelQuoted = strconv.Quote(hostname)
		}
		startThis := time.Now()
		hostlist, err := res.LookupHost(ctx, puny)
		if err != nil {
			Stderr("failed to resolve %s: %v", labelQuoted, err)
			retval = 1
			continue
		}
		durations = append(durations, resolveDuration{name: puny, duration: time.Now().Sub(startThis)})
		fmt.Printf("%s: %s is %+v\n", Progname, labelQuoted, hostlist)
	}
	if count == 0 {
		Stderr("need hostnames to resolve")
		retval = EX_USAGE
	}
	overallDuration := time.Now().Sub(startAll).Round(time.Millisecond)
	Stderr("[%s] overall", overallDuration)
	for _, rd := range durations {
		Stderr("[%s]   %q", rd.duration.Round(time.Millisecond), rd.name)
	}
}
% dns_resolve nats.lan
dns_resolve: "nats.lan" is [192.168.120.160 fd81:1c9e:b379:0:ff:60ff:fe10:e88d]
dns_resolve: [5.039s] overall
dns_resolve: [5.039s]   "nats.lan"

% cat /etc/resolv.conf
# Generated by resolvconf
search lan
nameserver 192.168.120.23
nameserver 192.168.120.24
nameserver 192.168.120.210

All DNS resolvers are fully functional (Unbound 1.13.0), all responding. The .lan zone is served from the resolvers, has an SOA and NS records, has IPv4 and IPv6.

What did you expect to see?

A fraction of a second for net.Resolver.LookupHost() to return A and AAAA records.

What did you see instead?

As you can see in this TCP dump output, the A response came in immediately, but it was 5 seconds before the LookupHost() call issued the AAAA query. There was no retry, the result from the first query was accepted. Go's DNS resolution just wedged for a duration which happens to be equal to the timeout period, before accepting the response.

tcpdump output
# tcpdump -vvvnpi vnet0.37 -Xs0 port 53
tcpdump: listening on vnet0.37, link-type EN10MB (Ethernet), capture size 262144 bytes
00:39:58.588176 IP (tos 0x0, ttl 64, id 24746, offset 0, flags [none], proto UDP (17), length 54)
    192.168.120.241.25371 > 192.168.120.23.53: [udp sum ok] 61515+ A? nats.lan. (26)
        0x0000:  4500 0036 60aa 0000 4011 a7b3 c0a8 78f1  E..6`...@.....x.
        0x0010:  c0a8 7817 631b 0035 0022 8569 f04b 0100  ..x.c..5.".i.K..
        0x0020:  0001 0000 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 0001 0001                           n.....
00:39:58.588948 IP (tos 0x0, ttl 64, id 56517, offset 0, flags [none], proto UDP (17), length 70)
    192.168.120.23.53 > 192.168.120.241.25371: [udp sum ok] 61515* q: A? nats.lan. 1/0/0 nats.lan. [1h] A 192.168.120.160 (42)
        0x0000:  4500 0046 dcc5 0000 4011 2b88 c0a8 7817  E..F....@.+...x.
        0x0010:  c0a8 78f1 0035 631b 0032 f95b f04b 8580  ..x..5c..2.[.K..
        0x0020:  0001 0001 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 0001 0001 c00c 0001 0001 0000 0e10  n...............
        0x0040:  0004 c0a8 78a0                           ....x.
00:40:03.624669 IP (tos 0x0, ttl 64, id 30198, offset 0, flags [none], proto UDP (17), length 54)
    192.168.120.241.27018 > 192.168.120.24.53: [udp sum ok] 19282+ AAAA? nats.lan. (26)
        0x0000:  4500 0036 75f6 0000 4011 9266 c0a8 78f1  E..6u...@..f..x.
        0x0010:  c0a8 7818 698a 0035 0022 23d8 4b52 0100  ..x.i..5."#.KR..
        0x0020:  0001 0000 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 001c 0001                           n.....
00:40:03.625467 IP (tos 0x0, ttl 64, id 8673, offset 0, flags [none], proto UDP (17), length 82)
    192.168.120.24.53 > 192.168.120.241.27018: [udp sum ok] 19282* q: AAAA? nats.lan. 1/0/0 nats.lan. [1h] AAAA fd81:1c9e:b379:0:ff:60ff:fe10:e88d (54)
        0x0000:  4500 0052 21e1 0000 4011 e65f c0a8 7818  E..R!...@.._..x.
        0x0010:  c0a8 78f1 0035 698a 003e ba9d 4b52 8580  ..x..5i..>..KR..
        0x0020:  0001 0001 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 001c 0001 c00c 001c 0001 0000 0e10  n...............
        0x0040:  0010 fd81 1c9e b379 0000 00ff 60ff fe10  .......y....`...
        0x0050:  e88d                                     ..
^C
4 packets captured
101 packets received by filter
0 packets dropped by kernel

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.help wanted

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions