Wrong DNS configuration for ethdisco.net domain causes a lot of background errors during DNS queries #21454

Closed
Neurone opened this issue Aug 16, 2020 · 15 comments

Comments

@Neurone
Contributor

Neurone commented Aug 16, 2020

System information

Geth version: 1.9.19
OS & Version: Ubuntu 18.04

Expected behaviour

The DNS server should always send messages compliant with RFC 1035. In particular, it should not send messages larger than 512 bytes over UDP.

Actual behaviour

Geth truncates DNS packets while discovering new peers. This causes a background loop of DNS queries, all doomed to fail.

DEBUG[08-16|20:19:10.158] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"

This issue was previously reported (#20713), but it seemed to be a Pi-hole-related issue, so it was closed without a solution.

The problem is actually in the DNS server configuration:

  1. The DNS server sends 526 bytes when responding to queries like YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net.
  2. The Go DNS client follows RFC 1035, truncates the message to 512 bytes, and then fails while parsing it.
  3. The Go DNS client reports the error to geth.
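
The size argument in the steps above can be made concrete with a small Go sketch. It only compares the MSG SIZE rcvd values quoted in this issue against the 512-byte limit that RFC 1035 sets for DNS over UDP without EDNS0. The constant and function names are illustrative assumptions, not taken from geth or the Go standard library.

```go
package main

import "fmt"

// classicUDPLimit is the largest DNS message RFC 1035 allows over UDP
// when EDNS0 is not in use. The name is an assumption made for this sketch.
const classicUDPLimit = 512

// exceedsClassicUDP reports whether a DNS message of msgSize bytes
// is too large for a single classic UDP response.
func exceedsClassicUDP(msgSize int) bool {
	return msgSize > classicUDPLimit
}

func main() {
	// Sizes are the "MSG SIZE rcvd" values from the dig outputs in this issue.
	samples := []struct {
		name string
		size int
	}{
		{"mainnet root", 497},
		{"mainnet branch", 526},
		{"ropsten branch", 418},
	}
	for _, s := range samples {
		fmt.Printf("%-15s %d bytes, over limit: %t\n", s.name, s.size, exceedsClassicUDP(s.size))
	}
}
```

Only the 526-byte mainnet branch reply crosses the limit, which matches the single lookup that fails in the log.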

The first query, for the root, is always fine because the message size is 497 bytes (see the last line, MSG SIZE rcvd):

$ dig TXT all.mainnet.ethdisco.net                           

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> TXT all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49427
;; flags: qr rd ad; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;all.mainnet.ethdisco.net.      IN      TXT

;; ANSWER SECTION:
all.mainnet.ethdisco.net. 0     IN      TXT     "enrtree-root:v1 e=YT2KPIIE637MN76A2UTCOCZPFU l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1320 sig=frVtmMBzg6wg4Ddhv6Fx1MJaUlWOG7YwskBUcAL7dFNbwo7gMzTeumaJLpPenKH_CFkDkTWs5H6aKOTGSwZFjAE"
ns-224.awsdns-28.com.   0       IN      A       205.251.192.224
ns-706.awsdns-24.net.   0       IN      A       205.251.194.194
ns-1441.awsdns-52.org.  0       IN      A       205.251.197.161
ns-1818.awsdns-35.co.uk. 0      IN      A       205.251.199.26
ns-224.awsdns-28.com.   0       IN      AAAA    2600:9000:5300:e000::1
ns-706.awsdns-24.net.   0       IN      AAAA    2600:9000:5302:c200::1

;; Query time: 0 msec
;; SERVER: 172.27.112.1#53(172.27.112.1)
;; WHEN: Sun Aug 16 19:21:34 CEST 2020
;; MSG SIZE  rcvd: 497

The second query, to the actual ENR tree, triggers the issue instead:

$ dig TXT YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net 

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> TXT YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37600
;; flags: qr rd ad; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net. IN        TXT

;; ANSWER SECTION:
YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net. 0 IN TXT "enrtree-branch:EIJJYR2Y7HKP7GTBNSTTFDKISM,ZGIF73FX4IUMCY7SIF6AZ673EA,TE4PG4BD5CKZTCSQBCE45CMS2E,JZFQSASGA7S6GWOCXIEHNNY4NU,VT6KSVN7W3K4UF433DW2W6AVAU"
ns-224.awsdns-28.com.   0       IN      A       205.251.192.224
ns-706.awsdns-24.net.   0       IN      A       205.251.194.194
ns-1441.awsdns-52.org.  0       IN      A       205.251.197.161
ns-1818.awsdns-35.co.uk. 0      IN      A       205.251.199.26
ns-224.awsdns-28.com.   0       IN      AAAA    2600:9000:5300:e000::1
ns-706.awsdns-24.net.   0       IN      AAAA    2600:9000:5302:c200::1

;; Query time: 0 msec
;; SERVER: 172.27.112.1#53(172.27.112.1)
;; WHEN: Sun Aug 16 19:22:15 CEST 2020
;; MSG SIZE  rcvd: 526

How to solve the issue

Remove all records other than TXT from the DNS response, in particular the unused ones such as the A and AAAA records.

I didn't check all other networks, but Ropsten, for example, is already configured correctly:

$ dig TXT 3EAZQHDTFVH6TTYVLEGGHUT5TM.all.ropsten.ethdisco.net

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> TXT 3EAZQHDTFVH6TTYVLEGGHUT5TM.all.ropsten.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1631
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;3EAZQHDTFVH6TTYVLEGGHUT5TM.all.ropsten.ethdisco.net. IN        TXT

;; ANSWER SECTION:
3EAZQHDTFVH6TTYVLEGGHUT5TM.all.ropsten.ethdisco.net. 0 IN TXT "enrtree-branch:XX4U4ZKXUCOWBHPUOGISEAIDJE,4ZR6DWAKZTWUEMTCN47N3KY7YE,NHRHUEAR5OY4K4O6TCL5LUTCLM,PDPBUM23QXC32GCRPCWBZBA2NU,N7NYW2XJCJZE22MXJMGI4HVJO4,I6VIAEOHT4RJC54AA2HFP6QJSI,M7YILXPTHZ7PUJZERM5T72PABA,TC22I752VVE6Y4DYYWG2VOV6PY,OGDOAVH7BVKLSNWTCM42WP" "4IEA,CYV3DOHCVAL43SK5RYQCDIK6EI"

;; Query time: 23 msec
;; SERVER: 172.27.112.1#53(172.27.112.1)
;; WHEN: Sun Aug 16 19:21:11 CEST 2020
;; MSG SIZE  rcvd: 418

Steps to reproduce the behaviour

Start geth from scratch with verbosity level 4

Backtrace

$ geth --verbosity 4
INFO [08-16|20:19:09.860] Starting Geth on Ethereum mainnet... 
INFO [08-16|20:19:09.860] Bumping default cache on mainnet         provided=1024 updated=4096
DEBUG[08-16|20:19:09.860] Sanitizing Go's GC trigger               percent=25
INFO [08-16|20:19:09.862] Maximum peer count                       ETH=50 LES=0 total=50
INFO [08-16|20:19:09.862] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
DEBUG[08-16|20:19:09.862] FS scan times                            list="28.5µs" set=700ns diff=800ns
INFO [08-16|20:19:09.862] Set global gas cap                       cap=25000000
INFO [08-16|20:19:09.862] Allocated trie memory caches             clean=1023.00MiB dirty=1024.00MiB
[...]
DEBUG[08-16|20:19:10.158] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.159] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.160] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.161] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.162] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.163] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.165] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.490] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:10.823] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:11.157] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:11.490] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:11.823] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:12.490] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:12.823] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:13.156] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:13.490] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:13.823] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:14.157] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:14.823] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:15.156] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:15.490] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:15.823] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:16.156] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:16.490] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
DEBUG[08-16|20:19:17.156] Error in DNS random node sync            tree=all.mainnet.ethdisco.net err="lookup YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net on 172.27.112.1:53: cannot unmarshal DNS message"
[...]
@fjl
Contributor

fjl commented Aug 17, 2020

Thanks for debugging this! We'll see how to disable this behavior on the server side.

@Aldekein
Member

Aldekein commented Aug 19, 2020

I just ran dig from an AWS EC2 machine and got the following results:

$ dig TXT YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> TXT YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 17019
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net. IN	TXT

;; Query time: 11 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Wed Aug 19 08:21:20 UTC 2020
;; MSG SIZE  rcvd: 80

and:

$ dig TXT all.mainnet.ethdisco.net

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> TXT all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4896
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;all.mainnet.ethdisco.net.	IN	TXT

;; ANSWER SECTION:
all.mainnet.ethdisco.net. 1347	IN	TXT	"enrtree-root:v1 e=WDF2TUTUUA2Y3QWK337W54HMWA l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1329 sig=2bMfIuiDld1MoyqZKl63YD0msEixMlEs02mwE46z0d8ojeUbfUTKLGRpqsfDwhelcq2yqqx51tkfbH44xc9M9gE"

;; Query time: 1 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Wed Aug 19 08:22:17 UTC 2020
;; MSG SIZE  rcvd: 240

There are no extra AAAA and A records in the response, and the YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net domain isn't resolved at all.

@Neurone
Contributor Author

Neurone commented Aug 19, 2020

The second query ID (i.e. YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net) is dynamic: it changes about every 10 minutes. You get the current ID from the first query; in your case it is WDF2TUTUUA2Y3QWK337W54HMWA.

@fjl
Contributor

fjl commented Aug 19, 2020

ethdisco.net is hosted on Amazon Route53. We've done some checks on our configuration there and these extra records are not configured on the server in any way. This issue could also be a problem with a misbehaving intermediate resolver.

@fjl
Contributor

fjl commented Aug 19, 2020

A stable command to check for this issue is

go build ./cmd/devp2p
./devp2p --verbosity 5 dns sync enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net

This will download all available nodes at the given name in the same way geth does. I can run this command without any issue here, so it looks like the problem doesn't trigger for everyone.
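
For readers unfamiliar with the `enrtree://` argument, the following Go sketch splits it into the part before `@` and the domain that gets queried. It uses the standard `net/url` parser purely for illustration; this is an assumption about the URL shape seen in the command above, not a description of how geth or devp2p parses it, and `splitTreeURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitTreeURL returns the key part and the DNS domain of an enrtree:// URL.
func splitTreeURL(raw string) (key, domain string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "enrtree" {
		return "", "", fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	if u.User == nil || u.Host == "" {
		return "", "", fmt.Errorf("expected enrtree://<key>@<domain>, got %q", raw)
	}
	return u.User.Username(), u.Host, nil
}

func main() {
	key, domain, err := splitTreeURL("enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("key:   ", key)
	fmt.Println("domain:", domain)
}
```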

@Neurone please check your DNS resolver configuration. Maybe we can narrow this down to a specific public resolver and report the issue there.

@Neurone
Contributor Author

Neurone commented Aug 19, 2020

@fjl I'll try as soon as possible. Have you tried to query the DNS also from a client outside EC2?

@Neurone
Contributor Author

Neurone commented Aug 19, 2020

Tested on a mobile phone via Termux, and DNS actually resolves the TXT record correctly without problems. Weird. I'll be home again in a few days and will try again from there. Maybe some cached result from specific ISPs or regions? I experienced the problem connecting from Rome, Italy, using the Fastweb ISP.

@fjl
Contributor

fjl commented Aug 20, 2020

I tested outside of EC2 (don't have anything on EC2) and it works for me.

@Neurone
Contributor Author

Neurone commented Aug 21, 2020

Tested again from home, same result:

➜  go-ethereum git:(master)  ./devp2p --verbosity 5 dns sync enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net
TRACE[08-21|12:08:36.985] Updating DNS discovery root              tree=all.mainnet.ethdisco.net err=nil
TRACE[08-21|12:08:36.986] DNS discovery lookup                     name=FDXN3SN67NA5DKA4J2GOK7BVQI.all.mainnet.ethdisco.net err=nil
TRACE[08-21|12:08:36.986] DNS discovery lookup                     name=JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net err="lookup JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net on 172.23.240.1:53: cannot unmarshal DNS message"
lookup JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net on 172.23.240.1:53: cannot unmarshal DNS message

@Neurone
Contributor Author

Neurone commented Aug 21, 2020

I found the answer: it's a client-side issue with the DNS resolver inside the VM (or container).

Because my ISP uses the authoritative DNS to resolve the address, the clean response looks like this:

➜  go-ethereum git:(master) ✗ dig txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net                                                             

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6142
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 6783a699d7b88358b60f8ad15f3fa9874b4676dd03cbba40 (good)
;; QUESTION SECTION:
;JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. IN        TXT

;; ANSWER SECTION:
JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. 602097 IN TXT "enrtree-branch:ZLPOJ25FUOLQSO2WY55OVXSYCU,VI6DY45ZYDCWWGASUK6IQMYLCI,GUNYPUBCAC7TTTEISGRQCJZE2U,XBYYAXQOC4I5MRA6FIO6YZX5YQ,HCGUSFIAF77SNR5M63OUYPTVLY"

;; AUTHORITY SECTION:
ethdisco.net.           123598  IN      NS      ns-224.awsdns-28.com.
ethdisco.net.           123598  IN      NS      ns-1818.awsdns-35.co.uk.
ethdisco.net.           123598  IN      NS      ns-1441.awsdns-52.org.
ethdisco.net.           123598  IN      NS      ns-706.awsdns-24.net.

;; ADDITIONAL SECTION:
ns-224.awsdns-28.com.   97906   IN      A       205.251.192.224
ns-706.awsdns-24.net.   72637   IN      A       205.251.194.194
ns-1441.awsdns-52.org.  72612   IN      A       205.251.197.161
ns-1818.awsdns-35.co.uk. 72603  IN      A       205.251.199.26
ns-224.awsdns-28.com.   97906   IN      AAAA    2600:9000:5300:e000::1
ns-706.awsdns-24.net.   72637   IN      AAAA    2600:9000:5302:c200::1
ns-1441.awsdns-52.org.  72612   IN      AAAA    2600:9000:5305:a100::1
ns-1818.awsdns-35.co.uk. 72603  IN      AAAA    2600:9000:5307:1a00::1

;; Query time: 7 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Fri Aug 21 13:01:26 CEST 2020
;; MSG SIZE  rcvd: 583

Even though the message is larger than 512 bytes, the client stops parsing once the ANSWER section is finished, and everything works.

When I'm on the same machine but inside a virtual network, the resolver mixes the ANSWER section with the AUTHORITY section:

➜  go-ethereum git:(master) ✗ dig txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41396
;; flags: qr rd ad; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. IN        TXT

;; ANSWER SECTION:
JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. 0 IN TXT "enrtree-branch:ZLPOJ25FUOLQSO2WY55OVXSYCU,VI6DY45ZYDCWWGASUK6IQMYLCI,GUNYPUBCAC7TTTEISGRQCJZE2U,XBYYAXQOC4I5MRA6FIO6YZX5YQ,HCGUSFIAF77SNR5M63OUYPTVLY"
ns-224.awsdns-28.com.   0       IN      A       205.251.192.224
ns-706.awsdns-24.net.   0       IN      A       205.251.194.194
ns-1441.awsdns-52.org.  0       IN      A       205.251.197.161
ns-1818.awsdns-35.co.uk. 0      IN      A       205.251.199.26
ns-224.awsdns-28.com.   0       IN      AAAA    2600:9000:5300:e000::1
ns-706.awsdns-24.net.   0       IN      AAAA    2600:9000:5302:c200::1

;; Query time: 0 msec
;; SERVER: 172.23.240.1#53(172.23.240.1)
;; WHEN: Fri Aug 21 13:03:34 CEST 2020
;; MSG SIZE  rcvd: 526

This time the client continues parsing the ANSWER section and gets the error "cannot unmarshal DNS message" because the message is too long.

Explicitly using a non-authoritative DNS server (though I don't suggest this) solves the issue in any case, because the AUTHORITY section is not present at all:

  go-ethereum git:(master) ✗ dig txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45520
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. IN        TXT

;; ANSWER SECTION:
JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. 21599 IN TXT "enrtree-branch:ZLPOJ25FUOLQSO2WY55OVXSYCU,VI6DY45ZYDCWWGASUK6IQMYLCI,GUNYPUBCAC7TTTEISGRQCJZE2U,XBYYAXQOC4I5MRA6FIO6YZX5YQ,HCGUSFIAF77SNR5M63OUYPTVLY"

;; Query time: 29 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Aug 21 13:06:32 CEST 2020
;; MSG SIZE  rcvd: 242
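
The same check can be done from Go, which is closer to what geth's lookups go through than dig is. The sketch below uses the standard library's `net.Resolver` with a custom `Dial` function so that queries go to one chosen server instead of the system default. The server address `8.8.8.8:53` mirrors the dig test above; `resolverFor` is a hypothetical helper, and this only affects the sketch itself, not geth's configuration.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolverFor returns a resolver that sends every query to the given
// "host:port" DNS server using Go's built-in DNS client.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			// Ignore the address from the system configuration and dial our server.
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	r := resolverFor("8.8.8.8:53")
	txts, err := r.LookupTXT(ctx, "all.mainnet.ethdisco.net")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, t := range txts {
		fmt.Println(t)
	}
}
```

Swapping the address for the resolver reported by dig inside the virtual network should help confirm whether the "cannot unmarshal DNS message" error depends on the resolver rather than on geth.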

Suggestion: maybe this discovery feature can be replaced with a more decentralized and less "client configuration dependent"? Maybe client can look at a specific address and tx, or to a smart contract, get hash of ENR tree from there and then download the client list from Swarm/IPFS?

Maybe something using EIP-2848? :)

@fjl
Contributor

fjl commented Aug 23, 2020

Which resolver software are you using in your VM? It sounds like there is a bug there, would like to confirm this.

Suggestion: maybe this discovery feature can be replaced with a more decentralized and less "client configuration dependent"? Maybe client can look at a specific address and tx, or to a smart contract, get hash of ENR tree from there and then download the client list from Swarm/IPFS?

The DNS-based discovery is the fallback option for the DHT, which is always enabled and is decentralized. We actually had the DHT first, the DNS-based discovery was added because not everyone can use the DHT.

@Neurone
Contributor Author

Neurone commented Aug 24, 2020

I'm using an Ubuntu 18.04.5 distro under Windows 10 via WSL. The virtual NIC is a Hyper-V Virtual Ethernet Adapter:

Ethernet adapter vEthernet (WSL):

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter
   Physical Address. . . . . . . . . : 00-15-5D-E5-A9-47
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::5d92:dde2:14c1:abf4%24(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.28.176.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 402658653
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-24-90-8D-94-C8-60-00-CA-E9-ED
   DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   NetBIOS over Tcpip. . . . . . . . : Enabled

It seems there are many unsolved DNS issues (e.g. microsoft/WSL#4285, microsoft/WSL#3268) for this configuration. I ran other tests, but the only way I can make geth work without problems is to set a static DNS server, following the suggestion in "Fix DNS resolution in WSL2".

I don't think this is an easy issue to solve, but maybe this workaround can be helpful for others.

@fjl
Contributor

fjl commented Aug 24, 2020

Thank you. Maybe we should create an issue in the WSL repo then.

@Neurone
Contributor Author

Neurone commented Aug 25, 2020

Thank you. Maybe we should create an issue in the WSL repo then.

Done :) I hope they can do something about it, although I fear it may be a problem with Windows ICS (netsvcs). I think the Hyper-V Virtual Adapter redirects queries to it under the hood; if so, we are in closed-source territory and I cannot easily investigate any further.

@fjl
Contributor

fjl commented Aug 25, 2020

Very very cool. Thanks again for the detailed debugging, it was very helpful to isolate the issue. I'll close this one because it looks like this reproduces also with other domains.

@fjl fjl closed this as completed Aug 25, 2020
3 participants