Wrong DNS configuration for ethdisco.net domain causes a lot of background errors during DNS queries #21454
Comments
Thanks for debugging this! We'll see how to disable this behavior on the server side. |
Just tried to run the
and:
There's no extra AAAA and A records in response, and the |
The second query ID (e.g. YT2KPIIE637MN76A2UTCOCZPFU.all.mainnet.ethdisco.net) is dynamic: it changes about every 10 minutes. You will get the correct ID from the first query, in your case WDF2TUTUUA2Y3QWK337W54HMWA |
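To illustrate how the second query name follows from the first one, here is a minimal Go sketch that parses a root TXT record in the EIP-1459 format ("enrtree-root:v1 e=... l=... seq=... sig=...") and builds the next name from its e= field. The helper names (parseRoot, nextName), the l= hash and the sig value are placeholders invented for this example, and the sketch skips the signature check that a real client performs.

```go
package main

import (
	"fmt"
	"strings"
)

// rootEntry holds the fields of an "enrtree-root:v1" TXT record.
type rootEntry struct {
	ENRRoot  string // e=: hash of the subtree holding node records
	LinkRoot string // l=: hash of the subtree holding links to other trees
	Seq      string // seq=: sequence number, kept as text in this sketch
}

// parseRoot extracts the subtree hashes from a root TXT record.
// Unlike a real client, it does not verify the sig= field.
func parseRoot(txt string) (rootEntry, error) {
	const prefix = "enrtree-root:v1 "
	if !strings.HasPrefix(txt, prefix) {
		return rootEntry{}, fmt.Errorf("not a root entry: %q", txt)
	}
	var r rootEntry
	for _, field := range strings.Fields(txt[len(prefix):]) {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			continue
		}
		switch key {
		case "e":
			r.ENRRoot = value
		case "l":
			r.LinkRoot = value
		case "seq":
			r.Seq = value
		}
	}
	if r.ENRRoot == "" {
		return rootEntry{}, fmt.Errorf("root entry has no e= field")
	}
	return r, nil
}

// nextName builds the DNS name of the second query from the e= hash.
func nextName(r rootEntry, domain string) string {
	return r.ENRRoot + "." + domain
}

func main() {
	// The e= hash is the one quoted in this thread; l= and sig= are placeholders.
	txt := "enrtree-root:v1 e=WDF2TUTUUA2Y3QWK337W54HMWA l=AAAAAAAAAAAAAAAAAAAAAAAAAA seq=1 sig=placeholder"
	root, err := parseRoot(txt)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(nextName(root, "all.mainnet.ethdisco.net"))
}
```

Because the root record is republished regularly, the name printed here is only valid until the next update, which is why a hard-coded second query name stops working after a while.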
ethdisco.net is hosted on Amazon Route53. We've done some checks on our configuration there and these extra records are not configured on the server in any way. This issue could also be a problem with a misbehaving intermediate resolver. |
A stable command to check for this issue is
This will download all available nodes at the given name in the same way geth does. I can run this command without any issue here, so looks like this doesn't trigger for everyone. @Neurone please check your DNS resolver configuration. Maybe we can narrow this down to a specific public resolver and report the issue there. |
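As a rough picture of what "download all available nodes" involves, the Go sketch below walks a tree of TXT records: branch entries ("enrtree-branch:" followed by comma-separated hashes) are followed recursively and leaf entries ("enr:...") are collected. This is a simplified illustration, not the devp2p implementation: the lookupFunc type and walk function are invented for this example, the lookup is injectable so it can run offline (net.LookupTXT has a compatible signature), and hash verification, link entries and depth limits are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc resolves the TXT records of a DNS name.
// net.LookupTXT has this signature and could be passed in directly.
type lookupFunc func(name string) ([]string, error)

// walk visits the subtree rooted at hash and returns every leaf record.
func walk(lookup lookupFunc, hash, domain string) ([]string, error) {
	name := hash + "." + domain
	txts, err := lookup(name)
	if err != nil {
		return nil, fmt.Errorf("lookup %s: %w", name, err)
	}
	if len(txts) == 0 {
		return nil, fmt.Errorf("lookup %s: no TXT record", name)
	}
	entry := txts[0] // this sketch only looks at the first TXT record
	switch {
	case strings.HasPrefix(entry, "enrtree-branch:"):
		var leaves []string
		children := strings.Split(strings.TrimPrefix(entry, "enrtree-branch:"), ",")
		for _, child := range children {
			if child == "" {
				continue
			}
			sub, err := walk(lookup, child, domain)
			if err != nil {
				return nil, err
			}
			leaves = append(leaves, sub...)
		}
		return leaves, nil
	case strings.HasPrefix(entry, "enr:"):
		return []string{entry}, nil
	default:
		return nil, fmt.Errorf("unexpected entry at %s: %q", name, entry)
	}
}

func main() {
	// In-memory stand-in for DNS; all names and records are made up.
	fake := map[string][]string{
		"ROOT.example.org": {"enrtree-branch:AAAA,BBBB"},
		"AAAA.example.org": {"enr:-node-a"},
		"BBBB.example.org": {"enr:-node-b"},
	}
	lookup := func(name string) ([]string, error) {
		return fake[name], nil
	}
	nodes, err := walk(lookup, "ROOT", "example.org")
	fmt.Println(nodes, err)
}
```

A single failed lookup aborts the whole walk in this sketch, which mirrors why one unparseable response is enough to make the sync command report an error.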
@fjl I'll try as soon as possible. Have you tried to query the DNS also from a client outside EC2? |
Tested on mobile phone via termux and DNS actually resolves TXT correctly without problems. Weird. I'll be home again in a few days and will try again from there. Maybe some cached result from specific ISPs or regions? I experienced the problem connecting from Rome, Italy using Fastweb ISP. |
I tested outside of EC2 (don't have anything on EC2) and it works for me. |
Tested again from home, same result:
➜ go-ethereum git:(master) ./devp2p --verbosity 5 dns sync enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.mainnet.ethdisco.net
TRACE[08-21|12:08:36.985] Updating DNS discovery root tree=all.mainnet.ethdisco.net err=nil
TRACE[08-21|12:08:36.986] DNS discovery lookup name=FDXN3SN67NA5DKA4J2GOK7BVQI.all.mainnet.ethdisco.net err=nil
TRACE[08-21|12:08:36.986] DNS discovery lookup name=JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net err="lookup JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net on 172.23.240.1:53: cannot unmarshal DNS message"
lookup JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net on 172.23.240.1:53: cannot unmarshal DNS message |
I found the answer: it's a client-side issue with the DNS resolver inside a VM (or container). Because my ISP uses the authoritative DNS to resolve the address, the clean response looks like this:
➜ go-ethereum git:(master) ✗ dig txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6142
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 6783a699d7b88358b60f8ad15f3fa9874b4676dd03cbba40 (good)
;; QUESTION SECTION:
;JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. IN TXT
;; ANSWER SECTION:
JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. 602097 IN TXT "enrtree-branch:ZLPOJ25FUOLQSO2WY55OVXSYCU,VI6DY45ZYDCWWGASUK6IQMYLCI,GUNYPUBCAC7TTTEISGRQCJZE2U,XBYYAXQOC4I5MRA6FIO6YZX5YQ,HCGUSFIAF77SNR5M63OUYPTVLY"
;; AUTHORITY SECTION:
ethdisco.net. 123598 IN NS ns-224.awsdns-28.com.
ethdisco.net. 123598 IN NS ns-1818.awsdns-35.co.uk.
ethdisco.net. 123598 IN NS ns-1441.awsdns-52.org.
ethdisco.net. 123598 IN NS ns-706.awsdns-24.net.
;; ADDITIONAL SECTION:
ns-224.awsdns-28.com. 97906 IN A 205.251.192.224
ns-706.awsdns-24.net. 72637 IN A 205.251.194.194
ns-1441.awsdns-52.org. 72612 IN A 205.251.197.161
ns-1818.awsdns-35.co.uk. 72603 IN A 205.251.199.26
ns-224.awsdns-28.com. 97906 IN AAAA 2600:9000:5300:e000::1
ns-706.awsdns-24.net. 72637 IN AAAA 2600:9000:5302:c200::1
ns-1441.awsdns-52.org. 72612 IN AAAA 2600:9000:5305:a100::1
ns-1818.awsdns-35.co.uk. 72603 IN AAAA 2600:9000:5307:1a00::1
;; Query time: 7 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Fri Aug 21 13:01:26 CEST 2020
;; MSG SIZE rcvd: 583
Even if the message is larger than 512 bytes, the client stops parsing after
When I'm on the same machine but inside a virtual network, the resolver mixes the
➜ go-ethereum git:(master) ✗ dig txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41396
;; flags: qr rd ad; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. IN TXT
;; ANSWER SECTION:
JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. 0 IN TXT "enrtree-branch:ZLPOJ25FUOLQSO2WY55OVXSYCU,VI6DY45ZYDCWWGASUK6IQMYLCI,GUNYPUBCAC7TTTEISGRQCJZE2U,XBYYAXQOC4I5MRA6FIO6YZX5YQ,HCGUSFIAF77SNR5M63OUYPTVLY"
ns-224.awsdns-28.com. 0 IN A 205.251.192.224
ns-706.awsdns-24.net. 0 IN A 205.251.194.194
ns-1441.awsdns-52.org. 0 IN A 205.251.197.161
ns-1818.awsdns-35.co.uk. 0 IN A 205.251.199.26
ns-224.awsdns-28.com. 0 IN AAAA 2600:9000:5300:e000::1
ns-706.awsdns-24.net. 0 IN AAAA 2600:9000:5302:c200::1
;; Query time: 0 msec
;; SERVER: 172.23.240.1#53(172.23.240.1)
;; WHEN: Fri Aug 21 13:03:34 CEST 2020
;; MSG SIZE rcvd: 526
This time the client continues parsing the ANSWER and gets the error "cannot unmarshal DNS message" because the message is too long.
Explicitly using a non-authoritative DNS server (I don't suggest this, though) solves the issue in any case, because the AUTHORITY section is not present at all:
go-ethereum git:(master) ✗ dig txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> txt JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45520
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. IN TXT
;; ANSWER SECTION:
JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net. 21599 IN TXT "enrtree-branch:ZLPOJ25FUOLQSO2WY55OVXSYCU,VI6DY45ZYDCWWGASUK6IQMYLCI,GUNYPUBCAC7TTTEISGRQCJZE2U,XBYYAXQOC4I5MRA6FIO6YZX5YQ,HCGUSFIAF77SNR5M63OUYPTVLY"
;; Query time: 29 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Aug 21 13:06:32 CEST 2020
;; MSG SIZE rcvd: 242
Suggestion: maybe this discovery feature could be replaced with something more decentralized and less "client configuration dependent"? Maybe the client could look at a specific address and tx, or at a smart contract, get the hash of the ENR tree from there, and then download the client list from Swarm/IPFS? Maybe something using EIP-2848? :) |
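The three dig outputs in this comment differ mainly in their header counts (ANSWER: 1 vs ANSWER: 7) and in their size relative to the classic 512-byte UDP limit of RFC 1035. The Go sketch below decodes the fixed 12-byte DNS header to make those fields visible. The header bytes are rebuilt by hand from the second output (id 41396, flags qr rd ad), not captured from the wire, and the type and function names are made up for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// classicUDPLimit is the RFC 1035 size limit for DNS messages over UDP without EDNS.
const classicUDPLimit = 512

// header is the fixed 12-byte prefix of every DNS message.
type header struct {
	ID         uint16
	Truncated  bool // TC bit: the server cut the message to fit the transport
	Questions  uint16
	Answers    uint16
	Authority  uint16
	Additional uint16
}

// parseHeader decodes the header fields, which are all big-endian.
func parseHeader(msg []byte) (header, error) {
	if len(msg) < 12 {
		return header{}, errors.New("message shorter than DNS header")
	}
	flags := binary.BigEndian.Uint16(msg[2:4])
	return header{
		ID:         binary.BigEndian.Uint16(msg[0:2]),
		Truncated:  flags&0x0200 != 0,
		Questions:  binary.BigEndian.Uint16(msg[4:6]),
		Answers:    binary.BigEndian.Uint16(msg[6:8]),
		Authority:  binary.BigEndian.Uint16(msg[8:10]),
		Additional: binary.BigEndian.Uint16(msg[10:12]),
	}, nil
}

func main() {
	// id 41396, flags qr rd ad, QUERY 1, ANSWER 7, AUTHORITY 0, ADDITIONAL 0.
	msg := []byte{0xA1, 0xB4, 0x81, 0x20, 0x00, 0x01, 0x00, 0x07, 0x00, 0x00, 0x00, 0x00}
	h, err := parseHeader(msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", h)

	// MSG SIZE values reported by dig in the three outputs.
	for _, size := range []int{583, 526, 242} {
		fmt.Printf("%d bytes, over classic UDP limit: %t\n", size, size > classicUDPLimit)
	}
}
```

In the second output the seven answers include A and AAAA records for the name servers, which in the first output sit in the ADDITIONAL section instead.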
Which resolver software are you using in your VM? It sounds like there is a bug there, would like to confirm this.
The DNS-based discovery is the fallback option for the DHT, which is always enabled and is decentralized. We actually had the DHT first, the DNS-based discovery was added because not everyone can use the DHT. |
I'm using an Ubuntu 18.04.5 distro under Windows 10 via WSL. The virtual NIC is a Hyper-V Virtual Ethernet Adapter:
Ethernet adapter vEthernet (WSL):
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter
Physical Address. . . . . . . . . : 00-15-5D-E5-A9-47
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::5d92:dde2:14c1:abf4%24(Preferred)
IPv4 Address. . . . . . . . . . . : 172.28.176.1(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.240.0
Default Gateway . . . . . . . . . :
DHCPv6 IAID . . . . . . . . . . . : 402658653
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-24-90-8D-94-C8-60-00-CA-E9-ED
DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
fec0:0:0:ffff::2%1
fec0:0:0:ffff::3%1
NetBIOS over Tcpip. . . . . . . . : Enabled
It seems there are many unsolved DNS issues (e.g. microsoft/WSL#4285, microsoft/WSL#3268) for this configuration. I ran other tests, but the only way I can make geth work without problems is to set a static DNS server, following the suggestion in "Fix DNS resolution in WSL2". I don't think this is an easy issue to solve, but maybe this workaround can be helpful for others. |
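One way to check whether the virtual network's resolver is the culprit, without changing system settings, is to send the same TXT query to two servers from Go and compare the results. The sketch below uses net.Resolver with a custom Dial function to override the server. It is a diagnostic aid only, not something geth exposes: resolverFor is a name invented here, the two addresses are the ones that appear in the dig outputs of this thread, and the queried name has probably expired by now, so substitute a current one.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolverFor returns a resolver that sends every query to server ("host:port")
// instead of the system-configured DNS server.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in DNS client so that Dial is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			// Ignore the address Go picked and dial the chosen server.
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Name taken from this thread; it is dynamic and may no longer exist.
	name := "JFUFNF3G5436CEITKZS3INWAQE.all.mainnet.ethdisco.net"

	// 172.23.240.1 is the WSL-side resolver seen above, 8.8.8.8 a public one.
	for _, server := range []string{"172.23.240.1:53", "8.8.8.8:53"} {
		txt, err := resolverFor(server).LookupTXT(ctx, name)
		fmt.Printf("%s: %d record(s), err=%v\n", server, len(txt), err)
	}
}
```

If only the first server yields "cannot unmarshal DNS message", that points at the local resolver rather than at the ethdisco.net zone.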
Thank you. Maybe we should create an issue in the WSL repo then. |
Done :) I hope they can do something about it, although I fear it may be a problem with Windows ICS (netsvcs). I think the Hyper-V Virtual Adapter redirects queries to it under the hood, but if that is the case we are in closed-source territory, so I cannot easily investigate any further. |
Very very cool. Thanks again for the detailed debugging, it was very helpful to isolate the issue. I'll close this one because it looks like this also reproduces with other domains. |
System information
Geth version: 1.9.19
OS & Version: Ubuntu 18.04
Expected behaviour
The DNS server should always send messages compliant with RFC 1035. In particular, it should not send messages larger than 512 bytes over UDP.
Actual behaviour
Geth truncates DNS packets while discovering new peers. This causes a background loop of DNS queries, all doomed to fail.
This issue was also previously reported (#20713) but it seemed to be a pi-hole related issue so it was closed without a solution.
The problem is actually in the DNS server configuration:
The first query for the root is always fine because message size is 497 bytes (check the last line MSG SIZE rcvd):
The second query to the actual ENR tree instead raises the issue:
How to solve the issue
Remove all records other than TXT from the DNS response, in particular the unused ones, such as all the A and AAAA records.
I didn't check all the other networks, but Ropsten, for example, is already configured correctly:
Steps to reproduce the behaviour
Start geth from scratch with verbosity level 4
Backtrace