Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: add dynamic DNS resolution for nodes #30822

Merged
merged 14 commits into from
Dec 13, 2024

Conversation

0xVasconcelos
Copy link
Contributor

Closes #23210

Context

When deploying Geth in Kubernetes with ReplicaSets, we encountered two DNS-related issues affecting node connectivity. First, during startup, Geth tries to resolve DNS names for static nodes too early in the config unmarshaling phase. If peer nodes aren't ready yet (which is common in Kubernetes rolling deployments), this causes an immediate failure:

INFO [11-26|10:03:42.816] Starting Geth on Ethereum mainnet...
INFO [11-26|10:03:42.817] Bumping default cache on mainnet         provided=1024 updated=4096
Fatal: config.toml, line 81: (p2p.Config.StaticNodes) lookup idontexist.geth.node: no such host

The second issue comes up when pods get rescheduled to different nodes - their IPs change but peers keep using the initially resolved IP, never updating the DNS mapping.
This PR adds proper DNS support to enodes by deferring resolution to connection time and adding TTL-based updates. It also handles DNS failures gracefully instead of failing fatally during startup, making it work better in container environments where IPs are dynamic and peers come and go during rollouts.

@fjl fjl changed the title p2p/enode: add dynamic DNS resolution for nodes p2p: add dynamic DNS resolution for nodes Nov 28, 2024
Copy link
Contributor

@fjl fjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of this, but the PR will need some updates.

One key thing that's violated here, is that enode.Node is supposed to be an immutable type, i.e. the fields of it can't be changed.

Storing the DNS name into the Node object is the correct approach, but we can't use Node to also track the updates. The TTL logic should be moved into p2p.dialTask. When dialing the node fails, we can resolve it, and then create another Node instance with the resolved address.

Note also that we have existing 'resolve' logic that checks the DHT for updates to the node endpoint. If the node is found in the DHT, we don't need to use the DNS at all.

@0xVasconcelos
Copy link
Contributor Author

@MariusVanDerWijden and @fjl thanks for the review and opinions.

I will work on this and follow the recommendations, thank you!

@fjl fjl self-assigned this Dec 3, 2024
@0xVasconcelos
Copy link
Contributor Author

0xVasconcelos commented Dec 3, 2024

@fjl what is your opinion about DNSs that can't be resolved? Would be ideal to create a enode.Node with empty ENR?
I noticed if we go the full route refactor we could create a enode.Node without ENR and just render it on demand via some method like enode.ENR() instead of accessing the attribute, but this would imply in storing the pubKey in the enode.Node object as well and rewrite many parts of the code, in my view this would give a more predictable behavior, instead of adding one more corner case.
What do you think its the more reasonable way to follow?

@0xVasconcelos
Copy link
Contributor Author

0xVasconcelos commented Dec 4, 2024

@fjl So I rewrote it from scratch with your suggestions, appreciate your review!

I made a lot simpler now with no TTL handling, trying to resolve every dial with most of the things implemented in the proper dialScheduler.

0xVasconcelos and others added 7 commits December 4, 2024 15:27
Remove methods Endpoint and NeedResolve, since they are kind of redundant.
Instead of adding a new constructor with a hostname parameter, provide method
WithHostname to attach a name to any node.
@fjl
Copy link
Contributor

fjl commented Dec 12, 2024

Pushed some further updates. I think this is OK now, but want to test it a bit more before merging.

@fjl fjl merged commit 804d45c into ethereum:master Dec 13, 2024
2 checks passed
@fjl fjl added this to the 1.14.13 milestone Dec 13, 2024
@fjl fjl modified the milestones: 1.14.13, 1.15.0 Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StaticNodes / TrustedNodes useless in PoA Setup
3 participants