-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: add dynamic DNS resolution for nodes #30822
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of this, but the PR will need some updates.
One key thing that's violated here, is that enode.Node
is supposed to be an immutable type, i.e. the fields of it can't be changed.
Storing the DNS name into the Node object is the correct approach, but we can't use Node to also track the updates. The TTL logic should be moved into p2p.dialTask
. When dialing the node fails, we can resolve it, and then create another Node instance with the resolved address.
Note also that we have existing 'resolve' logic that checks the DHT for updates to the node endpoint. If the node is found in the DHT, we don't need to use the DNS at all.
@MariusVanDerWijden and @fjl thanks for the review and opinions. I will work on this and follow the recommendations, thank you! |
@fjl |
@fjl So I rewrote it from scratch with your suggestions, appreciate your review! I made a lot simpler now with no TTL handling, trying to resolve every dial with most of the things implemented in the proper |
Remove methods Endpoint and NeedResolve, since they are kind of redundant. Instead of adding a new constructor with a hostname parameter, provide method WithHostname to attach a name to any node.
Pushed some further updates. I think this is OK now, but want to test it a bit more before merging. |
Closes #23210
Context
When deploying Geth in Kubernetes with ReplicaSets, we encountered two DNS-related issues affecting node connectivity. First, during startup, Geth tries to resolve DNS names for static nodes too early in the config unmarshaling phase. If peer nodes aren't ready yet (which is common in Kubernetes rolling deployments), this causes an immediate failure:
The second issue comes up when pods get rescheduled to different nodes - their IPs change but peers keep using the initially resolved IP, never updating the DNS mapping.
This PR adds proper DNS support to enodes by deferring resolution to connection time and adding TTL-based updates. It also handles DNS failures gracefully instead of failing fatally during startup, making it work better in container environments where IPs are dynamic and peers come and go during rollouts.