DNS TTL Not Respected by celestia-node Leading to Sync Issues #3570

Open
smuu opened this issue Jul 17, 2024 · 4 comments
Labels
bug Something isn't working enhancement New feature or request external Issues created by non node team members v0.17.0 Intended for v0.17.0 release

Comments

@smuu
Member

smuu commented Jul 17, 2024

Description:

We encountered an issue where changes to the DNS entries of DA nodes in Arabica caused light nodes to fail to sync. Restarting the light nodes resolved the issue, indicating that they resolve DNS once at startup and then use the same IP address indefinitely, ignoring DNS TTL.

Steps to Reproduce:

  1. Change the DNS entries of the DA nodes so that they resolve to new IP addresses.
  2. Observe light nodes failing to sync.
  3. Restart the light nodes and observe they can sync.

Suspected Cause:
Nodes resolve DNS entries only once at startup and keep using the resulting IP address, ignoring the TTL. This affects both:

  • DA nodes connecting to other DA nodes.
  • DA Bridge nodes connecting to consensus nodes.

Relevant Code:

Potential Fix:

  1. Periodically re-resolve DNS entries based on the TTL.
  2. Update active connections if the resolved IP address changes.

Repositories Potentially Needing Changes:

Impact:
Not respecting DNS TTL can lead to connectivity and sync issues, affecting network reliability.

Request for Assistance:

  1. Identify where DNS resolution is handled in the codebase and dependencies.
  2. Implement periodic DNS resolution based on TTL.
  3. Test changes to ensure nodes dynamically update connections based on DNS updates.
@smuu smuu added the enhancement New feature or request label Jul 17, 2024
@github-actions github-actions bot added the external Issues created by non node team members label Jul 17, 2024
@renaynay renaynay added bug Something isn't working v0.15.0 labels Jul 17, 2024
@ramin
Contributor

ramin commented Jul 17, 2024

I think the simplest fix here is to remove https://github.com/celestiaorg/celestia-node/blob/main/libs/utils/address.go#L40, which resolves the IP a single time at startup, and instead let clients use the domain as passed in, relying on the internet's DNS infrastructure to do its job.

unless there was a good reason we HAD to resolve the IP?

@smuu
Member Author

smuu commented Jul 17, 2024

> I think the simplest fix here is to remove https://github.com/celestiaorg/celestia-node/blob/main/libs/utils/address.go#L40, which resolves the IP a single time at startup, and instead let clients use the domain as passed in, relying on the internet's DNS infrastructure to do its job.
>
> unless there was a good reason we HAD to resolve the IP?

If I understand the code correctly, this would only fix one part of the issue: the connection between the DA bridge node (BN) and the consensus node.
From my understanding, this code is not called when a DNS name inside a multiaddr is resolved.

@smuu
Member Author

smuu commented Jul 22, 2024

One workaround for this issue would be to re-establish the connection when it fails after the IP address changes. This way, we don't need to handle the DNS TTL explicitly; the node would simply ask the DNS server for the new IP address when it reconnects.

@renaynay renaynay assigned renaynay and ramin and unassigned renaynay Jul 26, 2024
@renaynay
Member

What's the status on this, @ramin @smuu?

@renaynay renaynay added v0.17.0 Intended for v0.17.0 release and removed v0.15.0 labels Oct 2, 2024
Projects
None yet
Development

No branches or pull requests

3 participants