Retry DNS resolution on failure #1762

Merged 1 commit on Feb 17, 2021

@teor2345 (Contributor) commented on Feb 17, 2021

Motivation

A transient DNS failure can make the node hang:

Feb 17 10:21:57.647  INFO {zebrad="47bcf630" net="Main"}: zebrad::commands::start: Starting zebrad
Feb 17 10:21:57.647  INFO {zebrad="47bcf630" net="Main"}: zebrad::commands::start: config=ZebradConfig { consensus: Config { checkpoint_sync: false }, metrics: MetricsSection { endpoint_addr: None }, network: Config { listen_addr: 0.0.0.0:8233, network: Mainnet, initial_mainnet_peers: {"dnsseed.z.cash:8233", "mainnet.is.yolo.money:8233", "mainnet.seeder.zfnd.org:8233", "dnsseed.str4d.xyz:8233"}, initial_testnet_peers: {"testnet.seeder.zfnd.org:18233", "testnet.is.yolo.money:18233", "dnsseed.testnet.z.cash:18233"}, peerset_initial_target_size: 50, new_peer_interval: 60s }, state: Config { cache_dir: "/home/dev/.cache/zebra", ephemeral: false, debug_stop_at_height: None }, tracing: TracingSection { use_color: true, filter: None, endpoint_addr: None, flamegraph: None }, sync: SyncSection { max_concurrent_block_requests: 50, lookahead_limit: 2000 } }
Feb 17 10:21:57.648  INFO {zebrad="47bcf630" net="Main"}: zebrad::commands::start: initializing node state
Feb 17 10:21:57.648  INFO {zebrad="47bcf630" net="Main"}: zebra_state::config: the open file limit is at or above the specified limit new_limit=1024 current_limit=1024 hard_rlimit=Some(524288)
Feb 17 10:21:57.655  INFO {zebrad="47bcf630" net="Main"}: zebra_state::service::finalized_state: Opened Zebra state cache at /home/dev/.cache/zebra/state/v4/mainnet
Feb 17 10:21:57.656  INFO {zebrad="47bcf630" net="Main"}: zebrad::commands::start: initializing chain verifier
Feb 17 10:21:57.672  INFO {zebrad="47bcf630" net="Main"}:init{config=Config { checkpoint_sync: false } network=Mainnet}: zebra_consensus::chain: initializing chain verifier tip=Some((Height(0), block::Hash("00040fe8ec8471911baa1db1266ea15dd06b4a8a5c453883c000b031973dce08"))) max_checkpoint_height=Height(419581)                                                                                              
Feb 17 10:21:57.672  INFO {zebrad="47bcf630" net="Main"}: zebrad::commands::start: initializing network
Feb 17 10:21:57.672  INFO {zebrad="47bcf630" net="Main"}: zebra_network::peer_set::initialize: Sending initial request for peers
Feb 17 10:21:57.672  INFO listen{addr=0.0.0.0:8233}: zebra_network::peer_set::initialize: Opened Zcash protocol endpoint at 0.0.0.0:8233
Feb 17 10:22:02.673  INFO zebra_network::config: DNS timeout resolving peer IP address host="dnsseed.z.cash:8233" e=Elapsed(())
Feb 17 10:22:02.673  INFO zebra_network::config: DNS timeout resolving peer IP address host="mainnet.is.yolo.money:8233" e=Elapsed(())
Feb 17 10:22:02.673  INFO zebra_network::config: DNS timeout resolving peer IP address host="mainnet.seeder.zfnd.org:8233" e=Elapsed(())
Feb 17 10:22:02.673  INFO zebra_network::config: DNS timeout resolving peer IP address host="dnsseed.str4d.xyz:8233" e=Elapsed(())
Feb 17 10:22:02.673  WARN zebra_network::config: empty peer list after DNS resolution peers={"dnsseed.z.cash:8233", "mainnet.is.yolo.money:8233", "mainnet.seeder.zfnd.org:8233", "dnsseed.str4d.xyz:8233"} peer_addresses={}
Feb 17 10:22:02.673  INFO add_initial_peers: zebra_network::peer_set::initialize: Connecting to initial peer set initial_peers={}

This is a quick bug fix. (It would have cost more effort to open a ticket that describes how to fix the bug.)

Solution

  • Loop until DNS returns at least 1 peer
  • Log an info-level message if DNS returns an empty peer set
  • Wait for 5 seconds between attempts, in case DNS errors out immediately
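
The retry behaviour listed above can be sketched roughly as follows. This is a minimal, synchronous illustration using only the Rust standard library, not the code merged in this PR: `resolve_until_nonempty` and `resolve_hosts` are hypothetical names, `eprintln!` stands in for the info-level log message, and the real code in `zebra_network::config` may be structured differently (for example, it may be async).

```rust
use std::collections::HashSet;
use std::net::{SocketAddr, ToSocketAddrs};
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: resolve every "host:port" string once and collect
/// the addresses that were found. Hosts that fail to resolve are logged and
/// skipped, so the result can be empty.
fn resolve_hosts(hosts: &[&str]) -> HashSet<SocketAddr> {
    let mut addrs = HashSet::new();
    for host in hosts {
        match host.to_socket_addrs() {
            Ok(resolved) => addrs.extend(resolved),
            Err(e) => eprintln!("DNS error resolving peer IP address host={:?} e={}", host, e),
        }
    }
    addrs
}

/// Hypothetical helper: call `resolve` until it returns at least one peer,
/// sleeping for `retry_delay` between attempts so that an immediate DNS
/// error does not turn into a busy loop.
fn resolve_until_nonempty<F>(mut resolve: F, retry_delay: Duration) -> HashSet<SocketAddr>
where
    F: FnMut() -> HashSet<SocketAddr>,
{
    loop {
        let peers = resolve();
        if !peers.is_empty() {
            return peers;
        }
        // Stand-in for the info-level log message described in the list above.
        eprintln!("empty peer list after DNS resolution, retrying in {:?}", retry_delay);
        sleep(retry_delay);
    }
}
```

Under these assumptions, a caller would write something like `resolve_until_nonempty(|| resolve_hosts(&["dnsseed.z.cash:8233"]), Duration::from_secs(5))`. Passing the resolver as a closure keeps the loop testable without real DNS lookups.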

Review

This is a routine fix that's out of scope for the current sprint.

Related Issues

Follow up to #1662.

Otherwise, a transient DNS failure makes the node hang.
@teor2345 added the labels C-bug (Category: This is a bug), A-rust (Area: Updates to Rust code), P-Medium, and I-hang (A Zebra component stops responding to requests) on Feb 17, 2021
@teor2345 self-assigned this on Feb 17, 2021
@oxarbitrage (Contributor) left a comment

Looks good. I tried it and it works.

@teor2345 merged commit 579bd4a into ZcashFoundation:main on Feb 17, 2021
@teor2345 mentioned this pull request on Feb 23, 2021