Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StaticNodes / TrustedNodes useless in PoA Setup #23210

Closed
hickscorp opened this issue Jul 14, 2021 · 16 comments · Fixed by #30822
Closed

StaticNodes / TrustedNodes useless in PoA Setup #23210

hickscorp opened this issue Jul 14, 2021 · 16 comments · Fixed by #30822

Comments

@hickscorp
Copy link

Background

We are running a set of sealers and nodes in a docker swarm. Upon restarting the cluster, discovery between nodes doesn't work - so we have to add each node and sealer to each other node and sealer.

Initially (and when we were running Geth 1.8) we had an extra container, in charge of connecting to the HTTP RPC, querying each node, and registering it with all others using their HTTP RPC as well. The reason for this is that docker swarm doesn't guarantee that an IP address will be the same for the same container - so our extra container had a script to "scrape" IP addresses and building enode:// URLS to distribute across the nodes.

Problem Statement

More recently, we updated to Geth 1.10 and were pleasantly surprised when we discovered that enode:// specifications can now accept DNS names (so we wouldn't need this scraper container). We tried and having an enode://...@dns_name@nodiscover=1 works great.
So we decided that instead of having a "scraping service" in an extra container, we would instead connect our cluster using static-nodes.json (which didn't work, apparently deprecated) and then using a geth.toml file.
At this point, our TOML file was pretty much looking like this:

[Node.P2P]
NoDiscovery = true
StaticNodes = [enode1, enode2, etc.]
TrustedNodes = []

Unfortunately, this attempt failed short because Geth refuses to boot if any of the StaticNodes / TrustedNodes is unreachable - so there's a bit of a catch-22 situation here when restarting the whole cluster.

Note that we tried using either / both StaticNodes and TrustedNodes.

Suggestion 1

It would be quite nice to have these StaticNodes and / or TrustedNodes act as a warning rather than a FATAL error - this way the node would boot, fail to contact the nodes, and retry later on.

Current State of Things

We didn't stop just here. We made a script such as this one:

var peers = [ ...all of our enodes... ];

// This function connects to all peers listed in the `peers` variable.
function connectPeers() {
  return peers.map(function (peer) {
    return {
      peer: peer,
      addPeer: admin.addPeer(peer),
      addTrustedPeer: admin.addTrustedPeer(peer)
    };
  });
};

// --- Execution. ---

connectPeers();

We were hopeful that there would somehow be an option to tell Geth to run this script. We discovered that we cannot really do that, unless we use geth console or geth attach - which doesn't work in our case since we still would need to run this manually after the cluster has started.

Suggestion 2

Maybe allow for a script to be executed when geth started without console / attach, so that admin.addPeer / admin.addTrustedPeer could be used.

@karalabe
Copy link
Member

This seems odd to me. Could you post the error message you are getting? Static/trusted nodes are merely suggestions. They should not result in any errors if they are offline. My best guess is that parsing the enode IDs fail, which should be apparent from the error message.

@hickscorp
Copy link
Author

hickscorp commented Jul 15, 2021

@karalabe thanks a lot for your help.

I know for a fact that the parsing goes well. The message is:

Fatal: /root/geth.toml, line 2: (p2p.Config.StaticNodes) lookup node1-new on 127.0.0.11:53: no such host
After which geth exits.

EDIT: Just FYI I have found a workaround - sleeping for 5 seconds before booting geth. This way, it gives 5s for all the containers to spin up. Geth crashes a few times, the containers are restarted, and when a happy coincidence of all the nodes being up happens, things connect.
But I really think that the "trying to connect to the nodes specified in the TOML file" should not be a Fatal, but a Warning and let Geth boot as usual and retry later...

@hickscorp
Copy link
Author

Bump - any chance that connecting to peers specified in StaticNodes / TrustedNodes could gracefully be a warning and retried rather than a boot error?

@hickscorp
Copy link
Author

Bump?

@hickscorp
Copy link
Author

Anyone please? Is it that I didn't phrase the issue correctly, or that there isn't any interest in addressing it?

@haidarabdillah
Copy link

any update on this issue? im really interesting to use domain than ip its more flexible when the ip can't be acces

@0xVasconcelos
Copy link
Contributor

+1

@MariusVanDerWijden
Copy link
Member

MariusVanDerWijden commented Nov 26, 2024

I think this should be fixed, I don't see a reason to refuse to start if TrustedNodes/StaticNodes are offline.
@0xVasconcelos do you happen to have a log of how this fails?

@MariusVanDerWijden
Copy link
Member

I tried to repro with the following config, but I couldn't

[Node.P2P]
NoDiscovery = true
StaticNodes = ["enode://319a0a7ecce6d7808bdf703f3a5d7c50aa1431004dd148bb70d69acca53b9224db7e2f174c52cd37276bab0649cef30d5fe20c7fe3ad106aa636a18b8f99359d@myname.com:0"]
TrustedNodes = ["enode://319a0a7ecce6d7808bdf703f3a5d7c50aa1431004dd148bb70d69acca53b9224db7e2f174c52cd37276bab0649cef30d5fe20c7fe3ad106aa636a18b8f99359d@178.238.233.123:30303"]

It never fails to start up, even with NoDiscover set to false, it will try to staticdial and everything.

I will close this issue for now. If this still happens to you, please reopen it or open another issue with some more logs and a config so that we can reproduce it. Thanks for submitting!

@0xVasconcelos
Copy link
Contributor

@MariusVanDerWijden using the v1.14.12 tag.

INFO [11-26|10:03:42.816] Starting Geth on Ethereum mainnet...
INFO [11-26|10:03:42.817] Bumping default cache on mainnet         provided=1024 updated=4096
Fatal: config.toml, line 81: (p2p.Config.StaticNodes) lookup idontexist.vasconcelos.sh: no such host

@0xVasconcelos
Copy link
Contributor

0xVasconcelos commented Nov 26, 2024

Even fixing the config parsing, the whole enode.Node is constructed based in IP addresses and lacks a runtime DNS resolving mechanism. Or am I missing something here?
In kubernetes deployments this would translate in nodes that will not be able to connect to nodes that have the IP changed

@0xVasconcelos
Copy link
Contributor

@MariusVanDerWijden thats why you can't reproduce

➜  go-ethereum git:(v1.14.12) ✗ dig +short myname.com
64.190.63.222

@MariusVanDerWijden
Copy link
Member

MariusVanDerWijden commented Nov 26, 2024

Whaat the dns name I choose at random really had a domain behind it?

Ah okay I managed to repro it now

@0xVasconcelos
Copy link
Contributor

0xVasconcelos commented Nov 26, 2024

@MariusVanDerWijden I could submit a PR for the dynamic DNS resolving for static/trusted nodes, but I am not sure the ideal implementation would be refactoring all the needed parts or implement a new field(DynamicStaticNodes?) or implement a flag like EnableDynamicResolver then implement a small worker that keeps resolving the DNS's in the list and updating the peer list in the dial service

@MariusVanDerWijden
Copy link
Member

We kinda discussed it yesterday with the team and came to the conclusion that this issue is a low priority for us and does not warrant a bigger refactor. If you feel strongly that this should be fixed, I would suggest to go down the refactoring route over the worker route.

@0xVasconcelos
Copy link
Contributor

@MariusVanDerWijden opened #30822

@fjl fjl closed this as completed in 804d45c Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants
@karalabe @hickscorp @0xVasconcelos @MariusVanDerWijden @haidarabdillah and others