Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid bad peers propagation #1134

Open
lock9 opened this issue Sep 30, 2019 · 3 comments
Open

Avoid bad peers propagation #1134

lock9 opened this issue Sep 30, 2019 · 3 comments
Labels
Discussion Initial issue state - proposed but not yet accepted Enhancement Type - Changes that may affect performance, usability or add new features to existing modules. P2P Module - peer-to-peer message exchange and network optimisations, at TCP or UDP level (not HTTP).

Comments

@lock9
Copy link
Contributor

lock9 commented Sep 30, 2019

(Originally from @decentralisedkev on #366)

Summary
Problems:

  • When a node disconnects from another node, nodes do not have a way to check this, subsequently, nodes propagate bad peers to other nodes, wasting the other nodes' resources in trying to connect to bad peers and makes the initial synchronization slow.**
  • Nodes have no way to disconnect peers who are slowing them down in terms of message delivery as stated above. For example, if I ask a peer for blocks/headers and that peer is taking more than 45 seconds, the node should disconnect and search for a faster peer to request the same resources from. The problem with that peer being slow will increase if more peers keep connecting to it and asking for inventory.

Do you have any solution you want to propose?

  • Add the heartbeat messages into the p2p protocol; ping and pong. So a node sends a ping message with a nonce, the other node then replies with a pong message with the same nonce to signify that the connection is still alive.

  • Implement an inflight message tracker for each peer that the node has connected to. If a getheaders message is sent and a headers is not heard within 30 seconds, for example, the node should disconnect from that peer.
    An example implementation of this can be seen here: https://github.com/decentralisedkev/neo-go/blob/v2/pkg/p2p/peer/stall/stall.go
    With these solutions in place, nodes will stop keeping a large number of bad peers or peers that are ran for a short period of time.

Where in the software does this update applies to?

  • P2P (TCP)
@lock9 lock9 added the Discussion Initial issue state - proposed but not yet accepted label Sep 30, 2019
@lock9 lock9 added Enhancement Type - Changes that may affect performance, usability or add new features to existing modules. P2P Module - peer-to-peer message exchange and network optimisations, at TCP or UDP level (not HTTP). labels Sep 30, 2019
@ixje
Copy link
Contributor

ixje commented Oct 1, 2019

Also a node should not decide for another node if a peer is bad based on metrics like; ping, response time, etc. The only acceptable metric for filtering should be "not able to connect at all" (a.k.a dead peer), rationale;

  • Ping is based on geographical location, what might be bad a bad address for node A, might be good for node B because they're closer.
  • Response time has some correlation with ping, but also just because you think 2 seconds is too slow, doesn't mean other nodes think that's too slow for their application.

@lock9 lock9 changed the title Avoid propagating bad peers Avoid bad peers propagation Oct 1, 2019
@lock9
Copy link
Contributor Author

lock9 commented Nov 9, 2019

@ixje What if the node you tried to connect was 'full'? If we know that the peer is full, we should avoid propagating it, at least for some time.
I agree that we should not keep trying to connect to the same 'bad nodes' over and over again but also be careful to not judge the node by "our relation with it".

@ixje
Copy link
Contributor

ixje commented Nov 9, 2019

I just noticed that "bad peers" is no longer a thing in NEO3

json["bad"] = new JArray(); //badpeers has been removed

Ideally, yes we avoid propagating. However, what will we do if the list we have available is small and mostly consists of "full" nodes? What is a proper timeout for full nodes to be propagated? They could arguably not be full 1 second after our connection attempt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Initial issue state - proposed but not yet accepted Enhancement Type - Changes that may affect performance, usability or add new features to existing modules. P2P Module - peer-to-peer message exchange and network optimisations, at TCP or UDP level (not HTTP).
Projects
None yet
Development

No branches or pull requests

2 participants