-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimisations For The Network Protocol #366
Comments
I seem to have missed this issue, but I like it!
neo/neo/Network/P2P/ProtocolHandler.cs Lines 103 to 106 in 8310035
@erikzhang it looks like the proposed commands are already in there. What would be needed to get this further implemented? All proposed solutions seem backwards compatible as unknown commands can be just be ignored. |
For example, if I request a header from you, and you do not have it, then I will wait X amount of time before coming to the conclusion that you do not have it. My thought was that this was a waste of time, if a node does not have a piece of data, that is requested then ideally they should tell the requesting party. With this implemented, then we can assume that if we do not hear from a node within a certain amount of time, that they have stopped working also. The client(requestor) which is me, would know what is missing, however I would not know whether the node who I requested it from, has the item. A bit long winded, but I hope that makes sense.
For this, I was suggesting that NodeA keeps a list of all of it's known addresses and using pings, we can check which ones have disconnected from NodeA. If NodeA does not know whether the known nodes in his list are still connected, he would inevitably end up propagating bad nodes, when NodeB sends NodeA a
To my understanding, ping is made to check whether the node is still connected, while the stall manager checks whether the connected node is experiencing a large amount of traffic. An example for me would be that, I am trying to sync with the network and I ask nodeD for blocks 100-200, in this scenario, I think there should be a way to determine whether nodeD is experiencing a large amount of traffic and subsequently, we should ask another node for blocks 100-200. In this way, nodeD does not hold me up because of his backlog. When I wrote those solutions, they were meant as a stepping stone to efficiently filter out the bad/overloaded nodes in the network allowing for faster block download times and less network congestion. Let me know if I got anything wrong or anything is written poorly. |
This part was clear before and can be solved with the
This part I'm not really following. My main remark was that I don't see a reason for 2 messages on a partial hit. It can just send what it has and the client can determine itself what it's still missing. ====
I don't think ping will be sufficient to determine good or bad. Some nodes might disconnect you simply because you've not given them I think there are too many factors in play that you can let another node decide for you if an address is good or bad. ====
I think this section was clear. I just believe that this should be a client implementation not in the protocol such that a client can decide after which delay it wants to disconnect. |
I think we should divide this into 3 issues. @neo-project/core what is your opinion? |
+1 |
@decentralisedkev I will create new issues for this, ok? I believe some of them are important and can be implemented sooner than others. |
@decentralisedkev issues created. I edited your issue to mention it. I'm closing this issue now, thank you for your collaboration! |
Below I list four new commands to add to the network protocol, plus a method to remove peers that stall other peers as a possible solution to optimise the network protocol.
#1132 Problem: When nodeA asks nodeB for a header, if nodeB does not have the header, he will not tell nodeA. nodeA will wait for nodeB until some set of time, which slows down nodeA.
Solution: Once nodeB cannot find the header, he should send back a
notfound
message with the hashes of the inv that he could not find, along with a separate message of the headers that he could find.#1133 Problem: When a node sends an incompatible message during the handshake or a malformed message, the node silently disconnects. This makes it hard to debug why two nodes are not communicating, or why one node has disconnected from another.
Solution: If a node disconnects from another node, due to the message that that node had sent, they should send a
reject
message, stating why the node was disconnected and a byte to signify the reason for the computer.#1134 Problem: When a node disconnects from another node, nodes do not have a way to check this, subsequently nodes propagate bad peers to other nodes, wasting the other nodes' resources in trying to connect to bad peers and makes the initial synchronisation slow.
Solution: I have implemented a seednode which filters out the latest available p2p nodes and sends them over the wire via a
addr
message, but this is more of a band-aid solution. Also, I believe it is doing too much filtering, after monitoring it for a day or so, the list tops out at ~35 nodes, after that the seednode spends most of it's time being stalled filtering out a large number of bad peers.A better solution would be to add the heartbeat messages into the p2p protocol;
ping
andpong
. So a node sends aping
message with a nonce, the other node then replies with apong
message with the same nonce to signify that the connection is still alive.#1134 Problem: Nodes have no way to disconnect peers who are slowing them down in terms of message delivery as stated above. For example, if I ask a peer for blocks/headers and that peer is taking more than 45 seconds, the node should disconnect and search for a faster peer to request the same resources from. The problem with that peer being slow, will increase if more peers keep connecting to it and asking for inventory.
Solution: Implement a inflight message tracker for each peer that the node has connected to. If a
getheaders
message is sent and aheaders
is not heard within 30 seconds for example, the node should disconnect from that peer.An example implementation of this can be seen here: https://github.com/decentralisedkev/neo-go/blob/v2/pkg/p2p/peer/stall/stall.go
With these solutions in place, nodes will stop keeping a large number of bad peers or peers that are ran for a short period of time.
In order to implement these, the protocol version would need to be bumped.
The text was updated successfully, but these errors were encountered: