12/WAKU2-FILTER: Handle Client failure #475
Comments
Thanks for the initial issue! I'm not going to comment on the specific approach suggested here (though Hanno might have stronger opinions). What I'll do instead is give some meta comments on how the problem is formulated, broken down and approached. Hopefully it is useful.
1) Problem description can be clearer
This can be rewritten to make it clearer what the problem is, and then link to additional context. For example:
A reader can thus understand in one sentence what the problem is, and once a PR is issued we can see whether it addresses this problem. There's also a link for additional context, but the issue itself is self-contained.
2) Proposed solution can be more directive and take a specific point of view
As opposed to just outlining many alternatives and forcing the reader to think about the problem in detail, it is better to have a specific proposal, mentioning trade-offs and a suggested order of operations. If someone disagrees, or if you missed something, it is easier for the reader to fill in the gaps: "did you think of this?" or "I think we should do this before this".
If one solution is simpler and gets us 80% there, you can have a section "future enhancements" or so detailing additional measures to take.
3) Implementation issue
Since there's already a spec and an initial implementation of filter, it would be good to also have a corresponding issue on nim-waku for the immediate first step. There are also some issues already in nim-waku (ctrl-f "filter"), so linking those in a parent/specific issue would be useful.
4) Taking a step back - contextualizing problem and approach
This started in #469. It might be a good idea to respond there with a comment in the shape of something like this (obviously only for the things we know about / that make sense, just as a rough guideline): "I'm going to look into this, I think it makes sense to start with
Thanks, Reeshav. I'll comment a bit more on the specific Proposed Solutions. I agree with Oskar here that you can (and should) propose the specific solution among the alternatives that you believe will work best, unless it really is unclear which one is best and you'd need external input to proceed. In general you can assume that between all waku developers you are the one that studied
Yes, solutions (1) and (2) do not make much sense for us, as we discussed:
So, in general the proposed solution makes sense to me - that is, to keep track of the first time we failed to contact a peer and, if the failure condition persists after some time, to remove the peer. @richard-ramos may have comments here on how well this method will work in go-waku. Note that this leaves some questions to be addressed in the future, including
Issue moved here
Currently a filter node attempts to push messages indefinitely even if the peer client node is offline. A node should be able to drop the peer once it’s determined that a client node is unreachable.
The issue was introduced in #469.
Acceptance criteria
Possible Solutions
The solution needs to work over the existing libp2p connection between server and client, rather than opening a new connection for a heartbeat.
Since the Waku message handler handles the message push during init, we check for dial failure at that point. This check is already implemented and maintains a map of <peer, offtimestart>, where offtimestart is the epoch of the first failure. If the failure persists beyond a certain time limit, we remove the peer from the peer list and clear it from the map as well.
The time before removal can be configured by the filter node, which gives adaptive server nodes some flexibility.
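To make the bookkeeping above concrete, here is a minimal sketch in Python. It is not the nim-waku or go-waku code: the class name `FailureTracker`, the `timeout_s` parameter and the `on_push_result` method are hypothetical names chosen for illustration, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FailureTracker:
    """Hypothetical helper: remembers the first push failure per peer."""
    timeout_s: float  # assumed name for the configurable "time to remove"
    subscribers: Set[str] = field(default_factory=set)
    # <peer, offtimestart>: time of the first failure still outstanding
    first_failure: Dict[str, float] = field(default_factory=dict)

    def on_push_result(self, peer: str, ok: bool, now: float) -> bool:
        """Record one push outcome; return True if the peer was dropped."""
        if peer not in self.subscribers:
            return False
        if ok:
            # Peer is reachable again, so forget any earlier failure.
            self.first_failure.pop(peer, None)
            return False
        # Keep only the first failure time; later failures do not reset it.
        started = self.first_failure.setdefault(peer, now)
        if now - started > self.timeout_s:
            # Failure persisted past the limit: drop peer and clear the map.
            self.subscribers.discard(peer)
            self.first_failure.pop(peer, None)
            return True
        return False


tracker = FailureTracker(timeout_s=60.0, subscribers={"peerA"})
tracker.on_push_result("peerA", ok=False, now=0.0)   # first failure recorded
tracker.on_push_result("peerA", ok=False, now=30.0)  # still within the limit
dropped = tracker.on_push_result("peerA", ok=False, now=61.0)
```

A successful push clears the entry, so a peer that is only briefly offline is not dropped; only a failure that persists past `timeout_s` removes it.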
Notes
There are 2 more approaches to solve the problem; however, they have their tradeoffs.
The existing libp2p connection has a callback for connection failure, which can be used as a fallback point to stop the message push service and drop the peer. This solution cleans up memory organically as connections drop. However, as adaptive nodes are expected to be resource restricted, forbidding push requests after the first failure will cause repetitive filter calls.
Adding a heartbeat mechanism using the libp2p ping protocol between the peers, which tries the connection every x minutes; 5 failures cause the server to drop the peer. This approach relies on continuous polling, which will affect network bandwidth.
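For comparison, the heartbeat alternative could look roughly like the sketch below. The `ping` argument is a stand-in callable, not the libp2p ping API, and `heartbeat_round` is a hypothetical name. The sketch also assumes the 5 failures must be consecutive, which the description above does not state explicitly.

```python
from typing import Callable, Dict, List

MAX_FAILURES = 5  # failure count taken from the description above


def heartbeat_round(
    peers: List[str],
    ping: Callable[[str], bool],  # stand-in for a real ping; returns True on success
    failures: Dict[str, int],
) -> List[str]:
    """Ping every peer once and return the peers that stay subscribed."""
    alive = []
    for peer in peers:
        if ping(peer):
            # Assumption: a successful ping resets the failure count.
            failures.pop(peer, None)
            alive.append(peer)
            continue
        failures[peer] = failures.get(peer, 0) + 1
        if failures[peer] >= MAX_FAILURES:
            failures.pop(peer, None)  # peer is dropped, forget its counter
        else:
            alive.append(peer)
    return alive


peers = ["a", "b"]
failures: Dict[str, int] = {}
for _ in range(5):  # one call per heartbeat interval ("every x minutes")
    peers = heartbeat_round(peers, lambda p: p == "a", failures)
```

Every round pings every subscribed peer whether or not there is traffic for it, which is the polling cost noted above.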
cc: @oskarth @richard-ramos @jm-clius @staheri14