eth/downloader: fix the stall checks/drops during sync #2855
+8
−1
This is a fix for an interesting corner case regression introduced by the EDGE release. Our code before the EDGE release tried to estimate the capacity of a peer and fetch data proportional to it. If the retrieval timed out, it assumed we were a bit overzealous with the request and reduced the peer capacity to zero (i.e. 1 data item per request). If this also timed out afterwards, the peer was deemed too useless to retain (during sync, which needs performance) and was dropped.
The EDGE release introduced a fancier throughput and latency measurement algo, which among other things constantly tries to request just a bit more than the estimated capacity. This way it can correctly separate the latency from the bandwidth. This means, however, that it will never request just one data item, but always a minimum of two. Our stall checker code, however, only considered a timed-out request for 1 item to be stalling. So EDGE effectively disabled the stall drop.
The big problem with this is that if we connect to two very bad peers that always time out, then instead of eventually dropping them, we alternate between the two, trying to request the same 2 data items ad infinitum. We'll never break out of this loop as long as they are connected, which blocks sync.
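To make the mismatch concrete, here is a minimal sketch. It is not the downloader code: `minRequest`, `stalledOld` and `stalledNew` are hypothetical names, and the threshold-based check is only one plausible way to express the fix, assuming the scheduler never asks for fewer than two items.

```go
package main

import "fmt"

// minRequest is the assumed smallest batch the post-EDGE scheduler
// will ever request from a peer (hypothetical constant).
const minRequest = 2

// stalledOld models the pre-fix assumption: only a timed-out request
// for a single item counts as a stall.
func stalledOld(expired int) bool {
	return expired <= 1
}

// stalledNew models a threshold-aware check: a timed-out request that
// was already at the minimum size counts as a stall.
func stalledNew(expired int) bool {
	return expired <= minRequest
}

func main() {
	// Compare both checks for a minimum-sized and a larger timed-out request.
	for _, expired := range []int{minRequest, 16} {
		fmt.Printf("expired=%d old=%v new=%v\n",
			expired, stalledOld(expired), stalledNew(expired))
	}
}
```

With a minimum request size of two, `stalledOld` can never report a stall, which is why two always-failing peers would be retried forever. A larger timed-out request is still treated as an overestimate rather than a stall by both checks.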