What is wrong?
While recently debugging slow sync issues I saw the ChainSyncer's msg_queue grow to more than 5k items and stay at roughly that length. I don't know what caused it to get that long (it could be a malicious peer DoSing us, blocking calls in the main thread that prevented it from consuming messages, or simply more connected peers than our processing power can handle), but once it gets that long the sync pretty much stalls: the main loop times out waiting for block data (which has already arrived but sits at the end of the queue), re-requests that data and goes back to waiting for it. That means we end up downloading/processing the same data multiple times, and since even detecting whether a message is a duplicate requires some processing, the event loop never manages to catch up and process all pending messages.
How can it be fixed
We probably need a smaller upper limit on the msg queue (the current one is 10k), and we should drop peers/msgs when we reach the limit (currently we just raise a QueueFull error).
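A rough sketch of the "drop instead of raise" idea, assuming the queue is an asyncio.Queue (suggested by the QueueFull error, but not confirmed here). The DroppingQueue class, the peer_id argument and the 2000 limit are all made up for illustration and are not existing ChainSyncer code:

```python
import asyncio
from typing import Any, Tuple

# Hypothetical limit: the issue only says the current 10k is probably too large.
MAX_QUEUE_SIZE = 2000


class DroppingQueue:
    """Bounded queue that drops new messages when full instead of raising."""

    def __init__(self, maxsize: int = MAX_QUEUE_SIZE) -> None:
        self._queue: "asyncio.Queue[Tuple[str, Any]]" = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0  # number of messages discarded because the queue was full

    def put_nowait(self, peer_id: str, msg: Any) -> bool:
        """Enqueue a message; return False (and count a drop) if the queue is full."""
        try:
            self._queue.put_nowait((peer_id, msg))
        except asyncio.QueueFull:
            self.dropped += 1
            return False
        return True

    async def get(self) -> Tuple[str, Any]:
        """Wait for the next (peer_id, msg) pair."""
        return await self._queue.get()

    def qsize(self) -> int:
        return self._queue.qsize()
```

The caller can use the False return value to decide whether to just ignore the message or also disconnect the peer that sent it.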
We may also want to look into keeping track of the average messages/second we receive from every peer, possibly disconnecting a peer if it's above a certain limit. Or something more elaborate, with the goal of preventing malicious peers from DoSing us.
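One possible shape for the per-peer rate tracking, as a sliding-window average. This is only a sketch: PeerRateTracker, its parameters and the thresholds are hypothetical names and values, and a real limit would need to be tuned against what honest peers send during sync.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PeerRateTracker:
    """Tracks average messages/second per peer over a sliding time window."""

    def __init__(
        self,
        max_msgs_per_sec: float,
        window: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_msgs_per_sec = max_msgs_per_sec
        self.window = window  # seconds of history to average over
        self._clock = clock  # injectable so the tracker can be tested
        self._timestamps: Dict[str, Deque[float]] = defaultdict(deque)

    def _evict(self, stamps: Deque[float], now: float) -> None:
        # Forget messages that fell out of the window.
        cutoff = now - self.window
        while stamps and stamps[0] <= cutoff:
            stamps.popleft()

    def record(self, peer_id: str) -> None:
        """Call once for every message received from peer_id."""
        now = self._clock()
        stamps = self._timestamps[peer_id]
        stamps.append(now)
        self._evict(stamps, now)

    def rate(self, peer_id: str) -> float:
        """Average messages/second from peer_id over the last window."""
        stamps = self._timestamps.get(peer_id)
        if not stamps:
            return 0.0
        self._evict(stamps, self._clock())
        return len(stamps) / self.window

    def should_disconnect(self, peer_id: str) -> bool:
        return self.rate(peer_id) > self.max_msgs_per_sec
```

Something like `tracker.record(peer_id)` would go wherever messages are received, with `tracker.should_disconnect(peer_id)` checked right after.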
> the sync will pretty much stall as the main loop will timeout waiting for block data (that has already arrived but is at the end of the queue), re-request that data and go back to waiting for it.
Another thing we should probably do is pause/reduce our data requests if our msg_queue is too long.
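A minimal sketch of what pausing requests based on queue length could look like, using a high/low watermark pair so we don't flap between paused and unpaused. RequestThrottle and the watermark values are hypothetical and not part of the existing code:

```python
class RequestThrottle:
    """Decides whether new data requests may be sent, based on queue length.

    Requests pause once the queue reaches high_watermark and resume only
    after it has drained to low_watermark (hysteresis).
    """

    def __init__(self, high_watermark: int = 1000, low_watermark: int = 200) -> None:
        if low_watermark >= high_watermark:
            raise ValueError("low_watermark must be below high_watermark")
        self.high_watermark = high_watermark
        self.low_watermark = low_watermark
        self.paused = False

    def update(self, queue_len: int) -> bool:
        """Feed the current queue length; return True if requests are allowed."""
        if self.paused and queue_len <= self.low_watermark:
            self.paused = False
        elif not self.paused and queue_len >= self.high_watermark:
            self.paused = True
        return not self.paused
```

The sync loop would call `throttle.update(msg_queue.qsize())` before issuing a new request and skip (or delay) the request when it returns False.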