eth/downloader, trie: pull head state concurrently with chain #2627
Currently fast sync works in two phases:
Downloading the blockchain and tx data is a fairly resilient operation which can be aborted and resumed at will. However, the state trie is downloaded at a random block close to the chain head, and a failure there is considered a potential attack, resulting in fast sync being disabled and the sync restarting in slow mode.
However, most of the time it's just a connectivity issue, maybe bad peers, maybe a local connectivity problem, which can result in a very unpleasant sync restart. Still, the downloader has no way to decide why peers dropped off, only that no one was able to serve the promised data, which could mean a bad actor at play.
This PR works around this issue not by making phase 2 more resilient to failure, but rather by shortening phase 2 until it is almost instantaneous. It does this by starting a concurrent head (!!!) state trie download already in phase 1 - where errors are permitted - so that when the sync algo reaches phase 2, almost the entire state is already downloaded (head) and only the difference across the ~1500 head blocks needs to be pulled (currently about 23K states), versus the entire state (currently about 1.3M). This makes phase 2 roughly 1.8% of its original length (23K / 1.3M), a considerably shorter window for connectivity issues to wreak havoc.
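The ratio between the two state counts quoted above can be checked with a few lines of Go. This is only an illustrative sketch: `remainingFraction` is a hypothetical helper, not part of the downloader, and the inputs are the approximate figures from this description.

```go
package main

import "fmt"

// remainingFraction returns the share of state entries that still has to be
// fetched in phase 2, given how many entries differ between head and pivot.
// Hypothetical helper for illustration only.
func remainingFraction(diffEntries, totalEntries uint64) float64 {
	if totalEntries == 0 {
		return 0 // nothing to download, avoid dividing by zero
	}
	return float64(diffEntries) / float64(totalEntries)
}

func main() {
	// ~23K differing states versus ~1.3M total states (figures from the text).
	f := remainingFraction(23000, 1300000)
	fmt.Printf("phase 2 fetches %.2f%% of the full state\n", 100*f)
	// prints "phase 2 fetches 1.77% of the full state"
}
```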
Please note, we pull the head state and not the pivot state because the pivot needs to remain random, whereas the head is deterministic and fixed either way, so there's no harm in downloading it.
This does open up the possibility of an attacker sending us garbage data while we're pulling the blockchain (as we can't verify the root hash without all the headers leading up to it), but once we do finish downloading the headers, we just notice it's junk and pull the correct state. Given that the most an attacker can achieve this way is to waste a bit of space for newly (and only newly) joining nodes, which should be cleaned up by state pruning, there's not much to gain. Also, this attack can only be done in a targeted way and cannot really be magnified by the network. Thus, given that the worst attack vector is annoying someone, this change should be safe to include.
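The check described above can be sketched roughly as follows. This is not the downloader's actual code: `Hash`, `speculativeState` and `reconcile` are hypothetical names, and the sketch only shows the decision made once the headers are verified, namely whether the speculatively pulled state matches the real head root or has to be discarded and re-pulled.

```go
package main

import "fmt"

// Hash stands in for a 32-byte state root (hypothetical type).
type Hash [32]byte

// speculativeState describes a head state pulled during phase 1, before the
// headers leading up to it could be verified (hypothetical type).
type speculativeState struct {
	claimedRoot Hash // root advertised by the peers we pulled from
}

// reconcile reports whether the speculative state can be kept once the
// verified head header, and thus the real state root, is known.
func reconcile(s speculativeState, verifiedRoot Hash) bool {
	return s.claimedRoot == verifiedRoot
}

func main() {
	good := Hash{0x01}
	junk := Hash{0xff}

	honest := speculativeState{claimedRoot: good}
	forged := speculativeState{claimedRoot: junk}

	fmt.Println(reconcile(honest, good)) // true: keep the state, pull only the diff
	fmt.Println(reconcile(forged, good)) // false: discard and pull the correct state
}
```

The only cost in the mismatch case is the wasted download and disk space, which matches the argument above that the attack is an annoyance rather than a threat.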
Further on the positive side, by pulling most of the state during phase one already: