Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth/downloader, trie: pull head state concurrently with chain #2627

Merged
merged 1 commit into from
May 31, 2016

Conversation

karalabe
Copy link
Member

Currently fast sync works in two phases:

  • Download the blockchain + transaction receipts
  • Download a pivot state and import blocks after it

Downloading the blockchain and tx data is a fairly resilient operation which can be aborted and resumed at will. However the state trie is downloaded at a random block close to the chain head, the failure of which is considered a potential attack, resulting in the disabling of fast sync and restarting with slow.

However, most of the time it's just a connectivity issue, maybe bad peers, maybe local connectivity problem, which can result in a very unpleasant sync restart. Still, the downloader has no way to decide why peers dropped off, only that noone was able to serve the promised data, which could mean a bad actor at play.


This PR works around this issue not by making phase 2 more resilient to failure, but rather by shortening phase two to becoming almost instantaneous. It does this by starting a concurrent head (!!!) state trie download already in phase one - where errors are permitted - so that when the sync algo reached phase 2, almost the entire state is downloaded (head) and only the difference between the ~1500 head blocks needs to be pulled (currently about 23K states), versus the entire state (currently about 1.3M). This makes phase two ~0.17% of it's original length, considerably shorter for connectivity issues to wreck havoc.

Please note, the reason we pull the head state and not the pivot state is because we need the pivot to remain random and the head is deterministic and fixed either way, so there's no harm in downloading it.

This does pose a possibility where an attacker might try to send us garbage data while we're pulling the blockchain (as we can't verify the root hash without all the headers leading up to it), but when we do finish downloading the headers, we just notice it's junk and pull the correct state. Given that the most an attacker can achieve this way is to waste a bit of space for newly (and only newly) joining nodes, which should be cleaned up by state pruning, there's not much to gain. Also this attack can only be done in a targeted way and cannot really be magnified by the network. Thus given that the worst attack vector is annoying someone, this change should be safe to include.


Further on the positive side, by pulling most of the state during phase one already:

  • We don't need to be aggressive in dropping useless nodes (from a state perspective) as we have enough time to pull the states, there's no significant hurry. This resulting in a most stable network.
  • Newly joining nodes which just began to sync can already help each other out as they can pull states from one another too and not have to rely on fully synced large nodes for this data.
    • Mental note: maybe we should do state sync randomly, not in order. This could allow a more distributed availability of data. Though this should be done in another PR :)

@robotally
Copy link

robotally commented May 27, 2016

Vote Count Reviewers
👍 1 @zsfelfoldi
👎 0

Updated: Mon May 30 20:33:37 UTC 2016

@karalabe
Copy link
Member Author

@obscuren @zsfelfoldi Please review :)

d.syncStatsLock.Lock()
d.syncStatsStateTotal = 0
d.syncStatsStateDone = 0
d.syncStatsLock.Unlock()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: syncStatsStateTotal was an unused variable that we can just drop off and syncStatsStateDone is better not to be reset so that pulling states in multiple cycles (i.e. reset after some failure) does not look as if it's restarting (causing people to freak out) but rather explicitly display that it's continuing.

@fjl fjl mentioned this pull request May 27, 2016
7 tasks
@karalabe karalabe added this to the 1.4.6 milestone May 30, 2016
@fjl
Copy link
Contributor

fjl commented May 30, 2016

I have tried this PR two times now and it seems to work. Code looks also OK. 👍

@zsfelfoldi
Copy link
Contributor

LGTM too 👍

@karalabe karalabe merged commit 1d5d217 into ethereum:develop May 31, 2016
@obscuren obscuren removed the review label May 31, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants